CN115396738A - Video playing method, device, equipment and storage medium - Google Patents

Video playing method, device, equipment and storage medium

Info

Publication number
CN115396738A
CN115396738A (application number CN202110571850.7A)
Authority
CN
China
Prior art keywords
video
target
playing
preview image
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110571850.7A
Other languages
Chinese (zh)
Inventor
余自强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110571850.7A priority Critical patent/CN115396738A/en
Publication of CN115396738A publication Critical patent/CN115396738A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47217 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the present application provides a video playing method, apparatus, device and storage medium, relating to the field of computer technology. The method first obtains a first playing time based on a target adjustment operation triggered by a user on the playing progress of a target video, and then obtains and displays a text information set corresponding to the first playing time, so that the user can intuitively learn, from the text information set, the video content of the target video near the first playing time. Then, in response to a selection operation triggered on target text information among the pieces of text information, a second playing time corresponding to the target text information in the target video is obtained and the target video is played from the second playing time. Playback is thereby finely positioned at the second playing time the user wants, improving the efficiency and accuracy of adjusting the video viewing progress.

Description

Video playing method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a video playing method, a video playing device, video playing equipment and a storage medium.
Background
With the development of video technology, users' expectations for the viewing experience keep rising. At present, when watching a video, a user can adjust the viewing progress as needed and watch only the video clips of interest. For adjusting the viewing progress, video platforms provide a video progress bar: the user can slide it to any time point in the video to adjust the playing progress.
However, with a sliding progress bar it is difficult to locate precisely the video clip the user wants; the user often has to slide the progress bar several times, or sit through unwanted clips until the desired one arrives, which makes adjusting the video viewing progress inefficient.
Disclosure of Invention
The embodiment of the application provides a video playing method, a video playing device, video playing equipment and a storage medium, which are used for improving the efficiency and accuracy of adjusting the video watching progress.
In one aspect, an embodiment of the present application provides a video playing method, the method including:
in response to a target adjustment operation triggered on the playing progress of a target video, determining a first playing time of the adjusted target video;
acquiring a text information set corresponding to the first playing time in the target video, the text information set including at least one piece of text information;
displaying the at least one piece of text information in a video playing interface;
and in response to a selection operation triggered on target text information among the at least one piece of text information, obtaining a second playing time corresponding to the target text information in the target video, and playing the target video from the second playing time.
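The four steps above can be condensed into a short sketch. The class and method names are illustrative assumptions, not identifiers from the patent, and a fixed time window around the first playing time stands in for however an implementation actually assembles the text information set:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextInfo:
    play_time: float  # seconds into the video
    text: str


class VideoPlayer:
    """Minimal sketch of the claimed method; names are illustrative."""

    def __init__(self, subtitles: List[TextInfo], window: float = 5.0):
        self.subtitles = sorted(subtitles, key=lambda t: t.play_time)
        self.window = window
        self.play_position = 0.0

    def on_progress_adjusted(self, first_play_time: float) -> List[TextInfo]:
        """Steps 1-3: determine the first playing time and return the
        text information set around it for display in the interface."""
        lo, hi = first_play_time - self.window, first_play_time + self.window
        return [t for t in self.subtitles if lo <= t.play_time <= hi]

    def on_text_selected(self, target: TextInfo) -> float:
        """Step 4: obtain the second playing time associated with the
        selected text information and resume playback there."""
        self.play_position = target.play_time  # the second playing time
        return self.play_position
```

In a real player, `on_progress_adjusted` would be driven by the progress-bar drag and `on_text_selected` by the selection gesture.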
In one aspect, an embodiment of the present application provides a video playing method, the method including:
in response to a target adjustment operation triggered on the playing progress of a target video, displaying a text information set in a video playing interface, the text information set including at least one piece of text information, the target adjustment operation being used to determine a first playing time in the playing progress of the target video, and the text information set corresponding to the first playing time;
and in response to a selection operation triggered on target text information among the at least one piece of text information, playing the target video from a second playing time, the second playing time being the playing time associated with the target text information.
Optionally, the target adjustment operation includes a drag operation that drags the video progress bar along its direction in the video playing interface;
the selection operation triggered on the target text information among the at least one piece of text information includes a sliding operation performed in a direction perpendicular to the video progress bar.
Optionally, displaying the text information set in the video playing interface includes:
arranging and displaying the at least one piece of text information in the video playing interface according to the playing time associated with each piece of text information.
Optionally, the method further includes:
displaying, in the video playing interface, the character information respectively associated with the at least one piece of text information in the target video.
In one aspect, an embodiment of the present application provides a video playing apparatus, where the apparatus includes:
the adjustment response module is used for responding to target adjustment operation triggered by the playing progress of the target video and determining a first playing time of the adjusted target video;
an obtaining module, configured to obtain a text information set corresponding to the first playing time in the target video, where the text information set includes at least one text information;
the display module is used for displaying the at least one piece of text information in a video playing interface;
and the video playing module is used for, in response to a selection operation triggered on target text information among the at least one piece of text information, obtaining a second playing time corresponding to the target text information in the target video and playing the target video from the second playing time.
Optionally, the display module is specifically configured to:
and arranging and displaying, in the video playing interface, the at least one piece of text information according to the playing time associated with each piece of text information.
Optionally, the display module is specifically configured to:
and displaying, in the video playing interface, the character information respectively associated with the at least one piece of text information in the target video.
Optionally, the obtaining module is specifically configured to:
acquiring a first preview image and a text information set corresponding to the first playing time in the target video;
the display module is specifically configured to:
and displaying the first preview image and the at least one piece of text information in a video playing interface according to the chronological association between the at least one piece of text information and the first preview image.
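One way to realize the chronological association between the text information and the first preview image is to merge both into a single time-ordered display list. This is a sketch under that assumption, with illustrative names:

```python
def display_order(texts, preview_time):
    """Interleave the first preview image with the pieces of text
    information by their associated play times.

    texts: list of (play_time, line) pairs.
    Returns (play_time, kind, payload) tuples sorted by play time, so the
    preview image slots in among the lines it belongs with; on a time tie
    the preview is listed first (an arbitrary illustrative choice)."""
    items = [(t, "text", s) for t, s in texts]
    items.append((preview_time, "preview", None))
    items.sort(key=lambda it: (it[0], it[1] != "preview"))
    return items
```

A two-area layout, as in the optional clause below, could consume the same ordered list and route each item by its `kind`.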
Optionally, the display module is specifically configured to:
displaying the at least one piece of text information in a first area of the video playing interface according to the chronological association between the at least one piece of text information and the first preview image;
and displaying the first preview image in a second area of the video playing interface.
Optionally, the display module is further configured to:
displaying, in the first area, the character information respectively associated with the at least one piece of text information in the target video;
and displaying, in the second area, the character information associated with the first preview image in the target video.
Optionally, the text information set includes first text information played at the first playing time and at least two pieces of second text information played before and after the first playing time.
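A minimal sketch of assembling such a set: take the line nearest the first playing time as the first text information and `k` neighbours on each side as the second text information. The nearest-line rule and the default `k = 2` are assumptions for illustration, not the patent's specification:

```python
from bisect import bisect_left


def text_info_set(play_times, first_play_time, k=2):
    """Return indices of the first text information (nearest line to the
    adjusted play time) plus up to k second text informations on each
    side, clipped to the video bounds. `play_times` must be sorted."""
    i = bisect_left(play_times, first_play_time)
    # snap to the nearest entry on either side of the insertion point
    if i > 0 and (i == len(play_times) or
                  first_play_time - play_times[i - 1] <= play_times[i] - first_play_time):
        i -= 1
    lo = max(0, i - k)
    hi = min(len(play_times), i + k + 1)
    return list(range(lo, hi))
```

Near the start or end of the video the window is simply clipped, so fewer than `2k + 1` lines may be returned.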
Optionally, the obtaining module is further configured to:
in response to the target adjustment operation triggered on the playing progress of the target video, after the first playing time of the adjusted target video is determined, acquiring at least one second preview image, where the playing time interval between each second preview image and the first preview image meets a preset condition, and the similarity between each second preview image and the first preview image is smaller than a preset threshold;
the display module is further configured to:
and displaying the at least one second preview image in the second area according to the chronological association between the at least one second preview image and the first preview image.
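Both conditions on the second preview images (a minimum play-time gap and low similarity to the first preview image) can be checked with a simple filter. The histogram-overlap similarity below is only one illustrative choice of measure, not the patent's:

```python
def select_second_previews(frames, first_idx, min_gap, sim_threshold):
    """Pick candidate second preview images.

    frames: list of (play_time, histogram) pairs, one per candidate frame.
    A frame qualifies if its play-time gap from the first preview image is
    at least min_gap and its similarity to it is below sim_threshold, so
    near-duplicate shots (e.g. a static lecture scene) are skipped."""
    t0, h0 = frames[first_idx]
    picked = []
    for i, (t, h) in enumerate(frames):
        if i == first_idx or abs(t - t0) < min_gap:
            continue
        # similarity as normalized histogram overlap in [0, 1]
        sim = sum(min(a, b) for a, b in zip(h, h0)) / max(sum(h0), 1)
        if sim < sim_threshold:
            picked.append(i)
    return picked
```

Any per-frame descriptor with a [0, 1] similarity could be substituted for the histogram without changing the filtering logic.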
Optionally, the display module is further configured to:
displaying, in the second area, the character information respectively associated with the at least one second preview image in the target video.
Optionally, the video playing module is further configured to:
after the at least one piece of text information in the text information set is displayed in the video playing interface according to its chronological association with the first preview image, in response to a selection operation triggered on a target preview image in the preview image set, obtaining a third playing time corresponding to the target preview image in the target video and playing the target video from the third playing time, where the preview image set includes the first preview image and the at least one second preview image.
Optionally, the video playing module is further configured to:
and after the at least one piece of text information in the text information set is displayed in the video playing interface according to its chronological association with the first preview image, if no operation is received within a preset duration, playing the target video from the first playing time.
Optionally, the target adjustment operation includes a drag operation that drags the video progress bar along its direction in the video playing interface, where the maximum number of pieces of text information included in the text information set is the number of pieces of text information corresponding to the video clip skipped by the target video when the video progress bar is dragged by a unit distance; and the selection operation triggered on the target text information among the at least one piece of text information includes a sliding operation performed in a direction perpendicular to the video progress bar.
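Under this reading, the cap on the set size is the number of lines falling inside the clip skipped per unit drag distance. A sketch with illustrative numbers (all timestamps and pixel lengths are assumptions):

```python
from bisect import bisect_left, bisect_right


def max_text_items(line_times, total_duration, bar_length_px, unit_px, at_time):
    """Count the lines inside the clip the video skips when the progress
    bar moves one unit distance starting from `at_time`; the optional
    clause uses this count as the cap on the text information set size.

    line_times: sorted play times (seconds) of the lines in the video."""
    skipped = total_duration / bar_length_px * unit_px  # seconds per unit drag
    return bisect_right(line_times, at_time + skipped) - bisect_left(line_times, at_time)
```

For a 100-second video on a 100 px bar, a 10 px unit drag skips 10 seconds, so the cap is however many lines fall in that 10-second clip.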
In one aspect, an embodiment of the present application provides a video playing apparatus, where the apparatus includes:
the adjustment response module is used for responding to a target adjustment operation triggered by the playing progress of a target video and displaying a text information set in a video playing interface, wherein the text information set comprises at least one piece of text information, the target adjustment operation is used for determining a first playing time in the playing progress of the target video, and the text information set corresponds to the first playing time;
and the video playing module is used for responding to a selection operation triggered by the target text information in the at least one text information and playing the target video from a second playing time, wherein the second playing time is the playing time associated with the target text information.
Optionally, the target adjustment operation includes a drag operation that drags the video progress bar along its direction in the video playing interface;
the selection operation triggered on the target text information among the at least one piece of text information includes a sliding operation performed in a direction perpendicular to the video progress bar.
Optionally, the adjustment response module is specifically configured to:
and arranging and displaying, in the video playing interface, the at least one piece of text information according to the playing time associated with each piece of text information.
Optionally, the adjustment response module is specifically configured to:
and displaying, in the video playing interface, the character information respectively associated with the at least one piece of text information in the target video.
In one aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the video playing method when executing the program.
In one aspect, embodiments of the present application provide a computer-readable storage medium, which stores a computer program executable by a computer device, and when the program runs on the computer device, the computer device is caused to execute the steps of the video playing method.
In the embodiments of the present application, the first playing time is obtained based on the target adjustment operation triggered by the user on the playing progress of the target video, and the text information set corresponding to the first playing time is then obtained and displayed, so that the user can learn from it the video content of the target video near the first playing time. By selecting the target text information in the text information set, playback is then finely positioned at the second playing time the user wants and the target video is played from there, improving the efficiency and accuracy of adjusting the video viewing progress.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
fig. 2 is a schematic view of a video playing interface according to an embodiment of the present disclosure;
fig. 3 is a schematic view of a video playing interface according to an embodiment of the present disclosure;
fig. 4a is a schematic view of a video playing interface according to an embodiment of the present disclosure;
fig. 4b is a schematic view of a video playing interface according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a video playing method according to an embodiment of the present application;
fig. 6 is a schematic diagram illustrating a video progress adjustment according to an embodiment of the present disclosure;
fig. 7 is a schematic diagram of adjusting video progress according to an embodiment of the present application;
fig. 8 is a schematic diagram illustrating speech-line information according to an embodiment of the present disclosure;
fig. 9 is a schematic diagram illustrating speech-line information according to an embodiment of the present disclosure;
fig. 10a is a schematic diagram illustrating speech-line information according to an embodiment of the present application;
fig. 10b is a schematic diagram illustrating speech-line information according to an embodiment of the present application;
fig. 10c is a schematic diagram illustrating speech-line information according to an embodiment of the present application;
fig. 10d is a schematic diagram illustrating speech-line information according to an embodiment of the present application;
fig. 11a is a schematic diagram illustrating speech-line information according to an embodiment of the present disclosure;
fig. 11b is a schematic diagram illustrating adjustment of video progress according to an embodiment of the disclosure;
fig. 12 is a schematic diagram illustrating speech-line information and a preview image according to an embodiment of the present application;
fig. 13 is a schematic diagram illustrating speech information and a preview image according to an embodiment of the present application;
fig. 14 is a schematic diagram illustrating speech information and a preview image according to an embodiment of the present application;
fig. 15 is a schematic diagram illustrating speech-line information and a preview image according to an embodiment of the present application;
fig. 16a is a schematic diagram illustrating speech information and a preview image according to an embodiment of the present application;
fig. 16b is a schematic view of a video playing interface provided in an embodiment of the present application;
fig. 17 is a schematic diagram illustrating speech-line information and a preview image according to an embodiment of the present application;
fig. 18 is a schematic diagram illustrating speech information and a preview image according to an embodiment of the present application;
fig. 19 is a schematic flowchart of a video playing method according to an embodiment of the present application;
fig. 20a is a schematic diagram illustrating speech-line information and a preview image according to an embodiment of the present application;
fig. 20b is a schematic diagram illustrating speech information and a preview image according to an embodiment of the present application;
fig. 21 is a schematic structural diagram of a video playing apparatus according to an embodiment of the present application;
fig. 22 is a schematic structural diagram of a video playback device according to an embodiment of the present application;
fig. 23 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly understood, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
For convenience of understanding, terms referred to in the embodiments of the present invention are explained below.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the implementation method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject, and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence base technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Key technologies of Speech Technology are automatic speech recognition (ASR), speech synthesis (TTS) and voiceprint recognition. Enabling computers to listen, see, speak and feel is the development direction of future human-computer interaction, and speech is expected to become one of the best human-computer interaction modes.
Automatic Speech Recognition (ASR): a technology for converting human speech into text, involving multiple disciplines such as acoustics, phonetics and linguistics; speech recognition is now widely integrated into daily life. In the embodiments of the present application, the text information in the target video can be obtained through speech recognition.
Voiceprint: one of the biometric features, extracted when a speaker vocalizes; it can be used to characterize and identify the speaker.
Lines: the text data corresponding to the sound information in a video.
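Assuming a generic ASR output shape (a list of segments carrying a start time in seconds and the recognized text, which is an assumption rather than any specific engine's API), the line index the method displays could be built like this:

```python
def build_line_index(asr_segments):
    """Turn ASR output into (play_time, line) pairs sorted by time.

    asr_segments: iterable of dicts with a 'start' time in seconds and a
    'text' field (assumed shape). Empty or whitespace-only segments are
    dropped, since they carry no displayable line."""
    index = [(float(seg["start"]), seg["text"].strip())
             for seg in asr_segments if seg["text"].strip()]
    index.sort(key=lambda pair: pair[0])
    return index
```

The resulting pairs are exactly the (playing time, text information) associations the later steps look up when the progress bar is dragged.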
The following is a description of the design concept of the embodiments of the present application.
At present, when a user watches a video, the watching progress of the video can be adjusted according to needs, and only interested video clips are watched. For the viewing progress adjustment, the relevant video platform provides a video progress bar. The user can slide the video progress bar to any time point in the video to realize the video progress adjustment.
However, with the sliding progress bar it is difficult to locate precisely the video segment the user wants. For example, when the screen of the mobile terminal device is small, or the video is long, dragging the progress bar over a small range may correspond to several minutes of video content. The user then often has to slide the progress bar several times, or sit through unwanted clips until the desired one arrives, which makes adjusting the video viewing progress inefficient.
Analysis shows that a preview image at the current playing time lets the user intuitively grasp the video content at that time. However, some videos hold the same shot for a long time; a lecture video, for example, shows the speaker for most of its duration, so the preview images at multiple playing times differ little, and it is difficult for the user to locate the desired segment from preview images alone.
Further analysis shows that although such videos may hold one shot for a long time, the lines spoken at different playing times differ. In a lecture video, the shot stays on the speaker, but the speech content differs from moment to moment. Based on the lines in the video, the user can therefore learn the content at each moment more accurately and locate the desired video segment.
In view of this, an embodiment of the present application provides a video playing method. The method determines, in response to a target adjustment operation triggered on the playing progress of a target video, a first playing time of the adjusted target video; acquires a text information set corresponding to the first playing time in the target video, the set comprising at least one piece of text information; displays the at least one piece of text information in the video playing interface; and, in response to a selection operation triggered on target text information among the at least one piece of text information, obtains a second playing time corresponding to the target text information in the target video and starts playing the target video from the second playing time.
In the embodiments of the present application, the first playing time is obtained based on the target adjustment operation triggered by the user on the playing progress of the target video, and the text information set corresponding to the first playing time is then obtained and displayed, so that the user can learn from the text information set the video content of the target video near the first playing time. Then, by selecting the target text information in the text information set, playback is finely positioned at the second playing time the user wants and the target video is played from there, improving the efficiency and accuracy of adjusting the video viewing progress.
Referring to fig. 1, a system architecture applicable to the embodiments of the present application includes at least a terminal device 101 and a server 102.
The terminal device 101 is pre-installed with a video application, wherein the video application includes a video playing application, a short video application, a live broadcast application, and the like, and the type of the video application includes a client application, a web page application, an applet application, and the like. Terminal device 101 may include, among other things, one or more processors 1011, memory 1012, I/O interface 1013 to interact with server 102, and display panel 1014. The terminal device 101 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted device, and the like, but is not limited thereto.
The server 102 is a backend server for the video application, and may include one or more processors 1021, memory 1022, and an I/O interface 1023 for interacting with the terminal device 101. In addition, the server 102 may be configured with a database 1024. The server 102 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), big data, and artificial intelligence platforms. The terminal device 101 and the server 102 may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
In response to a target adjustment operation triggered on the playing progress of the target video in the video application, the terminal device 101 determines the first playing time of the adjusted target video, and then acquires from the server 102 a text information set corresponding to the first playing time in the target video, the set comprising at least one piece of text information. The terminal device 101 displays the at least one piece of text information in the video playing interface of the video application. In response to a selection operation triggered on target text information among the at least one piece of text information, the terminal device 101 obtains a second playing time corresponding to the target text information in the target video and starts playing the target video from the second playing time.
Illustratively, fig. 2 shows the video playing interface of the video application: the target video currently being played is a football match video, and the video progress bar 201 is at the playing time of 31 minutes 21 seconds.
The user slides the video progress bar 201 to the playing time of 54 minutes 20 seconds. In response to the target adjustment operation triggered on the playing progress of the football match video, the terminal device 101 determines the adjusted playing time of 54 minutes 20 seconds, and acquires from the server 102 the first preview image corresponding to that time (the playing time corresponding to the first preview image is 54 minutes 20 seconds) and the corresponding text information set. The text information set comprises {"Team A scores first", "the score is now 1:0", "Team B may be under pressure after conceding a goal"}, where text information 1, "Team A scores first", corresponds to the playing time of 54 minutes 18 seconds; text information 2, "the score is now 1:0", corresponds to 54 minutes 20 seconds; and text information 3, "Team B may be under pressure after conceding a goal", corresponds to 54 minutes 22 seconds.
The terminal device 101 displays the at least one piece of text information in the video playing interface of the video application according to the playing time associated with each piece of text information, and displays the first preview image at the same time. As shown in fig. 3, in the video playing interface the video progress bar 201 is located at the playing time "54 minutes 20 seconds", and text information 1, text information 2, and text information 3 are displayed in an arrangement in the first area 301 of the video playing interface, where the font size of text information 2 is larger than those of text information 1 and text information 3, and text information 2 is located at the target selection position 303. The first preview image and its playing time "54 minutes 20 seconds" are shown in the second area 302 of the video playing interface.
The user performs a downward sliding operation in the first area 301 to slide the text information 1 to the target selection position 303, as shown in fig. 4 a. In response to the selection operation triggered for the text information 1 in the video application, the terminal device 101 obtains that the playing time of the text information 1 in the soccer game video is "54 minutes 18 seconds", and starts playing the soccer game video from 54 minutes 18 seconds, as shown in fig. 4 b.
Based on the system architecture diagram shown in fig. 1, an embodiment of the present application provides a flow of a video playing method, as shown in fig. 5, the flow of the method is executed by a computer device, where the computer device may be the terminal device 101 shown in fig. 1, and the method includes the following steps:
step S501, in response to a target adjustment operation triggered according to a playing progress of a target video, determining a first playing time of an adjusted target video.
Specifically, the target adjustment operation may be a dragging operation that drags the video progress bar along its direction in the video playing interface, where the dragging direction corresponds to the direction of the playing adjustment and the dragging distance corresponds to the length of the skipped video. In a specific implementation, the duration of the video segment skipped when the video progress bar is dragged by a unit distance is determined from the total duration of the video and the total length of the dragging range of the video progress bar.
Illustratively, on a terminal device of 390 pixels × 844 pixels, the total length of the dragging range of the video progress bar is 844 pixels, and each pixel corresponds to one video segment. If the total duration of the target video is 2 hours, dragging the video progress bar by one pixel skips a video segment of about 8.5 seconds.
As shown in fig. 6, suppose the current playing time is 35 minutes 18 seconds. The user may drag the video progress bar two pixels to the right to jump to the first playing time "35 minutes 35 seconds" after 35 minutes 18 seconds of the target video, or drag the video progress bar two pixels to the left to jump to the first playing time "35 minutes 01 seconds" before 35 minutes 18 seconds of the target video.
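The pixel-to-time mapping in the example above can be sketched as follows. This is a minimal illustration, not part of the embodiment; the function names and the clamping to the video bounds are assumptions.

```python
# Sketch of the progress-bar arithmetic above; helper names are assumptions.

def seconds_per_pixel(total_duration_s: float, bar_length_px: int) -> float:
    """Duration of video skipped when the progress bar moves by one pixel."""
    return total_duration_s / bar_length_px

def jump_target(current_s: float, drag_px: int,
                total_duration_s: float, bar_length_px: int) -> float:
    """First playing time after dragging drag_px pixels (negative = left),
    clamped to the bounds of the video."""
    step = seconds_per_pixel(total_duration_s, bar_length_px)
    return min(max(current_s + drag_px * step, 0.0), total_duration_s)

# 2-hour video on an 844-pixel bar: one pixel is about 8.5 seconds, so a
# two-pixel drag from 35:18 lands near 35:35 (or near 35:01 going left).
```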
The target adjustment operation can also be a sliding operation directly performed on the video playing interface, the sliding direction corresponds to the video playing adjustment direction, and the sliding distance corresponds to the jumped video length.
For example, as shown in fig. 7, the current playing time is set to 35 minutes and 18 seconds, the user may perform a right sliding operation on the video playing interface to jump the target video to the first playing time after 35 minutes and 18 seconds, and the user may also perform a left sliding operation on the video playing interface to jump the target video to the first playing time before 35 minutes and 18 seconds.
The target adjustment operation may also be to operate a fast forward or fast backward button for controlling the video playing progress, where the fast forward or fast backward button may be a virtual button of the video playing interface or a hardware button on the terminal device. It should be noted that the target adjustment operation may also be an operation of other settings, and is not described herein again.
Step S502, a text information set corresponding to the first playing time in the target video is obtained. The text information in the text information set is video text in the target video. The video text may be lines of dialogue, captions, banners, narration text, and the like. A piece of text information may be a sentence, a word, a paragraph, etc. A piece of text information may be associated with a video segment in the target video; for example, the text information may be a line spoken by a character in the video segment, or narration text describing the content of the video segment, and the video segment may include at least one video frame. All video text in the target video can be obtained through speech recognition, image recognition, or manual annotation. The speech recognition methods in the embodiments of the present application include, but are not limited to, a Gaussian mixture model (GMM) combined with a hidden Markov model (HMM), or acoustic models based on deep learning, such as recurrent neural networks, encoder-decoder frameworks, attention mechanisms, and the like. For example, videos such as TV dramas, animations, and movies have little audio noise, so a deep-learning-based speech recognition technology combined with manual correction can be used to generate a line data list for the video, where each entry in the list can be expressed as { "time point": "line text" }, and the entries in the line data list are sorted by time point.
In some embodiments, the text information may be associated with a time point or a time period at which the video segment associated with the text information is played in the target video. When the text information is associated with a time point, that time point may serve as the playing time of the text information; when the text information is associated with a time period, the start, the end, or any time point within that time period may serve as the playing time of the text information.
For example, in a TV drama scene, a drama character says a sentence during a video segment of the target video (from 35 minutes 01 seconds to 35 minutes 25 seconds of the target video), so the sentence may be associated with the time period (35 minutes 01 seconds to 35 minutes 25 seconds). When the user adjusts the progress to some time (namely, a first playing time) between 35 minutes 01 seconds and 35 minutes 25 seconds of the target video, the first playing time falls within the time period associated with the sentence, and the sentence associated with that time period is determined to be the text information corresponding to the first playing time.
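The period-containment lookup in this example can be sketched as follows. The (start, end, text) layout and the linear scan are assumptions made for illustration only.

```python
from typing import Optional

# Hypothetical layout: each line carries the time period (in seconds) of the
# video segment it is associated with, as in the drama example above.
LINES = [
    # 35:01 - 35:25 of the target video
    (2101.0, 2125.0, "a sentence spoken by the drama character"),
]

def line_at(lines, first_play_time_s: float) -> Optional[str]:
    """Return the text whose associated time period contains the first
    playing time, or None if no period contains it."""
    for start, end, text in lines:
        if start <= first_play_time_s <= end:
            return text
    return None
```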
The text information set may include at least one piece of text information, and each piece of text information in the set may be associated with the first playing time. In some embodiments, the set may consist of a predetermined number of pieces of text information adjacent, in playing order, to the text information corresponding to the first playing time. For example, the set may include at least one of: the text information corresponding to the first playing time, the M pieces of text information preceding it in playing order, and the M pieces of text information following it in playing order, where M is greater than or equal to 1 and may be set according to the height of the display terminal. In this embodiment, the amount of text information in the set (i.e., the size of M) may be dynamically adjusted according to the video length. It can be understood that the longer the video, the lower the accuracy of progress adjustment through the horizontal video progress bar; therefore, the number of pieces of text information in the set can grow with the video length, giving a larger range of accurate adjustment.
In further embodiments, the difference between the point in time or the time period associated with each text information included in the set of text information and the first playing time is within a predetermined threshold. For example, the text information set may include at least one of text information corresponding to a first play time, text information associated with a time interval preceding the first play time in play timing, and text information associated with a time interval following the first play time in play timing, where the time interval may include at least one time period, and each of the at least one time period is associated with text information. In this embodiment, the time interval can be dynamically adjusted according to the video length. It can be understood that the longer the video length is, the lower the accuracy of the progress adjustment through the horizontal video progress bar is, so that the number of the text information in the text information set can be dynamically adjusted according to the video length, and the longer the video length is, the longer the time interval is, the greater the number of the text information in the text information set is, thereby increasing the time interval which can be accurately adjusted.
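The neighbor selection described in the two embodiments above can be sketched as follows. The (time, text) entry layout and the nearest-entry tie-breaking are assumptions; as the embodiments note, m itself could be scaled with the video length.

```python
import bisect

# Hypothetical layout: (time_s, text) entries sorted by playing time.
def text_window(entries, first_time_s: float, m: int):
    """The entry nearest the first playing time, plus up to m entries
    before it and m entries after it in playing order."""
    times = [t for t, _ in entries]
    i = bisect.bisect_left(times, first_time_s)
    if i == len(times):
        i -= 1                                   # past the last entry
    elif i > 0 and first_time_s - times[i - 1] < times[i] - first_time_s:
        i -= 1                                   # previous entry is closer
    return entries[max(0, i - m): i + m + 1]
```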
Optionally, when the target adjustment operation is a dragging operation of dragging the video progress bar along the video progress bar direction in the video playing interface, the maximum text information quantity contained in the text information set is the text information quantity corresponding to the video clip jumped by the target video when the video progress bar is dragged by a unit distance.
Specifically, when the video progress bar is dragged by a unit distance, the target time length of a video segment jumped by a target video is determined, then the target video is divided into a plurality of video segments based on the target time length, and the quantity of text information in each video segment is determined. And taking the average value of the text information quantity of each video clip as the maximum text information quantity contained in the text information set. Of course, the maximum text information quantity contained in the text information set may also be the text information quantity corresponding to the video clip skipped by the target video when the video progress bar is dragged by the preset distance, and thus, the present application is not particularly limited. Optionally, in this embodiment, when the average value of the number of text messages of each video clip exceeds a threshold, one text message may be extracted as the text message in the text message set every L text messages according to the playing timing sequence of the target video, where the threshold may be determined according to the height of the display interface.
Illustratively, for a target video with a total duration of 2 hours, on a terminal device with 390 pixels × 844 pixels, the total length of the dragging range of the video progress bar is 844 pixels, and the duration of the video segment corresponding to each pixel is about 8.5 seconds. And calculating the average speech-line number of the video clips of 8.5 seconds in the target video, and then taking the average speech-line number as the maximum speech-line information number contained in the speech-line information set.
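The computation of the maximum set size in this example can be sketched as follows; the function name and the averaging shortcut (total lines divided by the number of one-pixel segments) are assumptions consistent with the description above.

```python
# Sketch: the maximum number of lines in the set is the average line count
# of the segments skipped by a one-pixel drag. Names are assumptions.

def max_set_size(num_lines: int, total_duration_s: float,
                 bar_length_px: int) -> int:
    """Average number of lines per segment of one-pixel duration;
    equivalently, total lines divided by the number of such segments."""
    segment_s = total_duration_s / bar_length_px      # ~8.5 s in the example
    num_segments = total_duration_s / segment_s       # == bar_length_px
    return max(1, round(num_lines / num_segments))

# 2-hour video, 844-pixel bar, 1688 lines -> about 2 lines per 8.5-second
# segment, so the set holds at most 2 lines.
```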
Step S503, at least one text message is displayed in the video playing interface.
Specifically, the text information before the first playing time and the text information after the first playing time may be separately displayed by taking the first playing time as a boundary, or at least one text information may be displayed in an arrangement manner in the video playing interface according to the playing time associated with the at least one text information. In addition, only part of text information in the text information set can be displayed, other text information is in a hidden state, and a user can display the text information in the hidden state on a video playing interface by dragging a line progress bar perpendicular to the direction of the video progress bar or directly performing sliding operation in the direction perpendicular to the direction of the video progress bar. The display style of each text message can be set according to actual conditions, and specifically can be set from the aspects of fonts, colors, word sizes, display angles, display positions, display modes and the like.
Exemplarily, suppose the text information set is a set of line information containing five lines: { first time point "35 minutes 18 seconds": first line information "Team A has taken the lead" }, { second time point "35 minutes 20 seconds": second line information "the current score is 1" }, { third time point "35 minutes 22 seconds": third line information "Team B may have pressure in the case of losing a ball" }, { fourth time point "35 minutes 24 seconds": fourth line information "Team B needs to be cheered on" }, { fifth time point "35 minutes 26 seconds": fifth line information "Team B starts attacking" }.
In a possible implementation manner, as shown in fig. 8, in the video playing interface, the first speech information, the second speech information, the third speech information, the fourth speech information, and the fifth speech information are sequentially displayed from left to right.
One possible implementation manner is that, as shown in fig. 9, on the left side of the video playing interface, the first speech information, the second speech information, the third speech information, the fourth speech information, and the fifth speech information are displayed in sequence from top to bottom.
In a possible implementation manner, as shown in fig. 10a, on the left side of the video playing interface, the second speech information, the third speech information, and the fourth speech information are sequentially displayed from top to bottom, and the first speech information and the fifth speech information are in a hidden state.
In response to the downward sliding operation of the user, the first speech information, the second speech information, and the third speech information are displayed on the video playing interface, and the fourth speech information and the fifth speech information are in a hidden state, as shown in fig. 10 b.
In response to an operation of dragging the vertical speech progress bar 1001 upward by the user, third speech information, fourth speech information, and fifth speech information are displayed on the video play interface, and the first speech information and the second speech information are in a hidden state, as shown in fig. 10 c.
Optionally, in the video playing interface, the personal information respectively associated with the at least one text message in the target video is displayed.
Specifically, voiceprint features in the video can be extracted through a trained voiceprint model, and the person information corresponding to each piece of text information in the target video can then be determined based on the voiceprint features and a preset person information table. The person information corresponding to each piece of text information in the target video can also be determined manually. The person information may be a character name, an actor name, a gender, an age, and the like. After the person information is obtained, it may be associated with the text information as part of the text information set. When each piece of text information is displayed, its associated person information is displayed at the same time.
Exemplarily, as shown in fig. 10d, on the left side of the video playing interface, second speech information, third speech information, fourth speech information, and corresponding character information are sequentially presented from top to bottom, where the character information of the second speech information, the third speech information, and the fourth speech information is "commentator a".
It should be noted that, in the embodiment of the present application, the forms of each text message in the video playing interface are not limited to the examples described above, and may also be other forms, and the present application is not limited specifically.
Step S504, responding to the selection operation triggered by the target text information in the text information, obtaining a second playing time corresponding to the target text information in the target video, and playing the target video from the second playing time.
Specifically, the selection operation may be a sliding operation performed in a direction perpendicular to the video progress bar, a dragging of a line progress bar perpendicular to the video progress bar, or an operation such as a click, a double click, or a long press. In a specific implementation, the user may perform the target adjustment operation with one finger and, keeping that finger still after the operation ends, perform the selection operation with another finger. Alternatively, the user may perform the target adjustment operation with one finger and lift the finger when the operation ends; the video application then locks the first playing time reached by the target adjustment operation, and the user performs the selection operation with a finger. The second playing time may be the first playing time, or a playing time before or after the first playing time. Optionally, if no operation is received within a preset duration, the target video is played from the first playing time.
As previously mentioned, in some embodiments, the target text information may be associated with a point in time, and the second playing time may be the point in time associated with the target text information. In other embodiments, the target text information may be associated with a time period, and the second playing time may be the beginning, the end, or any point in the time period associated with the target text information.
For example, the selection operation is set to be a sliding operation performed in a direction perpendicular to the video progress bar, and the speech information set is shown in fig. 10a and is not described herein again.
After the user drags the video progress bar to the right, as shown in fig. 11a, second line information, third line information, and fourth line information are sequentially displayed from top to bottom on the left side of the video playing interface, where the word size of the third line information "Team B may have pressure in the case of losing a ball" is larger than that of the other line information, and the third line information is located at the target selection position 1101.
If the user performs a downward sliding operation on the video playing interface, and slides the second speech information "score 1 in the field" to the target selection position 1101, then the first speech information, the second speech information, and the third speech information are displayed on the video playing interface, where the word size of the second speech information is larger than the word size of the other speech information, as shown in fig. 11 b. If the playing time corresponding to the second speech information is 35 minutes 20 seconds, the target video is played from 35 minutes 20 seconds.
If the user does not perform any operation within the preset duration, the target video is played from the first playing time "35 minutes 22 seconds".
Optionally, when the user performs a sliding operation or drags a speech-line progress bar on the video playing interface, the sliding distance corresponding to each piece of speech-line information is determined according to the maximum number of speech-line information in the speech-line information set and the total length of the sliding range. For example, if the maximum number of lines of the line information set is t, and the total length of the dragging range of the line progress bar is h pixels, the sliding distance s = h/t corresponding to each line of information.
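The s = h/t mapping above, extended to pick the line that lands at the target selection position, can be sketched as follows. The sign convention (positive slide moves toward later lines) and the clamping are assumptions.

```python
def selected_line_index(slide_px: float, bar_length_px: int,
                        max_lines: int, start_index: int = 0) -> int:
    """Index of the line at the target selection position after sliding
    slide_px pixels; each line occupies s = h / t pixels of slide.
    Positive slide_px is assumed to move toward later lines."""
    s = bar_length_px / max_lines            # s = h / t
    idx = start_index + round(slide_px / s)
    return min(max(idx, 0), max_lines - 1)   # clamp to valid line indices
```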
In the embodiment of the application, the first playing time is obtained based on the target adjustment operation triggered by the user according to the playing progress of the target video, and then the text information set corresponding to the first playing time is obtained and displayed, so that the user can obtain the video content of the target video near the first playing time from the text information set. And then, by selecting the target text information in the text information set, finely positioning to a second playing time in the target video required by the user, and playing the target video from the second playing time, so that the efficiency and the accuracy of adjusting the video watching progress are improved.
Optionally, the preview image of the video at the current playing time can enable the user to intuitively know the video content corresponding to the current playing time, so that the preview image of the video and the lines in the video are displayed simultaneously, the user can conveniently know the content of the video at each time more accurately, and the required video segment can be accurately positioned. In view of this, in the embodiment of the present application, a first preview image and a text information set corresponding to a first play time in a target video are obtained, and the first preview image and at least one text information are displayed in a video play interface according to a time sequence association relationship between the at least one text information and the first preview image.
Specifically, the first preview image may be generated based on a video frame played at the first play time, or may be generated based on the video frame played at the first play time and a video frame before and/or after the first play time, where a time interval between the video frame before the first play time and the video frame after the first play time and the video frame played at the first play time is less than a first threshold. The first preview image is a preview image corresponding to a first playing time, and the time sequence association relationship refers to the relationship between the playing time of each text message and the first playing time. The first preview image may be associated with a time point or a time period, and if the first preview image is associated with a time point, the first playing time is the time point associated with the first preview image. If the first preview image is associated with a time period, the first playing time may be the beginning, the end, or any time point in the time period associated with the first preview image.
Optionally, the first preview image and each text message may be displayed in a partitioned manner, and the method specifically includes the following embodiments:
according to the first implementation mode, at least one piece of text information is displayed in a first area in a video playing interface according to the time sequence incidence relation between the at least one piece of text information and a first preview image, and the first preview image is displayed in a second area in the video playing interface.
Specifically, the first region and the second region may be two regions that do not intersect with each other, or may be two regions that partially intersect with each other. The shapes of the first area and the second area and the positions of the first area and the second area on the video playing interface can be set according to actual needs.
Illustratively, suppose the target video is a football game video, the user slides the video progress bar to the first playing time "35 minutes 22 seconds", and the first playing time corresponding to the first preview image is "35 minutes 22 seconds". The text information set is a set of line information containing five lines: { first time point "35 minutes 18 seconds": first line information "Team A has taken the lead" }, { second time point "35 minutes 20 seconds": second line information "the current score is 1" }, { third time point "35 minutes 22 seconds": third line information "Team B may have pressure in the case of losing a ball" }, { fourth time point "35 minutes 24 seconds": fourth line information "Team B needs to be cheered on" }, { fifth time point "35 minutes 26 seconds": fifth line information "Team B starts attacking" }.
As shown in fig. 12, second speech information, third speech information, and fourth speech information are sequentially displayed in the first area 1201 from top to bottom, and the first speech information and the fifth speech information are in a hidden state, where the word size of the third speech information is larger than the word size of the other speech information, and the third speech information is located in the target selection position 1203. The first preview image is presented in a second area 1202.
In the implementation of the application, the first preview image and the text information set are respectively displayed in different areas, so that a user can visually acquire the video content of the target video near the first playing moment from the first preview image and the text information set, and meanwhile, the user can conveniently select each text information in the text information set, the fine positioning of the video playing progress is realized, and the efficiency and the accuracy of adjusting the video watching progress are improved.
According to the second implementation mode, at least one piece of text information and the person information associated with the at least one piece of text information in the target video are displayed in a first area of the video playing interface according to the time sequence association relationship between the at least one piece of text information and the first preview image, and the first preview image and the person information associated with the first preview image in the target video are displayed in a second area of the video playing interface.
Specifically, voiceprint features in the video can be extracted through a trained voiceprint model, and the person information corresponding to each piece of text information in the target video can then be determined based on the voiceprint features and a preset person information table. The person information corresponding to each piece of text information in the target video can also be determined manually. The person information may be a character name, an actor name, a gender, an age, and the like. After the person information is obtained, it may be associated with the text information as part of the text information set. When each piece of text information is displayed in the first area, its associated person information is displayed at the same time. Similarly, person information may be associated with the first preview image, and when the first preview image is displayed, its corresponding person information is displayed at the same time.
Illustratively, suppose the target video is a TV drama, the user slides the video progress bar to the first playing time "35 minutes 22 seconds", and the person information associated with the first preview image in the target video is "Teacher Li".
The text information set is a set of line information containing five lines: { first time point "35 minutes 18 seconds": first line information "Classmates, today is Tree Planting Day": person information "Teacher Li" }, { second time point "35 minutes 20 seconds": second line information "The school is organizing a tree planting activity": person information "Teacher Li" }, { third time point "35 minutes 22 seconds": third line information "Go in groups of two": person information "Teacher Li" }, { fourth time point "35 minutes 24 seconds": fourth line information "Everyone pay attention to safety": person information "Teacher Li" }, { fifth time point "35 minutes 26 seconds": fifth line information "OK": person information "Students" }.
As shown in fig. 13, second speech information, third speech information, fourth speech information, and corresponding character information are sequentially displayed in a first area 1301 from top to bottom, and the first speech information and the fifth speech information are in a hidden state. The first preview image and the associated personal information of the first preview image are presented in a second area 1302. During display, the word size of the third line information is larger than the word sizes of the other line information, and the third line information is located in the target selection position 1303.
It should be noted that, in this embodiment of the present application, each piece of text information together with the person information associated with it in the target video may be displayed in the first area of the video playing interface, while only the first preview image is displayed in the second area, without the person information associated with the first preview image in the target video. Alternatively, only each piece of text information may be displayed in the first area, without the person information associated with it in the target video, while the first preview image and the person information associated with the first preview image in the target video are displayed in the second area. The present application is not specifically limited in this respect.
In the embodiment of the application, when the first preview image and the text information set are displayed, the first preview image and/or the text information set and the associated character information in the target video are displayed at the same time, so that a user can conveniently position the characters corresponding to the preview image or the text information, the user can conveniently select each text information in the text information set, the fine positioning of the video playing progress is realized, and the efficiency and the accuracy of adjusting the video watching progress are improved.
According to the third implementation mode, at least one piece of text information and the person information associated with the at least one piece of text information in the target video are displayed in a first area of the video playing interface according to the time sequence association relationship between the at least one piece of text information and the first preview image, and the first playing time and the first preview image are displayed in a second area of the video playing interface.
For example, the speech-line information set and the first preview image are set, and are the same as the speech-line information set and the first preview image shown in fig. 13, and are not described again here.
As shown in fig. 14, second speech-line information, third speech-line information, fourth speech-line information, and corresponding character information are sequentially displayed in a first region 1401 from top to bottom, and the first speech-line information and the fifth speech-line information are in a hidden state. The first preview image and the play time "35 minutes 22 seconds" of the first preview image are shown in the second area 1402. When displaying, the word size of the third speech-line information is larger than the word sizes of the other speech-line information, and the third speech-line information is located in the target selection position 1403.
It should be noted that, in the embodiment of the present application, the playing time of each text message or the playing time of the text message located at the target selection position may also be displayed, which is not specifically limited in the present application.
In the embodiment of the application, when the first preview image and the text information set are displayed, the playing time of the first preview image is displayed, so that a user can visually know the playing time of the primarily adjusted target video conveniently. And meanwhile, the character information in the target video respectively associated with each text information in the text information set is displayed, so that the user can conveniently position the character of each text information in the target video, the user can conveniently select each text information in the text information set, the fine positioning of the video playing progress is realized, and the efficiency and the accuracy of adjusting the video watching progress are improved.
According to the time sequence incidence relation between the at least one text message and the first preview image, the at least one text message, the character information and the content label which are associated with the at least one text message in the target video are displayed in a first area in the video playing interface, and the character information and the content label which are associated with the first preview image and the first preview image in the target video are displayed in a second area in the video playing interface.
Specifically, the content tag is used for representing main content or bright spot content of the video segment in the target video, the target video may be divided into a plurality of video segments in advance, then the bright spot content of each video segment is determined through a neural network model or in a manual manner, and then the content tag is marked for each video segment based on the bright spot content. For any text message, firstly, a video clip corresponding to the text message is determined based on the playing time of the text message, and then, a content tag corresponding to the video clip is used as the content tag of the text message. Similarly, a video clip corresponding to the first preview image is determined based on the playing time of the first preview image, and then a content tag corresponding to the video clip is used as a content tag of the first preview image.
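The tag lookup described in the paragraph above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the segment boundaries and tags are assumed to be inputs already produced by the neural network model or by manual labeling:

```python
import bisect
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # segment start time, in seconds
    end: float    # segment end time, in seconds
    tag: str      # content tag, e.g. "Tree Planting Day"

def tag_for_time(segments, play_time):
    """Return the content tag of the segment containing play_time.

    segments must be sorted by start time; returns None if the time
    falls outside every segment."""
    starts = [s.start for s in segments]
    i = bisect.bisect_right(starts, play_time) - 1
    if i >= 0 and play_time < segments[i].end:
        return segments[i].tag
    return None
```

A line of text information would then inherit the tag of whichever segment its playing time falls into, and the first preview image is tagged the same way.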
Illustratively, the target video is set as a television drama, the user slides the video progress bar to the first playing time of "35 minutes 22 seconds", the person information associated with the first preview image in the target video is "Teacher Li", and the content tag associated with the first preview image is "Tree Planting Day".
The text information set is a line information set comprising five pieces of line information: { first time point "35 min 18 sec": first line information "Classmates, today is Tree Planting Day": person information "Teacher Li": content tag "Tree Planting Day" }, { second time point "35 min 20 sec": second line information "The school is organizing a tree planting activity": person information "Teacher Li": content tag "Tree Planting Day" }, { third time point "35 min 22 sec": third line information "Proceed in groups of two": person information "Teacher Li": content tag "Tree Planting Day" }, { fourth time point "35 min 24 sec": fourth line information "Everybody pay attention to safety": person information "Teacher Li": content tag "Tree Planting Day" }, { fifth time point "35 min 26 sec": fifth line information "Good": person information "Student": content tag "Tree Planting Day" }.
As shown in fig. 15, second speech information, third speech information, fourth speech information, and corresponding personal information and content tags are sequentially presented in the first region 1501 from top to bottom. The first speech information and the fifth speech information are in a hidden state. The first preview image and associated persona information and content tags for the first preview image are presented in a second region 1502. During presentation, the word size of the third line information is larger than the word sizes of the other line information, and the third line information is located in the target selection position 1503.
In the embodiment of the application, when the first preview image and the text information set are displayed, the first preview image and/or the text information set, the associated character information and the content tag in the target video are displayed at the same time, so that a user can conveniently position characters corresponding to the preview image or the text information and the corresponding video content, the user can conveniently select each text information in the text information set, the fine positioning of the video playing progress is realized, and the efficiency and the accuracy of adjusting the video watching progress are improved.
In the embodiment of the present application, the embodiment of displaying the first preview image and each text message in a partitioned manner is not limited to the above-described embodiments, and may be other display forms. In addition, the content displayed is not limited to the character information, the playing time, and the content tag corresponding to the first preview image and the text information set, and may be content such as the user viewing amount and the user discussion amount, and the application is not limited thereto. In addition, after the first preview image and each text message are displayed in a partitioned manner by adopting any one of the above embodiments, a user can drag a line progress bar in the first area or directly perform a sliding operation in the first area, so that the text message in a hidden state is displayed on the video playing interface. Meanwhile, a second playing time can be obtained through selection operations such as clicking, double-clicking, long-pressing, sliding and the like, and the target video is played from the second playing time.
For example, after the user slides the horizontal video progress bar to the first playing time "35 minutes 22 seconds", the video playing interface is as shown in fig. 14. A downward sliding operation is performed in the first region 1401 to move the second line information "The school is organizing a tree planting activity" to the target selection position 1403. As shown in fig. 16a, the first, second, and third line information are then displayed on the video playing interface, where the word size of the second line information is larger than that of the other line information.
The terminal device acquires the playing time "35 minutes 20 seconds" of the second line information located at the target selection position 1403 and plays the drama from that time. As shown in fig. 16b, the video playing interface displays the video frame at 35 minutes 20 seconds.
Optionally, in step S501, in response to a target adjustment operation triggered according to the playing progress of the target video, after determining a first playing time of the adjusted target video, at least one second preview image is obtained, where a playing time interval between the second preview image and the first preview image satisfies a preset condition, and a similarity between the second preview image and the first preview image is smaller than a preset threshold. And then, in the second area, displaying at least one second preview image according to the time sequence incidence relation between the at least one second preview image and the first preview image.
Specifically, the playing time interval between the second preview image and the first preview image satisfies the preset condition, and may be that the playing time interval between the second preview image and the first preview image is smaller than a preset threshold. The at least one second preview image may be a preview image played before or after the first play time, or a preview image played before and after the first play time.
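A minimal sketch of this candidate filtering follows. Each candidate frame is assumed to carry a play time and a feature value, and a similarity function is supplied by the caller; the parameter names (`max_interval`, `sim_threshold`) and the choice of feature representation are illustrative, since the patent does not specify them:

```python
def pick_second_previews(candidates, first_time, first_feat, similarity,
                         max_interval=8.0, sim_threshold=0.9):
    """candidates: list of (play_time, feature) pairs.

    Keep frames that are close in time to the first preview image but
    visually dissimilar from it (similarity below the threshold), so
    near-duplicate frames are not shown twice."""
    picked = []
    for t, feat in candidates:
        if t == first_time:
            continue  # that is the first preview image itself
        if abs(t - first_time) >= max_interval:
            continue  # too far from the first play time
        if similarity(first_feat, feat) < sim_threshold:
            picked.append((t, feat))
    picked.sort(key=lambda x: x[0])  # chronological order for display
    return picked
```

The selected frames are then shown in the second area in their time-sequence relation to the first preview image.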
Illustratively, the target video is set as a television drama. After the user slides the video progress bar to the first playing time of "35 minutes 22 seconds", the terminal device obtains a first preview image and a second preview image; the playing time corresponding to the first preview image is "35 minutes 22 seconds", and the playing time corresponding to the second preview image is "35 minutes 26 seconds".
The text information set is a line information set comprising five pieces of line information: { first time point "35 min 18 sec": first line information "Classmates, today is Tree Planting Day": person information "Teacher Li" }, { second time point "35 min 20 sec": second line information "The school is organizing a tree planting activity": person information "Teacher Li" }, { third time point "35 min 22 sec": third line information "Proceed in groups of two": person information "Teacher Li" }, { fourth time point "35 min 24 sec": fourth line information "Everybody pay attention to safety": person information "Teacher Li" }, { fifth time point "35 min 26 sec": fifth line information "Good": person information "Student" }.
As shown in fig. 17, second speech information, third speech information, fourth speech information, and corresponding character information are sequentially displayed in the first area 1701 from top to bottom, the first speech information and the fifth speech information are in a hidden state, the word size of the third speech information is larger than that of the other speech information, and the third speech information is located in the target selection position 1703.
The first preview image and the second preview image are sequentially displayed in the second region 1702 from left to right, and the playing time corresponding to the first preview image and the playing time corresponding to the second preview image are respectively displayed.
In the embodiment of the application, when the first preview image and the text information set are displayed, at least one second preview image which is near the first preview image and has the similarity with the first preview image smaller than the preset threshold value is displayed at the same time, so that a user can more intuitively know the video content near the first playing time based on the first preview image, the second preview image and the text information set, the user can more accurately position the video playing progress, and the efficiency and the accuracy of adjusting the video watching progress are improved.
Optionally, in the second area, the personal information respectively associated with the at least one second preview image in the target video is displayed.
Illustratively, the target video is set as a television drama. After the user slides the video progress bar to the first playing time of "35 minutes 22 seconds", the terminal device obtains a first preview image and a second preview image; the playing time corresponding to the first preview image is "35 minutes 22 seconds", and the playing time corresponding to the second preview image is "35 minutes 26 seconds". The person information associated with the first preview image in the target video is "Teacher Li", and the person information associated with the second preview image is "Student". The line information set is the same as that shown in fig. 17, and the description thereof is omitted.
As shown in fig. 18, second speech information, third speech information, fourth speech information, and corresponding character information are sequentially displayed in the first area 1701 from top to bottom, the first speech information and the fifth speech information being in a hidden state, wherein the word size of the third speech information is larger than the word sizes of the other speech information, and the third speech information is located in the target selection position 1703.
The first preview image and the second preview image, and the character information respectively associated with the first preview image and the second preview image are sequentially displayed from left to right in the second region 1702.
It should be noted that, while showing the second preview image, content such as the playing time, the content tag, the user viewing amount, and the user discussion amount may also be shown in addition to the person information associated with the second preview image; the present application does not specifically limit this.
In the embodiment of the application, when the first preview image and the text information set are displayed, at least one second preview image near the first preview image, whose similarity with the first preview image is smaller than the preset threshold, is displayed at the same time, together with the person information associated with each of the first preview image, the second preview image, and the text information set. Based on the person information, a user can quickly locate the person corresponding to a preview image or a piece of text information and can conveniently select each piece of text information in the text information set, so that fine positioning of the video playing progress is realized and the efficiency and accuracy of adjusting the video watching progress are improved.
Optionally, since each second preview image corresponds to a playing time, the user may also adjust the video playing progress by selecting the second preview image. Specifically, in response to a selection operation triggered by a target preview image in a preview image set, a third playing time corresponding to the target preview image in the target video is obtained, and the target video is played from the third playing time, where the preview image set includes a first preview image and at least one second preview image.
Specifically, the selection operation may be a click, double click, long press, slide, or the like. The target preview image may be the first preview image or the second preview image. The third playback time may be the first playback time, or may be a playback time before or after the first playback time. As previously described, in some embodiments, the target preview image may be associated with a point in time, and the third play time may be the point in time associated with the target preview image. In other embodiments, the target preview image may be associated with a time period, and the third playing time may be the beginning, the end, or any point in the time period associated with the target preview image. Optionally, if no operation is responded within the preset time length, the target video is played from the first playing time.
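The point-versus-period rule described above can be sketched with a hypothetical helper. The field names `time`, `start`, and `end` and the `anchor` parameter are illustrative assumptions, not from the patent:

```python
def third_play_time(preview, anchor="start"):
    """Resolve the play time for a selected preview image.

    A preview may be associated with a single time point ("time") or
    with a time period ("start"/"end"). For a period, any point inside
    it is valid; we default to its beginning."""
    if "time" in preview:
        return preview["time"]
    if anchor == "end":
        return preview["end"]
    if anchor == "start":
        return preview["start"]
    # any interior point is also acceptable; midpoint as one example
    return (preview["start"] + preview["end"]) / 2
```

If no selection operation arrives within the preset time length, the player simply falls back to the first playing time instead of calling this resolution step.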
Illustratively, on the basis of the video playing interface shown in fig. 17, the first playing time corresponding to the first preview image is "35 minutes 22 seconds", and the first playing time corresponding to the second preview image is "35 minutes 26 seconds". If the user clicks on the second preview image in the video playing interface, the video application starts playing the target video from 35 minutes to 26 seconds.
In the embodiment of the application, each preview image corresponds to one playing time, so the user can also adjust the video playing progress by selecting a preview image. This provides the user with more ways to adjust the video playing progress and improves the convenience of doing so.
Based on the system architecture diagram shown in fig. 1, an embodiment of the present application provides a flow of a video playing method, where the flow of the method is executed by a computer device, where the computer device may be the terminal device 101 shown in fig. 1, and the method includes the following steps:
and displaying a text information set in a video playing interface in response to a target adjusting operation triggered by the playing progress of the target video, wherein the text information set comprises at least one piece of text information, the target adjusting operation is used for determining a first playing time in the playing progress of the target video, and the text information set corresponds to the first playing time. And in response to a selection operation triggered by the target text information in the at least one text information, starting playing the target video from a second playing time, wherein the second playing time is the playing time associated with the target text information.
In one possible implementation manner, the target adjustment operation includes a dragging operation of dragging the video progress bar along the direction of the video progress bar in the video playing interface, and the selection operation triggered by the target text information in the at least one piece of text information includes a sliding operation performed perpendicular to the direction of the video progress bar.
Specifically, the direction in which the video progress bar is dragged corresponds to the direction of the playing adjustment, and the drag distance corresponds to the length of video skipped. In a specific implementation, the duration of video skipped when the video progress bar is dragged by a unit distance is determined from the total duration of the video and the total length of the progress bar's drag range. The target adjustment operation may also be a sliding operation performed directly on the video playing interface, the operation of a fast-forward or fast-backward button controlling the playing progress, and the like. The selection operation triggered on the target text information may also be dragging a line progress bar perpendicular to the direction of the video progress bar.
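The drag-to-time mapping just described can be sketched as follows, assuming the progress bar's drag range equals the interface width in pixels (the names are illustrative):

```python
def drag_to_time(current_time, drag_px, total_duration, bar_width_px):
    """Map a horizontal drag on the progress bar to a new play time.

    Each pixel of drag skips total_duration / bar_width_px seconds;
    dragging right (positive drag_px) moves forward, left moves back.
    The result is clamped to the video's duration."""
    seconds_per_px = total_duration / bar_width_px
    new_time = current_time + drag_px * seconds_per_px
    return max(0.0, min(total_duration, new_time))
```

With a 2-hour video on an 844-pixel-wide bar, each pixel of drag skips roughly 8.5 seconds, matching the worked example later in this document.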
In one possible implementation manner, in the video playing interface, at least one text message is arranged and displayed according to the playing time associated with the at least one text message.
In one possible implementation manner, in the video playing interface, the person information respectively associated with the at least one piece of text information in the target video is displayed.
Specifically, voiceprint features in the video can be extracted through a trained voiceprint model, and the person information corresponding to each piece of text information in the target video is then determined based on the voiceprint features and a preset person information table. The person information corresponding to each piece of text information can also be determined manually. The person information may be a character name, an actor's name, a gender, an age, and the like. After the person information is obtained, it may be associated with the text information as part of the text information set, and when each piece of text information is displayed, its associated person information is displayed at the same time.
In the embodiment of the application, the first playing time is obtained based on the target adjustment operation triggered by the user according to the playing progress of the target video, and then the text information set corresponding to the first playing time is obtained and displayed, so that the user can obtain the video content of the target video near the first playing time from the text information set. And then, by selecting the target text information in the text information set, the second playing time in the target video required by the user is finely positioned, and the target video is played from the second playing time, so that the efficiency and the accuracy of adjusting the video watching progress are improved.
To better explain the embodiment of the present application, a video playing method provided by the embodiment of the present application is described below with reference to a specific implementation scenario. The method is performed interactively by a user, a terminal device, and a server and, as shown in fig. 19, includes the following steps:
in step S1901, the user clicks an icon of a target video in the video application of the terminal device.
In step S1902, the terminal device plays the target video.
In step S1903, the terminal device sends a speech data request of the target video to the server.
In step S1904, the server transmits speech-line data of the target video to the terminal device.
Specifically, the server extracts the sound information of the target video offline, and then generates the line information of the target video through voice recognition and manual correction. The voiceprint features in the target video are further extracted through a trained voiceprint model, and the person information corresponding to each line in the target video is identified by matching against a preset person information table. Each record of the line data can be expressed as { "time point": "line information": "person information" }, and the records are sorted by time point.
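The record format described above can be illustrated with Python dictionaries standing in for the { "time point": "line information": "person information" } records; the field names are assumptions made for the sketch:

```python
def to_seconds(mmss):
    """Convert a "MM:SS" time point to seconds for sorting."""
    m, s = mmss.split(":")
    return int(m) * 60 + int(s)

# Line data as produced by recognition and manual correction,
# possibly out of order before the final sort by time point.
line_data = [
    {"time": "35:20", "line": "The school is organizing a tree planting activity",
     "speaker": "Teacher Li"},
    {"time": "35:18", "line": "Classmates, today is Tree Planting Day",
     "speaker": "Teacher Li"},
]

line_data.sort(key=lambda r: to_seconds(r["time"]))
```

Once sorted, the records can be sent to the terminal device as a compact text payload and cached locally, as described in the following steps.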
In step S1905, the terminal device locally loads and stores the speech-line data of the target video.
Specifically, the terminal device stores the line data of the target video locally on disk or in memory. Because line data is text and occupies little space, acquiring and storing it locally before the video progress bar is slid does not consume much storage on the terminal device. At the same time, when the user slides the video progress bar, the corresponding line data can be fetched immediately, and the bandwidth that would be consumed by requesting preview images in real time is avoided.
In step S1906, the user slides the video progress bar in the video playing interface laterally.
In step S1907, the terminal device obtains the first play time of the adjusted target video in response to the operation of sliding the video progress bar.
In step S1908, the terminal device transmits a preview image acquisition request to the server.
The preview image acquisition request carries a first playing time.
In step S1909, the server sends a first preview image corresponding to the first play time to the terminal device.
Step S1910, the terminal device obtains a speech information set corresponding to the first play time.
The maximum number of pieces of line information contained in the line information set is set to the number of lines corresponding to the video clip skipped by the target video when the video progress bar slides by one pixel.
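One way to sketch the selection of the line information set around the first playing time, capped at this maximum count, is the following illustrative helper (not the patent's implementation):

```python
def lines_near(lines, first_time, max_count):
    """lines: list of (seconds, text) pairs sorted by time.

    Return up to max_count lines centred on the line whose time is
    closest to first_time, clamped at the start and end of the list."""
    if not lines or max_count <= 0:
        return []
    centre = min(range(len(lines)),
                 key=lambda i: abs(lines[i][0] - first_time))
    half = max_count // 2
    lo = max(0, centre - half)
    hi = min(len(lines), lo + max_count)
    lo = max(0, hi - max_count)  # re-clamp near the end of the list
    return lines[lo:hi]
```

The centre line of the returned window is the one placed at the target selection position, with the others above and below it.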
In step S1911, the terminal device displays each speech information and associated character information in the speech information set in the first area, and displays the first preview image and the first play time in the second area.
In step S1912, the user selects one piece of target speech-line information from the pieces of speech-line information.
Step S1913, the terminal device acquires a second play time corresponding to the target speech information, and starts playing the target video from the second play time.
For example, set the target video as a television drama with a duration of 2 hours. On a terminal device with a 390 × 844 pixel screen, the video progress bar maps the video onto 844 segments over 844 pixels, so each pixel the horizontal video progress bar slides skips about 8.5 seconds of the target video. Let t be the average number of lines in an 8.5-second clip of the target video; at most t lines are mapped within the 390 pixels, so scrolling one line corresponds to dragging the vertical line progress bar a distance of 390/t pixels.
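The arithmetic in this example can be checked directly; the value t = 4 lines per clip is an assumed figure for illustration only:

```python
total_seconds = 2 * 60 * 60           # 2-hour drama
screen_width = 844                    # horizontal bar spans 844 pixels
sec_per_px = total_seconds / screen_width   # seconds skipped per pixel, ~8.53

t = 4                                 # assumed average lines per 8.5 s clip
screen_height = 390
px_per_line = screen_height / t       # vertical bar distance per line
```

With these numbers, one pixel of horizontal drag skips about 8.5 seconds, and one line of vertical scroll corresponds to 97.5 pixels of drag on the line progress bar.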
As shown in fig. 20a, the user drags the horizontal video progress bar 2001 to the first playing time "35 minutes 22 seconds" and obtains the first preview image corresponding to that time; the person information associated with the first preview image in the target video is "Teacher Li". The line information set comprises five pieces of line information: { first time point "35 min 18 sec": first line information "Classmates, today is Tree Planting Day": person information "Teacher Li" }, { second time point "35 min 20 sec": second line information "The school is organizing a tree planting activity": person information "Teacher Li" }, { third time point "35 min 22 sec": third line information "Proceed in groups of two": person information "Teacher Li" }, { fourth time point "35 min 24 sec": fourth line information "Everybody pay attention to safety": person information "Teacher Li" }, { fifth time point "35 min 26 sec": fifth line information "Good": person information "Student" }.
Second speech-line information, third speech-line information, fourth speech-line information and corresponding character information are sequentially displayed in the first area 2002 from top to bottom, and the first speech-line information and the fifth speech-line information are in a hidden state. A first preview image and a first play time are shown in a second area 2003. When displayed, the word size of the third line information is larger than the word sizes of the other line information, and the third line information is located in the target selection position 2004.
The user drags the vertical speech progress bar 2005 downward in the first area 2002, and slides the second speech information "school organization tree planting activity" to the target selection position 2004. As shown in fig. 20b, the first speech information, the second speech information, and the third speech information are displayed on the video playing interface, wherein the word size of the second speech information is larger than that of the other speech information.
The terminal device acquires the playing time "35 minutes 20 seconds" of the second line information located at the target selection position 2004 and plays the drama from that time; the video playing interface displays the video frame at 35 minutes 20 seconds.
In the embodiment of the application, when a user drags the video progress bar, the line information before and after the moment the progress bar points to is dynamically displayed, so the video content at that moment can be quickly understood through the line preview. Compared with a video frame preview, the line information is obtained in advance and stored locally, so it can be displayed in real time without delay, and the difficulty of making out video content from small preview frames in many scenarios is avoided. In addition, dragging the vertical line progress bar enables second-level fine adjustment within a small interval while the horizontal video progress bar handles large adjustments. This improves the efficiency and accuracy of adjusting the video watching progress, provides the user with a fine drag experience, and effectively improves the viewing experience in scenarios such as reviewing or fast-forwarding a video.
Based on the same technical concept, an embodiment of the present application provides a video playing apparatus, as shown in fig. 21, where the apparatus 2100 includes:
an adjustment response module 2101, configured to determine a first playing time of an adjusted target video in response to a target adjustment operation triggered with respect to a playing progress of the target video;
an obtaining module 2102, configured to obtain a text information set corresponding to the first playing time in the target video, where the text information set includes at least one piece of text information;
a presentation module 2103, configured to present the at least one text message in a video playing interface;
the video playing module 2104 is configured to, in response to a selection operation triggered by a target text message in the at least one text message, obtain a second playing time of the target text message in the target video, and start playing the target video from the second playing time.
Optionally, the presentation module 2103 is specifically configured to:
and arranging and displaying the at least one text message in the video playing interface according to the playing time associated with the at least one text message.
Optionally, the presentation module 2103 is specifically configured to:
and displaying the personal information which is respectively associated with the at least one text message in the target video in the video playing interface.
Optionally, the obtaining module 2102 is specifically configured to:
acquiring a first preview image and a text information set corresponding to the first playing time in the target video;
the display module 2103 is specifically configured to:
and displaying the first preview image and the at least one text message in a video playing interface according to the time sequence association relationship between the at least one text message and the first preview image.
Optionally, the presentation module 2103 is specifically configured to:
displaying the at least one text message in a first area in the video playing interface according to the time sequence incidence relation between the at least one text message and the first preview image;
and displaying the first preview image in a second area in the video playing interface.
Optionally, the presentation module 2103 is further configured to:
and in the first area, showing the character information which is respectively associated with the at least one text information in the target video.
And displaying the character information associated with the first preview image in the target video in the second area.
Optionally, the text information set includes a first text information played at the first play time, and at least two pieces of second text information played before and after the first play time.
Optionally, the obtaining module 2102 is further configured to:
responding to a target adjustment operation triggered by the playing progress of a target video, and after determining a first playing time of the adjusted target video, acquiring at least one second preview image, wherein the playing time interval between the second preview image and the first preview image meets a preset condition, and the similarity between the second preview image and the first preview image is smaller than a preset threshold;
the presentation module 2103 is further configured to:
and displaying the at least one second preview image in the second area according to the time sequence incidence relation between the at least one second preview image and the first preview image.
Optionally, the presentation module 2103 is further configured to:
and displaying the personal information respectively associated with the at least one second preview image in the target video in the second area.
Optionally, the video playing module 2104 is further configured to:
and according to the time sequence incidence relation between at least one piece of text information in the text information set and the first preview image, after the at least one piece of text information is displayed in a video playing interface, responding to a selection operation triggered by a target preview image in the preview image set, obtaining a third playing time corresponding to the target preview image in the target video, and playing the target video from the third playing time, wherein the preview image set comprises the first preview image and the at least one second preview image.
Optionally, the video playing module 2104 is further configured to:
and after displaying the at least one piece of text information in the video playing interface according to the chronological association between the at least one piece of text information in the text information set and the first preview image, if no operation is received within a preset duration, playing the target video from the first playing time.
Optionally, the target adjustment operation includes a drag operation of dragging a video progress bar in the video playing interface along the direction of the video progress bar, wherein the maximum amount of text information included in the text information set is the amount of text information corresponding to the video clip skipped by the target video when the video progress bar is dragged by a unit distance, and the selection operation triggered for the target text information in the at least one piece of text information includes a sliding operation performed in a direction perpendicular to the video progress bar.
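The sizing rule and the gesture split described above can be illustrated roughly as follows. This is a sketch under stated assumptions: the pixel unit distance, the subtitle list format, and all names are invented for illustration; a real player would take these values from its UI layout and subtitle track.

```python
def max_text_entries(subtitle_times, progress_bar_px, video_len_sec,
                     unit_drag_px=20, anchor_sec=0.0):
    """subtitle_times: sorted play times (sec) of each subtitle line.
    Returns how many subtitle lines fall inside the video clip that is
    skipped when the progress bar is dragged by `unit_drag_px` pixels
    starting from `anchor_sec` -- the cap on the text information set."""
    sec_per_px = video_len_sec / progress_bar_px
    window_end = anchor_sec + unit_drag_px * sec_per_px
    return sum(1 for t in subtitle_times if anchor_sec <= t < window_end)

def classify_gesture(dx, dy):
    """Distinguish the coarse drag (along the progress bar) from the fine
    selection slide (perpendicular to it), as the description suggests,
    by comparing the dominant axis of the touch movement."""
    return "drag_progress" if abs(dx) >= abs(dy) else "select_text"
```

With a 600-pixel bar over a 600-second video, one pixel maps to one second, so a 20-pixel unit drag skips a 20-second clip and the text information set is capped at the subtitle lines inside that clip.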
In the embodiment of the present application, the first playing time is obtained based on the target adjustment operation triggered by the user on the playing progress of the target video; the first preview image and the text information set corresponding to the first playing time are then acquired and displayed, so that the user can intuitively learn, from the first preview image and the text information set, the video content of the target video near the first playing time. By then selecting the target text information in the text information set, the second playing time required by the user in the target video is finely located, and the target video is played from the second playing time, which improves both the efficiency and the accuracy of adjusting the video watching progress.
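The two-stage seek summarized above — coarse positioning by dragging, then fine positioning by selecting one subtitle line — can be sketched end to end as below. The data structures, the ±15-second window, and all names are illustrative assumptions, not the patent's implementation.

```python
def build_text_info_set(subtitles, first_play_time, window_sec=15.0):
    """subtitles: list of (play_time_sec, text). Returns the lines within
    +/- window_sec of the coarse seek position (the first playing time),
    in chronological order -- the displayed text information set."""
    return sorted((t, s) for t, s in subtitles
                  if abs(t - first_play_time) <= window_sec)

def resolve_seek(subtitles, first_play_time, selected_index=None):
    """If the user selects a line, seek to that line's associated play
    time (the second playing time); if no selection arrives, fall back
    to the coarse first playing time."""
    text_set = build_text_info_set(subtitles, first_play_time)
    if selected_index is not None and 0 <= selected_index < len(text_set):
        return text_set[selected_index][0]
    return first_play_time
```

The fallback branch mirrors the timeout behavior described earlier: without a selection, playback simply starts from the coarsely chosen first playing time.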
Based on the same technical concept, an embodiment of the present application provides a video playing apparatus, as shown in fig. 22, the apparatus 2200 includes:
an adjustment responding module 2201, configured to, in response to a target adjustment operation triggered for the playing progress of a target video, display a text information set in a video playing interface, wherein the text information set includes at least one piece of text information, the target adjustment operation is used to determine a first playing time in the playing progress of the target video, and the text information set corresponds to the first playing time;
a video playing module 2202, configured to, in response to a selection operation triggered for target text information in the at least one piece of text information, start playing the target video from a second playing time, wherein the second playing time is the playing time associated with the target text information.
Optionally, the target adjustment operation includes a drag operation of dragging the video progress bar along the direction of the video progress bar in the video playing interface;
the selection operation triggered for the target text information in the at least one piece of text information includes a sliding operation performed in a direction perpendicular to the video progress bar.
Optionally, the adjustment response module 2201 is specifically configured to:
and in the video playing interface, arranging and displaying the at least one piece of text information according to the playing time associated with each piece of text information.
Optionally, the adjustment response module 2201 is specifically configured to:
and displaying, in the video playing interface, the character information associated with each of the at least one piece of text information in the target video.
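Attaching character information to each piece of text information before display, as this module does, might look like the following sketch. The speaker lookup table is a hypothetical stand-in for whatever metadata the target video actually carries (e.g. subtitle-track speaker tags or recognition results).

```python
def annotate_with_characters(text_set, speaker_of):
    """text_set: list of (play_time_sec, text) entries to display.
    speaker_of: dict mapping a play time to a character name.
    Returns (play_time_sec, text, character) triples; lines with no
    known character get None."""
    return [(t, s, speaker_of.get(t)) for t, s in text_set]
```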
Based on the same technical concept, an embodiment of the present application provides a computer device, which may be a terminal or a server. As shown in fig. 23, the device includes at least one processor 2301 and a memory 2302 connected to the at least one processor. The specific connection medium between the processor 2301 and the memory 2302 is not limited in this embodiment; in fig. 23, the processor 2301 and the memory 2302 are connected through a bus, as an example. The bus may be divided into an address bus, a data bus, a control bus, and the like.
In the embodiment of the present application, the memory 2302 stores instructions executable by the at least one processor 2301, and the at least one processor 2301 can execute the steps included in the video playing method by executing the instructions stored in the memory 2302.
The processor 2301 is the control center of the computer device; it can connect various parts of the computer device by using various interfaces and lines, and controls video playing by running or executing the instructions stored in the memory 2302 and calling the data stored in the memory 2302. Optionally, the processor 2301 may include one or more processing units, and may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communication. It is to be understood that the modem processor may alternatively not be integrated into the processor 2301. In some embodiments, the processor 2301 and the memory 2302 may be implemented on the same chip; in other embodiments, they may be implemented separately on independent chips.
The processor 2301 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 2302, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 2302 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a random access memory (RAM), a static random access memory (SRAM), a programmable read-only memory (PROM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a magnetic memory, a magnetic disk, or an optical disk. The memory 2302 may also be, without limitation, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 2302 in the embodiments of the present application may further be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
Based on the same inventive concept, embodiments of the present application provide a computer-readable storage medium storing a computer program executable by a computer device, which, when the program runs on the computer device, causes the computer device to execute the steps of the video playing method described above.
It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (15)

1. A video playback method, comprising:
responding to a target adjustment operation triggered for the playing progress of a target video, and determining the adjusted first playing time of the target video;
acquiring a text information set corresponding to the first playing time in the target video, wherein the text information set comprises at least one piece of text information;
displaying the at least one piece of text information in a video playing interface;
and responding to a selection operation triggered for target text information in the at least one piece of text information, obtaining a second playing time corresponding to the target text information in the target video, and starting to play the target video from the second playing time.
2. The method of claim 1, wherein the displaying the at least one piece of text information in a video playing interface comprises:
and in the video playing interface, arranging and displaying the at least one piece of text information according to the playing time associated with each piece of text information.
3. The method of claim 2, further comprising:
and displaying, in the video playing interface, the character information associated with each of the at least one piece of text information in the target video.
4. The method according to claim 1, wherein the acquiring a text information set corresponding to the first playing time in the target video and displaying the at least one piece of text information in a video playing interface comprises:
acquiring a first preview image and a text information set corresponding to the first playing time in the target video;
and displaying the first preview image and the at least one piece of text information in the video playing interface according to the chronological association between the at least one piece of text information and the first preview image.
5. The method of claim 4, wherein the displaying the first preview image and the at least one piece of text information in the video playing interface according to the chronological association between the at least one piece of text information and the first preview image comprises:
displaying the at least one piece of text information in a first area in the video playing interface according to the chronological association between the at least one piece of text information and the first preview image;
and displaying the first preview image in a second area in the video playing interface.
6. The method of claim 5, wherein the displaying the at least one piece of text information in the first area in the video playing interface further comprises:
displaying, in the first area, the character information associated with each of the at least one piece of text information in the target video;
and the displaying the first preview image in the second area in the video playing interface further comprises:
displaying, in the second area, the character information associated with the first preview image in the target video.
7. The method of claim 5, wherein after the determining the adjusted first playing time of the target video in response to a target adjustment operation triggered for the playing progress of the target video, the method further comprises:
acquiring at least one second preview image, wherein the playing time interval between the first preview image and each second preview image meets a preset condition, and the similarity between the first preview image and each second preview image is smaller than a preset threshold;
and displaying the at least one second preview image in the second area according to the chronological association between the at least one second preview image and the first preview image.
8. The method of claim 7, wherein the displaying the at least one second preview image further comprises:
and displaying, in the second area, the character information associated with each of the at least one second preview image in the target video.
9. The method of claim 7, wherein after the displaying the first preview image and the at least one piece of text information in the video playing interface according to the chronological association between the at least one piece of text information and the first preview image, the method further comprises:
in response to a selection operation triggered for a target preview image in a preview image set, obtaining a third playing time of the target preview image in the target video, and playing the target video from the third playing time, wherein the preview image set comprises the first preview image and the at least one second preview image.
10. The method of claim 7, wherein after the displaying the first preview image and the at least one piece of text information in the video playing interface according to the chronological association between the at least one piece of text information and the first preview image, the method further comprises:
and if no operation is received within a preset duration, starting to play the target video from the first playing time.
11. The method according to any one of claims 1 to 10, wherein the target adjustment operation comprises a drag operation of dragging a video progress bar in the video playing interface along the direction of the video progress bar, the maximum amount of text information included in the text information set is the amount of text information corresponding to the video clip skipped by the target video when the video progress bar is dragged by a unit distance, and the selection operation triggered for the target text information in the at least one piece of text information comprises a sliding operation performed in a direction perpendicular to the video progress bar.
12. A video playback method, comprising:
displaying a text information set in a video playing interface in response to a target adjustment operation triggered for the playing progress of a target video, wherein the text information set comprises at least one piece of text information, the target adjustment operation is used for determining a first playing time in the playing progress of the target video, and the text information set corresponds to the first playing time;
and in response to a selection operation triggered for target text information in the at least one piece of text information, starting to play the target video from a second playing time, wherein the second playing time is the playing time associated with the target text information.
13. A video playback apparatus, comprising:
the adjustment response module is used for responding to a target adjustment operation triggered for the playing progress of a target video and determining the adjusted first playing time of the target video;
the acquisition module is used for acquiring a text information set corresponding to the first playing time in the target video, wherein the text information set comprises at least one piece of text information;
the display module is used for displaying the at least one piece of text information in a video playing interface;
and the video playing module is used for responding to a selection operation triggered for target text information in the at least one piece of text information, obtaining a second playing time corresponding to the target text information in the target video, and starting to play the target video from the second playing time.
14. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any one of claims 1 to 12 are implemented by the processor when executing the program.
15. A computer-readable storage medium, having stored thereon a computer program executable by a computer device, for causing the computer device to perform the steps of the method of any one of claims 1 to 12, when the program is run on the computer device.
CN202110571850.7A 2021-05-25 2021-05-25 Video playing method, device, equipment and storage medium Pending CN115396738A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110571850.7A CN115396738A (en) 2021-05-25 2021-05-25 Video playing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110571850.7A CN115396738A (en) 2021-05-25 2021-05-25 Video playing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115396738A true CN115396738A (en) 2022-11-25

Family

ID=84114626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110571850.7A Pending CN115396738A (en) 2021-05-25 2021-05-25 Video playing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115396738A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116248963A (en) * 2023-02-23 2023-06-09 北京奇艺世纪科技有限公司 Video playing method and device, electronic equipment and storage medium
CN116582710A (en) * 2023-05-29 2023-08-11 北京百度网讯科技有限公司 Video playing method and device, electronic equipment and storage medium
CN116668789A (en) * 2023-05-06 2023-08-29 北京优酷科技有限公司 Video positioning playing method and device


Similar Documents

Publication Publication Date Title
CN115396738A (en) Video playing method, device, equipment and storage medium
CN111260545B (en) Method and device for generating image
US20180130496A1 (en) Method and system for auto-generation of sketch notes-based visual summary of multimedia content
CN107197384B (en) The multi-modal exchange method of virtual robot and system applied to net cast platform
CN110868635B (en) Video processing method and device, electronic equipment and storage medium
CN111683263B (en) Live broadcast guiding method, device, equipment and computer readable storage medium
CN111930994A (en) Video editing processing method and device, electronic equipment and storage medium
CN116484318B (en) Lecture training feedback method, lecture training feedback device and storage medium
CN110812843A (en) Interaction method and device based on virtual image and computer storage medium
CN108491178B (en) Information browsing method, browser and server
CN112989112B (en) Online classroom content acquisition method and device
CN114529635A (en) Image generation method, device, storage medium and equipment
CN113127628B (en) Method, apparatus, device and computer readable storage medium for generating comments
US20220222432A1 (en) Recommending theme patterns of a document
CN113253838A (en) AR-based video teaching method and electronic equipment
CN110992958B (en) Content recording method, content recording apparatus, electronic device, and storage medium
CN113963306B (en) Courseware title making method and device based on artificial intelligence
CN109948426A (en) Application program method of adjustment, device, electronic equipment and storage medium
CN115963963A (en) Interactive novel generation method, presentation method, device, equipment and medium
CN112333554B (en) Multimedia data processing method and device, electronic equipment and storage medium
CN111582281B (en) Picture display optimization method and device, electronic equipment and storage medium
CN113542874A (en) Information playing control method, device, equipment and computer readable storage medium
CN112686908A (en) Image processing method, information display method, electronic device and storage medium
CN111160051A (en) Data processing method and device, electronic equipment and storage medium
JP6979738B1 (en) Servers and animation recommendation systems, animation recommendation methods, programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination