WO2020108248A1 - Video playback method and device - Google Patents

Video playback method and device

Info

Publication number
WO2020108248A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
server
terminal device
dotting
request
Prior art date
Application number
PCT/CN2019/115889
Other languages
English (en)
French (fr)
Inventor
张文杰
王莽
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP19891244.6A (published as EP3876543A4)
Publication of WO2020108248A1
Priority to US17/333,015 (published as US20210289266A1)

Classifications

    • H04N 21/2393: Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests, involving handling client requests
    • H04N 21/47217: End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • G06N 3/02: Neural networks (computing arrangements based on biological models)
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • H04N 21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/251: Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/26208: Content or additional data distribution scheduling, the scheduling operation being performed under constraints
    • H04N 21/472: End-user interface for requesting content, additional data or services, or for interacting with content, e.g. content reservation, reminders, event notification, manipulating displayed content
    • H04N 21/4882: Data services, e.g. news ticker, for displaying messages, e.g. warnings, reminders
    • H04N 21/64: Addressing (control signaling related to video distribution between client, server and network components)
    • H04N 21/6587: Control parameters, e.g. trick play commands, viewpoint selection (transmission by the client directed to the server)
    • H04N 21/8455: Structuring of content involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • H04N 21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N 21/8549: Creating video summaries, e.g. movie trailer

Definitions

  • This application relates to the field of communication technology, and in particular, to a video playback method and device.
  • Video platforms usually annotate the more exciting positions in a video, for example by placing dots on the progress bar of the video to form multiple dot positions.
  • When the user moves the cursor to a dot position, text describing the video content at that position is displayed. This helps the user switch to the position he or she wants to watch in a shorter time, and also ensures that the user can quickly discover the more exciting parts of the video.
  • However, the displayed text information is usually a relatively short sentence.
  • The content that such short sentences can express is limited, so the user cannot gain a good understanding of the video content; the text information does not summarize the video content well, which affects the user experience.
  • In view of this, the present application provides a video playback method and device to solve the problem that the text information displayed in the prior art cannot adequately summarize the video content, which affects the user experience.
  • In a first aspect, an embodiment of the present application provides a video playback method.
  • The method includes: first, a server receives a first request from a terminal device, where the first request is used to request the address of a video that the terminal device needs to play; after receiving the first request, the server determines the video address and the dotting information of the video, carries the video address and the dotting information in a first response, and sends the first response to the terminal device, where the dotting information includes the dot positions of the video and the storage addresses of the video segments corresponding to the dot positions.
  • In this way, when the terminal device needs to play a video, it can interact with the server to obtain the video address and, at the same time, the dotting information of the video. Because the dotting information includes the storage address of the video segment corresponding to each dot position, the terminal device can play the video segment at the dot position while playing the video, which displays the video content at the dot position more intuitively and can effectively improve the user experience.
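  • As a concrete illustration, the sketch below shows one possible shape for the first request and first response, assuming a JSON-style encoding; every field name, identifier, and URL is a hypothetical placeholder, since the embodiment does not fix a message format.

```python
# Hypothetical message shapes for the first request / first response; the
# patent does not specify a wire format, so all names below are assumptions.
first_request = {
    "video_id": "vod-0042",  # identifies the video the terminal needs to play
}

first_response = {
    # address from which the terminal device obtains the video itself
    "video_address": "https://vod.example.com/videos/vod-0042.m3u8",
    # dotting information: one entry per dot position on the progress bar
    "dotting_info": [
        {"dot_id": 1, "position_s": 128.0,
         "segment_address": "https://vod.example.com/segments/0042-1.mp4"},
        {"dot_id": 2, "position_s": 731.5,
         "segment_address": "https://vod.example.com/segments/0042-2.mp4"},
    ],
}
```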
  • In a possible design, after the server sends the first response to the terminal device, the terminal device may interact with the server to obtain the video segment corresponding to a dot position.
  • Specifically, the server may first receive a second request sent by the terminal device, where the second request is used to request the video segment corresponding to the dot position and includes the storage address of the video segment corresponding to the dot position; the server then obtains the video segment corresponding to the dot position according to the storage address, and sends a second response to the terminal device, where the second response includes the video segment corresponding to the dot position.
  • In this way, when the terminal device needs to obtain a video segment, it can interact with the server and obtain the video segment flexibly and conveniently according to the storage address of the video segment corresponding to the dot position.
  • In a possible design, before the server sends the first response to the terminal device, the server needs to determine the dotting information, that is, determine the dot positions and the video segments corresponding to the dot positions.
  • The specific process is as follows: the server first divides the video into multiple video segments; the server then determines the highlight degree of each video segment based on a preset neural network model; next, the server selects N video segments according to the highlight degree of each video segment; after the N video segments are selected, the server determines N dot positions of the video according to the positions of the N video segments in the video, so that each dot position corresponds to one of the N video segments (a sketch of this selection step follows this list).
  • In this way, each determined dot position is the position in the video of a segment with a high highlight degree, so the terminal device can display the more exciting video segments, which helps the user quickly browse to the highlights of the video.
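  • A minimal sketch of this selection step is shown below, assuming the segments are already divided and scored; score_segment stands in for the neural-network scoring described next, and the tuple and field shapes are illustrative assumptions.

```python
def build_dotting_info(segments, score_segment, n=5):
    """Pick the N highest-scoring segments and turn their start positions
    into dot positions. segments: list of (start_s, end_s, address) tuples;
    score_segment: callable returning a highlight degree for a segment."""
    top_n = sorted(segments, key=score_segment, reverse=True)[:n]
    top_n.sort(key=lambda seg: seg[0])  # back into playback order
    return [{"position_s": start, "segment_address": addr}
            for start, _end, addr in top_n]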
  • In a possible design, when the server determines the highlight degree of each video segment based on the preset neural network model, it may first extract a first feature of each video segment based on the model, where the first feature includes some or all of the following: the temporal feature of the frame sequence and the spatial feature of the frame sequence; that is, the first feature is an attribute of the video segment itself. The server then determines the highlight degree of each video segment according to the first feature of each video segment.
  • In this way, the server determines the highlight degree from attributes of the video segment itself, which can reflect the highlight degree of the video segment more objectively.
  • The server can divide the video into multiple video segments in many ways; one of them is described below:
  • The server first performs shot segmentation on the video, obtaining multiple groups of image frames, where each group of image frames consists of consecutive frames; the server then synthesizes the multiple groups of image frames into one or more video segments of a preset length.
  • Because the video is first segmented by shot and the video segments are then synthesized, the shot segmentation ensures that the content of the subsequently synthesized video segments remains complete, so better video segments can be provided to the user.
  • In another possible design, the server first performs shot segmentation on the video to obtain multiple groups of image frames, where each group consists of consecutive frames; the server then synthesizes the image frames that display similar scenes or people into one or more video segments, that is, the similarity of any two adjacent frames within one video segment is within a preset range.
  • In this way, the server can extract image frames of similar scenes or people and synthesize them into video segments whose displayed content is highly related, which can provide the user with a better viewing experience.
  • In a second aspect, an embodiment of the present application provides a video playback method.
  • The method includes: first, a terminal device determines that a video needs to be played, or determines that there will be a demand to play a video, and sends a first request to a server; the terminal device then receives a first response sent by the server, where the first request is used to request the address of the video that the terminal device needs to play, and the first response includes the video address and the dotting information of the video, the dotting information including the dot positions of the video and the storage addresses of the video segments corresponding to the dot positions. After obtaining the video address and the dotting information, the terminal device may obtain the video according to the video address and load the corresponding video segment at each dot position according to the storage address of the video segment corresponding to that dot position; after the video segments are loaded, the video and the video segments can be played.
  • In this way, when the terminal device plays the video, it can also display the video segment corresponding to a dot position of the video.
  • A video segment is more intuitive than text information, which allows the user to understand the video content more conveniently, quickly, and objectively.
  • In a possible design, when the terminal device loads the corresponding video segment at a dot position according to the storage address of the video segment corresponding to the dot position, it may obtain the video segment by interacting with the server and then load it.
  • Specifically, the terminal device may send a second request to the server, where the second request is used to request the video segment corresponding to the dot position and includes the storage address of the video segment corresponding to the dot position; the terminal device then receives a second response sent by the server, where the second response includes the video segment corresponding to the dot position; after receiving the second response, the terminal device loads the corresponding video segment at the dot position according to the second response.
  • In this way, when the terminal device needs to obtain a video segment, it can interact with the server, so the video segment can be obtained from the server flexibly and conveniently.
  • In a possible design, when the terminal device plays the video, at least one video segment corresponding to the dot position closest to the current playback position is displayed.
  • In this way, the video content about to be played can be conveniently shown to the user, and displaying it as a video segment makes the video content easier to understand.
  • In a possible design, when the terminal device plays the video, it may play the video segment corresponding to a dot position after receiving a trigger operation on that dot position.
  • In this way, the video content the user wants to know can be displayed more conveniently and intuitively, which can further improve the user experience.
  • In a third aspect, an embodiment of the present application provides a method for generating video segments.
  • The method includes: first, a server divides a video into multiple video segments; the server then determines the highlight degree of each video segment based on a preset neural network model; the server also selects N video segments from the multiple video segments according to the highlight degree of each video segment and performs video synthesis on the N video segments, for example synthesizing a short video; the server can also provide the short video to the terminal device.
  • In this way, the highlight degree of a video segment determined by the preset neural network model is more objective, so the server can extract and synthesize the more exciting parts of the video and play them to the user through the terminal device, enabling the user to learn the video content faster and more intuitively.
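  • One workable way to realize the synthesis step is sketched below using the ffmpeg concat demuxer; the tool choice is an assumption (the embodiment does not name a synthesis method), and the clips must share codec parameters for stream copy to work.

```python
import pathlib
import subprocess
import tempfile

def synthesize_short_video(clip_paths, out_path="highlights.mp4"):
    """Concatenate the N selected video segments into one short video
    using ffmpeg's concat demuxer (stream copy, no re-encoding)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for p in clip_paths:
            f.write(f"file '{pathlib.Path(p).resolve()}'\n")
        list_file = f.name
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_file, "-c", "copy", out_path],
        check=True,
    )
    return out_path
```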
  • In a fourth aspect, an embodiment of the present application provides a method for generating video segments.
  • The method includes: first, a server divides a video into multiple video segments; the server then determines the highlight degree of each video segment based on a preset neural network model; the server selects N video segments from the multiple video segments according to the highlight degree of each video segment, and then saves the N video segments.
  • The server may also share the saved N video segments with the terminal device.
  • In this way, the highlight degree of a video segment determined by the preset neural network model directly reflects how exciting the video segment is, so the server can save the more exciting video segments of the video and display them to the user through the terminal device, which can improve the attractiveness of the video to the user.
  • An embodiment of the present application further provides a server.
  • The server has the function of implementing the behavior in the method example of the first aspect described above.
  • The function may be realized by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above function.
  • In a possible design, the structure of the server includes a receiving unit and a sending unit, and may further include a processing unit; these units can perform the corresponding functions in the method example of the first aspect described above. For details, see the detailed description in the method example, which is not repeated here.
  • An embodiment of the present application further provides a terminal device.
  • The terminal device has the function of implementing the behavior in the method example of the second aspect described above.
  • The function may be realized by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above function.
  • In a possible design, the structure of the terminal device includes a sending unit, a receiving unit, a loading unit, and a playing unit; these units can perform the corresponding functions in the method example of the second aspect described above. For details, see the detailed description in the method example, which is not repeated here.
  • An embodiment of the present application further provides a server.
  • The server has the function of implementing the behavior in the method example of the third aspect described above.
  • The function may be realized by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above function.
  • In a possible design, the structure of the server includes a division unit, a determination unit, a selection unit, and a synthesis unit, and may also include a storage unit; these units can perform the corresponding functions in the method example of the third aspect described above. For details, see the detailed description in the method example, which is not repeated here.
  • An embodiment of the present application further provides a server.
  • The server has the function of implementing the behavior in the method example of the fourth aspect described above.
  • The function may be realized by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above function.
  • In a possible design, the structure of the server includes a division unit, a determination unit, a selection unit, and a storage unit; these units can perform the corresponding functions in the method example of the fourth aspect described above. For details, see the detailed description in the method example, which is not repeated here.
  • An embodiment of the present application further provides a server.
  • The structure of the server includes a processor, and may further include a transceiver and a memory.
  • The processor is configured to support the server in performing the corresponding functions in the method of the first aspect described above.
  • The memory is coupled to the processor and stores the program instructions and data necessary for the server.
  • The transceiver is configured to communicate with other devices.
  • An embodiment of the present application further provides a terminal device.
  • The structure of the terminal device includes a processor and a transceiver, and may further include a memory; the processor is configured to support the terminal device in performing the corresponding functions in the method of the second aspect described above.
  • The memory is coupled to the processor and stores the program instructions and data necessary for the terminal device.
  • The transceiver is configured to communicate with other devices; the terminal device may further include a display, configured to receive instructions from the processor and display videos or video segments.
  • An embodiment of the present application further provides a server.
  • The structure of the server includes a processor, and may further include a transceiver and a memory.
  • The processor is configured to support the server in performing the corresponding functions in the method of the third aspect described above.
  • The memory is coupled to the processor and stores the program instructions and data necessary for the server.
  • The transceiver is configured to communicate with other devices.
  • An embodiment of the present application further provides a server.
  • The structure of the server includes a processor, and may further include a transceiver and a memory.
  • The processor is configured to support the server in performing the corresponding functions in the method of the fourth aspect described above.
  • The memory is coupled to the processor and stores the program instructions and data necessary for the server.
  • The transceiver is configured to communicate with other devices.
  • The present application also provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods described in the above aspects.
  • the present application also provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the methods described in the above aspects.
  • The present application also provides a computer chip connected to a memory; the chip is configured to read and execute a software program stored in the memory and perform the methods described in the above aspects.
  • FIG. 1 is a schematic diagram of text information displayed at a dot position in the prior art.
  • FIG. 2 is a schematic structural diagram of a network system provided by this application.
  • FIG. 3 is a schematic diagram of a video playback method provided by this application.
  • FIG. 4 is a schematic diagram of a neural network structure provided by this application.
  • FIG. 5 is a schematic diagram of a method for generating video clips provided by this application.
  • FIG. 6 is a schematic diagram of another method for generating video clips provided by this application.
  • FIG. 7 is a schematic structural diagram of a server provided by this application.
  • FIG. 8 is a schematic structural diagram of a terminal device provided by this application.
  • FIG. 9 is a schematic structural diagram of a server provided by this application.
  • FIG. 10 is a schematic structural diagram of a server provided by this application.
  • FIG. 11 is a schematic structural diagram of a terminal device provided by this application.
  • FIG. 12 is a schematic structural diagram of a server provided by this application.
  • The present application provides a video playback method and device to solve the problem that the text information displayed in the prior art cannot adequately summarize the video content, which affects the user experience.
  • Video dotting refers to the process of marking key frames in a video to summarize the video content.
  • The terminal device usually sets one or more dot positions on the progress bar of the video, and each dot position is marked with text describing the video content at that position.
  • The user can move the cursor to a dot position by sliding the mouse, and the text of the video content is then displayed at the dot position. However, the text information of the video content is usually quite concise and cannot intuitively reflect how exciting the video content is, so the user easily misses some of the more exciting pictures, and the user experience cannot be effectively improved.
  • FIG. 2 is a schematic diagram of a network architecture provided by an embodiment of the present application.
  • The network architecture includes a terminal device and a server.
  • The server is a remote server deployed in the cloud, or another server in the network that can provide services.
  • The server has a video processing function and a data computing function; for example, the server can perform functions such as video division and video segment determination.
  • The server may be an ultra-multi-core server, a computer with a graphics processing unit (GPU) cluster, a large-scale distributed computer, a cluster computer with pooled hardware resources, and so on.
  • In the embodiment of the present application, the server may generate the dotting information and, after the terminal device requests a video address from the server, send the video address and the dotting information to the terminal device.
  • The server can also divide the video, determine the highlight degree of each of the divided video segments, and select the more exciting video segments from the divided segments for video synthesis, obtaining a more exciting synthesized video.
  • The server may store the video data required by the terminal device, including the source data of the video, the divided video segments, the highlight degree of each video segment, the video obtained after video synthesis (corresponding to the first video in the embodiments of the present application), and dynamic pictures.
  • The terminal device may initiate requests to the server (corresponding to the first request and the second request in the embodiments of the present application) to obtain relevant data from the server, such as the video address, the dotting information, the storage addresses of video segments, the video segments themselves, videos obtained after video synthesis, and dynamic pictures.
  • After obtaining the relevant data, the terminal device performs operations such as loading or displaying. For example, after obtaining the video address, the terminal device obtains the video according to the video address; after obtaining the dotting information of the video, the terminal device may load a video segment at each dot position of the video according to the dotting information and, after the video segment is loaded, play it; after obtaining a synthesized video or a dynamic picture, the terminal device can display it to the user.
  • The terminal device in this application, which can also be called user equipment (UE), can be deployed on land (indoor or outdoor, handheld or in-vehicle), on water (such as on ships), or in the air (such as on airplanes, balloons, and satellites).
  • The terminal device may be a mobile phone, a tablet computer, a computer with a wireless transceiver function, a virtual reality (VR) device, an augmented reality (AR) device, a wireless device in industrial control, a wireless device in self-driving, a wireless device in remote medical, a wireless device in a smart grid, a wireless device in transportation safety, a wireless device in a smart city, a wireless device in a smart home, and so on.
  • In the embodiment of the present application, the server may provide the terminal device with the address of the video to be played and the dotting information of the video, where the dotting information includes the dot positions of the video and the storage addresses of the video segments corresponding to the dot positions, so that the terminal device can load the corresponding video segment at each dot position of the video based on the dotting information and play it.
  • Displaying video segments at the dot positions of the video reflects the video content more intuitively and can effectively improve the user experience.
  • Step 301: The terminal device sends a first request to the server, where the first request is used to request the address of the video that the terminal device needs to play.
  • When the terminal device determines that a video needs to be played, it may send the first request to the server to request the address of that video.
  • The first request may carry the identification information of the video and a field indicating that the video address is requested.
  • The above is only an example; any method by which the first request can indicate the video address that the terminal device needs to play is applicable to the embodiments of the present application.
  • Step 302: After receiving the first request from the terminal device, the server sends a first response to the terminal device, where the first response includes the video address and the dotting information of the video, and the dotting information includes the dot positions of the video and the storage addresses of the video segments corresponding to the dot positions.
  • Step 303: The terminal device obtains the video according to the video address, and loads the corresponding video segment at each dot position according to the storage address of the video segment corresponding to that dot position.
  • Step 304: The terminal device plays the video and the video segments.
  • After the server receives the first request, in order to send the video address and the dotting information of the video to the terminal device, the server needs to parse the video first and generate the dotting information, that is, determine the dot positions of the video and the video segments corresponding to the dot positions.
  • The embodiments of the present application do not limit the way in which the server generates the dotting information; any approach in which the generated dotting information includes the dot positions of the video and the storage address of the video segment corresponding to each dot position is applicable to the embodiments of the present application.
  • The embodiment of the present application provides a method for generating the dotting information, which is described below:
  • First, the server may divide the video into multiple video segments.
  • The manner in which the server divides the video is not limited.
  • For example, the server may divide the video into multiple video segments according to a preset length; the server may also divide the video into multiple video segments according to the displayed content, so that each video segment displays different content. For instance, the server may extract the parts of the video whose displayed content includes a specific scene or person and synthesize them into one video segment; if the video includes multiple different scenes or characters, it may be divided into multiple video segments.
  • Manner 1: Divide the video to obtain multiple video segments of a preset length.
  • First, the server performs shot segmentation on the video to obtain multiple groups of image frames, where each group of image frames consists of consecutive frames.
  • Shots can be divided into abrupt shots and gradual shots. An abrupt shot refers to a group of consecutive related image frames in the video switching directly to the next group of consecutive related image frames without any transition in between; a gradual shot refers to a group of image frames in the video transitioning gradually to the next group of image frames through color or spatial effects.
  • The video is segmented according to the shot types in the video. In the shot segmentation process, for an abrupt shot, the node at which one group of image frames switches to the next group is used as the cutting point: the image frame before the cutting point is taken as the end frame of the current group of image frames, and the image frame after the cutting point is taken as the start frame of the next group of image frames. For a gradual shot, the image frame before the gradual-transition interval is taken as the end frame of the current group of image frames, and the image frame after the interval is taken as the start frame of the next group of image frames.
  • When performing shot segmentation, the server can use difference-histogram features to segment the video into shots, for example using a fast shot segmentation (FAST) algorithm to determine the cutting points; other methods, such as three-dimensional fully convolutional networks (3D-FCN), can also be used for shot segmentation.
  • In this way, the server determines multiple cutting points in the video and thereby obtains multiple groups of image frames, where each group of image frames consists of consecutive frames.
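  • For illustration, a simplified difference-histogram cut detector is sketched below; it is a stand-in for the FAST algorithm named above (which this sketch does not reproduce), and the threshold and histogram size are assumptions that would need tuning.

```python
import cv2

def detect_cut_points(video_path, threshold=0.5):
    """Return frame numbers where a new group of image frames starts,
    by comparing grayscale histograms of consecutive frames; a large
    distance suggests an abrupt shot change."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, frame_no = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # Bhattacharyya distance in [0, 1]; near 1 means very different
            d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if d > threshold:
                cuts.append(frame_no)  # this frame starts a new group
        prev_hist, frame_no = hist, frame_no + 1
    cap.release()
    return cuts
```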
  • Through shot segmentation, the server can obtain the frame number of the start frame and the frame number of the end frame of each group of image frames, and can also determine the start timestamp and end timestamp of each group of image frames.
  • In addition, the server may also remove the transition images in each group of image frames.
  • The transition images include some or all of the following image frames: all-black or all-white image frames, image frames in which the displayed scene or people are blurred, and fusion frames. A fusion frame refers to an image frame formed by the fusion of two different image frames in the video.
  • For example, when the previous image frame gradually disappears or darkens while the next image frame gradually becomes clear or brighter, image frames appear in which the two pictures overlap; image frames of this kind are fusion frames.
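  • A small heuristic filter along these lines is sketched below; all thresholds are illustrative assumptions rather than values from the embodiment, and detecting blurred or fused frames robustly would need a sharper measure (e.g. variance of the Laplacian).

```python
def is_transition_frame(frame_bgr, dark=8, bright=247, low_contrast=4.0):
    """Flag transition images: nearly all-black or all-white frames, and
    nearly uniform low-contrast frames as often occur mid-fade.
    frame_bgr: HxWx3 uint8 numpy array."""
    gray = frame_bgr.mean(axis=2)
    mean = gray.mean()
    if mean < dark or mean > bright:      # all black / all white
        return True
    return gray.std() < low_contrast      # washed-out, near-uniform frame
```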
  • Second, the server may synthesize the multiple groups of image frames into one or more video segments of a preset length.
  • Based on the preset length, the multiple groups of image frames can be divided into three types. Type one: a group of image frames whose time length is greater than the preset length; such a group includes a larger number of image frames, can display more scenes, and constitutes a long shot. Type two: a group of image frames whose time length is less than the preset length and whose number of image frames is less than a set value; such a group includes few image frames, may not be enough to show a complete scene, and constitutes a short shot. Type three: a group of image frames whose time length is equal to the preset length, or differs from the preset length only by a small amount within a preset range; in this case, the time length of the group of image frames can be regarded as the preset length.
  • The preset length of a video segment indicates the duration for which the video segment is played; the preset length may be set according to the specific scenario, and the embodiment of the present application does not limit how it is set.
  • For the three types of image frame groups described above, the server can synthesize one or more video segments of the preset length in the corresponding manner; the following separately describes how video segments are synthesized from a group of image frames of each type:
  • For any group of image frames of type one, the server may divide the group of image frames into one or more video segments of the preset length.
  • The start frame, end frame, start time, and end time of any video segment f_(i+m) obtained by dividing the k-th group of image frames are subject to the length constraint
  • L − Δ ≤ f_(i+m)_end_time − f_(i+m)_start_time ≤ L + Δ
  • where L represents the preset length; because the length of a video segment may not strictly reach L and may be slightly larger or smaller than L, Δ represents the length gain, so the length of a video segment lies between L − Δ and L + Δ;
  • f_i represents the i-th video segment;
  • f_(i+m)_start_frame_number represents the frame number of the start frame of the (i+m)-th video segment;
  • s_k_start_frame_number represents the frame number of the start frame of the k-th group of image frames in the video;
  • f_(i+m)_end_frame_number represents the frame number of the end frame of the (i+m)-th video segment;
  • s_k_end_time represents the end time of the k-th group of image frames in the video.
  • For consecutive groups of image frames of type two, the server may combine the consecutive groups into one or more video segments of the preset length.
  • The start frame, end frame, start time, and end time of any video segment f_i synthesized from multiple consecutive groups of image frames are determined by the constraint
  • L − Δ ≤ s_(k+n)_end_time − s_k_start_time ≤ L + Δ
  • That is to say, the total length of the consecutive groups of image frames is within the preset range and differs little from the preset length, while the time length of any single group among the consecutive groups is less than the preset length.
  • Here s_k denotes the k-th group of image frames in the video, and s_(k+n) denotes the (k+n)-th group of image frames in the video.
  • For any group of image frames of type three, the number of image frames it includes is between that of a type-one group and that of a type-two group, and its time length is equal to the preset length, or can be regarded as equal to the preset length within an error range; the server can therefore synthesize the group of image frames into one video segment of the preset length.
  • The start frame, end frame, start time, and end time of the video segment f_i synthesized from such a group of image frames satisfy
  • L − Δ ≤ s_k_end_time − s_k_start_time ≤ L + Δ
  • that is, the time length of the group of image frames and the preset length can be regarded as equal; for the description of the parameters in the formula, refer to the previous description, which is not repeated here.
  • Manner 2: Divide the video to obtain one or more video segments such that the similarity of any two adjacent frames within any one video segment is within a preset range; that is to say, one video segment displays one type of scene or similar characters.
  • First, the server performs shot segmentation on the video to obtain multiple groups of image frames, where each group of image frames consists of consecutive frames.
  • For the shot segmentation process, refer to the description of shot segmentation in Manner 1, which is not repeated here.
  • Second, the server may synthesize the multiple groups of image frames into multiple video segments, where the image frames displaying similar scenes in the groups are synthesized into one video segment; if different scenes exist, the frames of the different scenes are synthesized separately into the corresponding video segments.
  • When synthesizing video segments, the server needs to determine whether the scenes displayed in the image frames of the multiple groups are similar or the same. There are many ways to make this determination, and the embodiments of the present application are not limited. For example, the server may first extract the visual features of the key frames of the shots (such as long shots, short shots, and single shots), use a preset similarity criterion to cluster shots that are close in time and semantic content into one scene, and then synthesize them into the corresponding video segments. The server can use 3D-FCN to extract the visual features of the key frames of the shots, or extract them with a video-frame color-histogram method; for shot clustering, the server can use a tree-structured support vector machine (SVM).
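  • As a simplified stand-in for the tree-structured SVM clustering mentioned above, the sketch below groups shot key frames by feature similarity with ordinary agglomerative clustering; the feature choice and distance threshold are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_shots_into_scenes(keyframe_features, max_dist=0.4):
    """keyframe_features: one feature vector per shot key frame (e.g. a
    normalized color histogram). Returns one scene label per shot; shots
    sharing a label would then be synthesized into one video segment."""
    Z = linkage(np.asarray(keyframe_features), method="average",
                metric="cosine")
    return fcluster(Z, t=max_dist, criterion="distance")
```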
  • After the video is divided into multiple video segments, the server may directly use the starting position of each video segment in the video as a dot position of the video; in this case, each dot position corresponds to one video segment.
  • The server may also remove some video segments from the multiple video segments and use the starting position of each remaining video segment as a dot position of the video.
  • The manner of removal is not limited in the embodiments of the present application: video segments may be removed randomly, the video segments with more transition shots or longer transition-shot time may be removed, or segments may be removed according to the actual application scenario.
  • The above method of directly determining the dot positions of the video is only an example.
  • The server may also use other methods to determine the dot positions of the video more accurately.
  • For example, the server may first evaluate the highlight degree of each video segment, that is, first determine the highlight degree of each video segment, then select video segments according to their highlight degrees, and then determine the dot positions of the video.
  • The embodiments of the present application do not limit the number of dot positions included in the dotting information; there may be one or more.
  • When the server measures the highlight degree of a video segment, it can do so based on the number of times the video segment has been viewed: the more times the video segment is played, the higher its highlight degree, that is, the more exciting it is. In specific implementation, the server can retrieve the number of times each video segment has been played and use that number as the highlight degree of the segment, or it can convert the number of plays into a value on a ten-point or hundred-point scale through a preset function, where the larger the value, the more exciting the video segment.
  • The server may also measure the highlight degree according to the number of comments (such as bullet comments, or "barrage") posted by users during each video segment: the greater the number of comments, the higher the highlight degree of the segment. In specific implementation, the server can retrieve the number of comments posted during each video segment and use that number as the highlight degree, or convert the number of comments into a value on a ten-point or hundred-point scale through a preset function.
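  • One example of such a preset function is sketched below; the log-based squashing and the 0.7/0.3 weighting are illustrative assumptions, not the embodiment's formula.

```python
import math

def highlight_from_engagement(play_count, comment_count):
    """Map raw view and bullet-comment ("barrage") counts to a highlight
    degree on a ten-point scale; higher engagement yields a higher score."""
    raw = 0.7 * math.log1p(play_count) + 0.3 * math.log1p(comment_count)
    return min(10.0, raw)  # clamp to the ten-point scale
```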
  • In addition, the server may determine the highlight degree of each video segment according to the frequency or pitch of its sound at the same playback volume.
  • The server may also determine the highlight degree of each video segment through certain characteristics of the segment (such as the brightness, color, texture, and other image characteristics of each image in the video segment), which requires analyzing each video segment.
  • In the embodiment of the present application, the server may determine the highlight degree of each video segment based on a preset neural network model.
  • First, the server may extract the first feature of each video segment based on the preset neural network model, where the first feature includes some or all of the following: the temporal feature of the frame sequence and the spatial feature of the frame sequence.
  • Each video segment is composed of a sequence of image frames, so the spatial feature of a video segment corresponds to the spatial feature of the frame sequence, and the temporal feature of a video segment corresponds to the temporal feature of the frame sequence. These are the appearance features of consecutive image frames extracted by the preset neural network model, and they characterize the correlation between information such as the color, brightness, contrast, clarity, and texture of consecutive image frames, as well as the intensity of the motion of objects across the frames.
  • The preset neural network model is a model that has been pre-trained with sample data and can output the first feature of a video segment; the sample data are video segments that have been annotated with highlight degrees. Through training, the preset neural network model becomes able to extract the first feature of a video segment.
  • The preset neural network model may extract only the spatial features of a video segment, or only the temporal features of a video segment, or may extract the spatial and temporal features simultaneously.
  • The embodiment of the present application does not limit the number and type of network layers included in the preset neural network model; any neural network model that can extract the spatial or temporal features of video segments is applicable to the embodiment of the present application. The following describes, as an example, one neural network model and its process of extracting the first feature.
  • The neural network model includes an input layer, N convolutional layers (to distinguish the convolutional layers, they are named, in the direction from input to output, the first convolutional layer, the second convolutional layer, ..., the N-th convolutional layer), a fully connected layer, and an output layer.
  • The input layer groups the image frames in the video segment, with each group including T image frames, and then feeds them into the N convolutional layers. Each convolutional layer performs convolution operations (such as 3D convolution operations) and pooling operations (such as max-pooling) on the image frames in each group, and each convolution operation reduces the number of image frames in each group by two, until, after the N-th convolutional layer performs its convolution and pooling operations, one image frame is obtained.
  • The obtained image frame is then input to the fully connected layer for processing; the fully connected layer feeds the processed data to the output layer, and the output layer outputs the first feature of the video segment (denoted h_i in FIG. 4).
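  • A toy PyTorch rendering of this structure is sketched below: four 3D-convolution layers, each shrinking a group by two frames so that a group of T = 9 frames collapses to one, followed by a fully connected layer that emits the feature h_i. The channel widths, T = 9, and spatial-only pooling are illustrative assumptions, not the patent's configuration.

```python
import torch
import torch.nn as nn

class HighlightFeatureNet(nn.Module):
    """Each 3D convolution uses a temporal kernel of 3 with no temporal
    padding, so every layer shortens the group by two frames; with T = 9
    input frames, four layers leave a single frame, which the fully
    connected head maps to the first feature h_i."""
    def __init__(self, feature_dim=128):
        super().__init__()
        chans = [3, 16, 32, 64, 64]
        layers = []
        for c_in, c_out in zip(chans, chans[1:]):
            layers += [
                nn.Conv3d(c_in, c_out, kernel_size=3, padding=(0, 1, 1)),
                nn.ReLU(),
                nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool space, keep time
            ]
        self.backbone = nn.Sequential(*layers)
        self.head = nn.Linear(64, feature_dim)

    def forward(self, clip):              # clip: (B, 3, 9, H, W)
        x = self.backbone(clip)           # -> (B, 64, 1, h, w)
        x = x.mean(dim=(2, 3, 4))         # global average pool -> (B, 64)
        return self.head(x)               # first feature h_i per group
```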
  • Second, the server determines the highlight degree of each video segment according to the first feature of each video segment.
  • The first feature extracted by the preset neural network model is a vector or data in a relatively complex form and cannot intuitively reflect the highlight degree of a video segment, so the server can transform the extracted first feature of a video segment into a more intuitive highlight degree; for example, a preset function is used to convert the first feature of the video segment into a highlight degree.
  • For example, the server may convert the first feature through the softmax function:
  • H_i = e^(w_i) / Σ_{j=1}^{N} e^(w_j)
  • Here H_i represents the highlight degree of the i-th video segment, H_i ∈ (0, 1), and the closer H_i is to 1, the higher the highlight degree of the video segment; w_i represents the first feature of the i-th video segment, and N represents the total number of video segments.
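  • In code, this conversion is the standard softmax, as sketched below; the sketch assumes each first feature has already been reduced to a scalar w_i (if the model emits a vector per segment, it must be reduced first).

```python
import numpy as np

def highlight_scores(w):
    """w: one scalar first feature per video segment.
    Returns H_i in (0, 1) for each segment via softmax."""
    w = np.asarray(w, dtype=float)
    e = np.exp(w - w.max())        # subtract max for numerical stability
    return e / e.sum()

print(highlight_scores([1.2, 3.5, 0.3]))  # the 2nd segment scores highest
```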
  • It should be noted that the preset neural network model may also combine the function of extracting the first feature of a video segment with the function of converting the first feature into a highlight degree, so that the preset neural network model can directly output how exciting the video segment is.
  • After determining the highlight degree of each video segment, the server may select N video segments according to the highlight degrees; for example, the server may sort the highlight degrees from largest to smallest and select the top N video segments.
  • A preset highlight-degree range can also be set, and the N video segments whose highlight degrees fall within the preset range can be selected.
  • After the N video segments are selected, the server may determine N dot positions of the video according to the positions of the N video segments in the video, where each dot position corresponds to one of the N video segments.
  • The server may store the dot position information and the corresponding video segments locally or on another server.
  • The dot position information is information that can identify a dot position, and includes some or all of the following: the identifier of the dot position and the location of the dot position on the progress bar of the video.
  • An identifier can be set for each dot position, for example a number, or letters can be used to distinguish the dot positions; that is, the identifier of a dot position can be a number, a letter, or a specific point in time. Any method that can identify different dot positions is applicable to the embodiments of the present application.
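  • One possible record for this dot position information is sketched below; the field names are assumptions, consistent with the earlier message sketch.

```python
# Hypothetical dot-position record; per the description above, either an
# identifier (number or letter) or a specific time point may serve as the label.
dot_position_info = {
    "dot_id": "A",              # identifier of the dot position
    "position_s": 128.0,        # location of the dot on the progress bar, in seconds
    "segment_address": "https://vod.example.com/segments/0042-1.mp4",
}
```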
  • In specific implementation, the server may determine a dot position of the video according to the position of a video segment in the video; the server may use the starting position of the video segment in the video as the dot position. In this case, the position of the video segment on the progress bar of the video is a dot position of the video, and the dot position has a correspondence with the video segment.
  • The video segment corresponding to a dot position saved by the server may be with or without audio; a video segment without audio may be, for example, a dynamic picture.
  • After generating the dotting information, the server may send the first response carrying the dotting information to the terminal device.
  • The first response further includes the video address, where the video address is the storage address of the video.
  • In order to make the transmitted information occupy fewer resources, the server carries only the video address and the dotting information of the video in the first response.
  • The terminal device can then, when a video or video segment needs to be displayed, obtain the video or video segment according to the video address or the dotting information of the video.
  • The terminal device may also send a request for the video to the server, and the server may feed back a response message carrying the video, the dot positions of the video, and the video segments corresponding to the dot positions; after receiving the response message, the terminal device can flexibly choose when and how to display the video and the video segments.
  • after receiving the first response, the terminal device may send a request for the video to the server, or to whichever device stores the video; the request may carry the video address, as in the sketch below.
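  • purely as an illustration, the exchange might look like this over HTTP; the endpoint paths and JSON field names are assumptions for the sketch and are not specified by the embodiments:

```python
import requests

SERVER = "https://example-video-server"   # hypothetical server

# First request: ask for the address of the video the terminal needs to play.
r = requests.get(f"{SERVER}/v1/videos/12345/address", timeout=5)
r.raise_for_status()
first_response = r.json()
# Assumed shape of the first response:
# {"video_address": ".../videos/12345.mp4",
#  "hitting_info": [{"dot_id": 0, "position_s": 63.0,
#                    "clip_address": ".../clips/12345/0.mp4"}, ...]}

# Fetch the video itself from the storage address in the first response.
video_bytes = requests.get(first_response["video_address"], timeout=30).content
```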
  • after obtaining the video, the terminal device may preload the clip for every hitting position, or only for some of them; for example, it may first load only the clips for the hitting positions near the front of the progress bar (earlier playback positions), and load the remaining clips once playback reaches the later part of the progress bar.
  • the terminal device may also load clips while the video is playing, for example loading the clips of the one or more hitting positions closest to the current playback position, or loading the clip of each hitting position in turn, in the order the positions appear on the progress bar; a loading sketch follows.
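  • a sketch of the "nearest hitting positions first" loading strategy; the cache and the fetch helper are invented for the example:

```python
clip_cache = {}

def nearest_dots(hitting_info, current_s, k=2):
    """Return the k hitting positions closest to the current playback time."""
    return sorted(hitting_info, key=lambda d: abs(d["position_s"] - current_s))[:k]

def ensure_loaded(dot, fetch):
    """Fetch a clip by its storage address unless it is already cached."""
    if dot["dot_id"] not in clip_cache:
        clip_cache[dot["dot_id"]] = fetch(dot["clip_address"])
```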
  • loading a clip at its hitting position according to the clip's storage address may specifically include: the terminal device first sends the server a second request, which requests the clip corresponding to the hitting position and carries the storage address of that clip; the terminal device then receives the clip returned by the server.
  • the second request may also carry an identifier of the clip, so that the server can look the clip up by its identifier and return it to the terminal device.
  • when the terminal device only needs to load the clips of some hitting positions, the second request asks for those clips and carries their storage addresses; when the terminal device needs to load the clips of all hitting positions, the second request asks for all of them and carries all of their storage addresses.
  • after receiving the second request, the server obtains each clip according to the storage address carried in the request, places the clip(s) in a second response, and sends the response to the terminal device; the terminal device receives the second response and can then play the corresponding clip. A server-side sketch follows.
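  • the server side of the second request could be sketched with Flask; the route, the parameter name, and the storage layout are assumptions for illustration only:

```python
from pathlib import Path
from flask import Flask, abort, request, send_file

app = Flask(__name__)
CLIP_ROOT = Path("/data/clips")            # hypothetical clip storage root

@app.route("/v1/clips")
def second_response():
    """Resolve the storage address carried in the second request and
    return the corresponding clip as the second response."""
    addr = request.args.get("storage_address", "")
    path = (CLIP_ROOT / addr).resolve()
    if CLIP_ROOT not in path.parents or not path.is_file():
        abort(404)                         # unknown or out-of-tree address
    return send_file(path, mimetype="video/mp4")
```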
  • when the terminal device plays the video and the video clips, two specific cases can be distinguished:
  • Case 1: while playing the video, the terminal device actively plays the clip(s) of at least one hitting position closest to the current playback position.
  • the terminal device may display a small window at each such hitting position and play the corresponding clip there; it may instead split the display interface into two parts, one playing the current video and the other playing the clip(s) of the hitting position(s) closest to the current playback position.
  • Case 2: after receiving a trigger operation on a hitting position, the terminal device plays the clip corresponding to that position.
  • the trigger operation may be detecting that the cursor stays over the hitting position, a single or double mouse click on the hitting position, or a touch on the hitting position on a touchscreen.
  • after the trigger operation is received, a small window may be displayed at the hitting position to play the corresponding clip.
  • when the terminal device plays the video and a clip, the two may play simultaneously: as in case 1, the terminal device may play the video in a large window and the clip in a small window, and, to avoid disturbing the user, play only the clip's images without its sound; the terminal device may instead pause the video and play only the clip: as in case 2, after receiving the trigger operation on a hitting position, it may pause the video and display a small window at that position playing the corresponding clip with both image and sound. A sketch of this control flow follows.
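  • the two cases can be modeled with a toy controller (no real UI framework; the small-window call is a placeholder):

```python
class DotPlayer:
    def __init__(self, hitting_info):
        self.dots, self.paused = hitting_info, False

    def on_progress(self, t_s):
        # Case 1: proactively preview the nearest clip, muted, while playing.
        dot = min(self.dots, key=lambda d: abs(d["position_s"] - t_s))
        self.open_small_window(dot, muted=True)

    def on_dot_triggered(self, dot):
        # Case 2: pause the main video, then play the clip with sound.
        self.paused = True
        self.open_small_window(dot, muted=False)

    def open_small_window(self, dot, muted):
        print(f"small window: clip {dot['clip_id']} muted={muted}")  # placeholder
```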
  • as shown in FIG. 5, an embodiment of the present application further provides a method for generating video clips.
  • the method includes:
  • Step 501 The server divides the video into multiple video segments.
  • Step 502 The server determines the highlight degree of each video clip based on a preset neural network model.
  • Step 503 The server selects N video clips from the multiple video clips according to the highlight degree of each clip.
  • there are many ways to select the N video clips; for example, the server may take the first N clips after sorting the highlight degrees in descending order, or set a preset highlight-degree range and take the N clips that fall within it.
  • Step 504 The server synthesizes the N video segments.
  • the server may synthesize the N video clips into one video (for convenience of description, the synthesized video is called the first video); after synthesis, the server may store the first video, for example locally or on another server.
  • the terminal device may send the server a request for the address of the first video, and the server may return that address to the terminal device.
  • once the terminal device determines that it needs the first video, for example that it needs to display it, it sends the server a request carrying the first video's address, and the server, after receiving the request, sends the first video to the terminal device.
  • alternatively, the server may directly send the first video to the terminal device; the address exchange can also be omitted, with the terminal device directly requesting the first video and the server directly returning it. A pipeline sketch follows.
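  • steps 501 to 504 can be read as one pipeline; here `split_into_clips`, `score_clip`, and `concat_clips` stand in for the segmentation, the preset neural network model, and the synthesis step, none of which are pinned down at this level of the description:

```python
def generate_first_video(video_path, n, split_into_clips, score_clip, concat_clips):
    clips = split_into_clips(video_path)       # Step 501: segment the video
    ranked = sorted(clips, key=score_clip, reverse=True)
    chosen = ranked[:n]                        # Steps 502-503: top N by highlight degree
    chosen.sort(key=lambda c: c.start_s)       # keep the original temporal order
    return concat_clips(chosen)                # Step 504: synthesize the first video
```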
  • as shown in FIG. 6, an embodiment of the present application further provides another method for generating video clips.
  • the method includes:
  • Step 601 The server divides the video into multiple video segments.
  • Step 602 The server determines the highlight degree of each video clip based on a preset neural network model.
  • Step 603 The server selects N video clips from the multiple video clips according to the highlight degree of each clip.
  • for how the server splits the video, determines the highlight degree of each clip, and selects the clips, reference may be made to the relevant description of the embodiment shown in FIG. 5, and the details are not repeated here.
  • Step 604 The server saves the N video clips.
  • the N video clips saved by the server may carry audio or not; clips without audio may be, for example, dynamic images (animated pictures).
  • after saving the N clips, the server may send them to the terminal device, either directly or upon receiving a request from the terminal device; the following description takes the case where the N saved clips are N dynamic images as an example:
  • the terminal device may send the server a request for the address of a dynamic image, and the server may return that address to the terminal device.
  • once the terminal device determines that it needs the dynamic image, for example that it needs to display it, it sends the server a request carrying the image's address, and the server, after receiving the request, sends the dynamic image to the terminal device.
  • alternatively, the server may directly send the dynamic image to the terminal device; the address exchange can also be omitted, with the terminal device directly requesting the dynamic image and the server directly returning it. A sketch of saving clips as dynamic images follows.
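  • saving a selected clip as a silent dynamic image could be sketched with imageio; the frame list is assumed to come from an earlier decoding step, and the 0.1 s frame delay is an arbitrary example value:

```python
import imageio.v2 as imageio

def save_clip_as_gif(frames, out_path):
    """frames: list of HxWx3 uint8 arrays taken from one selected clip."""
    imageio.mimsave(out_path, frames, duration=0.1)   # ~10 frames per second
```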
  • based on the same inventive concept as the method embodiments, an embodiment of the present invention provides a server, specifically used to implement the method executed by the server in the method embodiment shown in FIG. 3; as shown in FIG. 7, the server includes a receiving unit 701 and a sending unit 702.
  • the receiving unit 701 is configured to receive a first request from a terminal device, where the first request is used to request the address of the video that the terminal device needs to play.
  • the sending unit 702 is configured to send a first response to the terminal device, where the first response includes the video address and the hitting information of the video, and the hitting information includes the hitting positions of the video and the storage addresses of the video clips corresponding to those positions.
  • the server may also send the video clip to the terminal device.
  • specifically, the server 700 further includes a processing unit 703; the receiving unit 701 receives a second request sent by the terminal device, where the second request is used to request the clip corresponding to a hitting position and includes the storage address of that clip; after the receiving unit 701 receives the second request, the processing unit 703 obtains the clip according to its storage address, and the sending unit 702 may then send the terminal device a second response that includes the clip.
  • before the sending unit 702 sends the first response to the terminal device, the processing unit 703 may be used to determine the hitting positions and the clips corresponding to them; specifically, the processing unit 703 first divides the video into multiple video clips, then determines the highlight degree of each clip based on a preset neural network model, selects N clips according to their highlight degrees, and finally determines the N hitting positions of the video according to the positions of the N clips in the video, where each hitting position corresponds to one of the N clips.
  • in one possible implementation, the processing unit 703 may extract, based on the preset neural network model, the first feature of each video clip, where the first feature includes some or all of the following: the temporal features of the frame sequence and the spatial features of the frame sequence; the processing unit 703 then determines the highlight degree of each clip from its first feature. A model sketch follows.
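  • the description fixes no exact architecture beyond 3D convolutions over frame sequences (FIG. 4); a minimal PyTorch stand-in that maps a clip to one scalar first feature w_i might look like this:

```python
import torch
import torch.nn as nn

class FirstFeatureNet(nn.Module):
    """Toy 3D-CNN: input a clip as (batch, 3, T, H, W); output one scalar
    per clip mixing temporal and spatial information of the frame sequence."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(32, 1)        # fully connected layer -> scalar w_i

    def forward(self, x):
        return self.fc(self.features(x).flatten(1)).squeeze(1)

w = FirstFeatureNet()(torch.randn(5, 3, 16, 64, 64))  # five 16-frame clips
print(torch.softmax(w, dim=0))                         # highlight degrees H_i
```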
  • there are many ways for the processing unit 703 to divide the video into multiple video clips; two of them are listed below:
  • Method 1: the clips obtained by division have a uniform, preset length.
  • the processing unit 703 first performs shot segmentation on the video according to the video's shot types to obtain multiple groups of image frames, where each group includes multiple consecutive frames; it then combines the groups into one or more clips of the preset length.
  • Method 2: each clip obtained by division shows a specific scene or a specific person.
  • the processing unit 703 first performs shot segmentation on the video to obtain multiple groups of image frames, where each group includes multiple consecutive frames; it then combines the groups into one or more clips such that, within any one clip, the similarity of any two adjacent frames is within a preset range. A segmentation sketch follows.
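  • shot boundaries can be found, as the description suggests, from histogram differences between consecutive frames; this OpenCV sketch detects hard cuts (the threshold is an arbitrary example value, and gradual transitions are ignored for brevity):

```python
import cv2

def shot_boundaries(video_path, threshold=0.5):
    """Return frame indices where the grayscale-histogram correlation
    between consecutive frames drops below `threshold` (a hard cut)."""
    cap, cuts, prev, i = cv2.VideoCapture(video_path), [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        cv2.normalize(hist, hist)
        if prev is not None and cv2.compareHist(prev, hist, cv2.HISTCMP_CORREL) < threshold:
            cuts.append(i)                # a new group of image frames starts here
        prev, i = hist, i + 1
    cap.release()
    return cuts
```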
  • based on the same inventive concept, an embodiment of the present invention provides a terminal device, specifically used to implement the method executed by the terminal device in the method embodiment shown in FIG. 3; as shown in FIG. 8, it includes a sending unit 801, a receiving unit 802, a loading unit 803, and a playing unit 804, where:
  • the sending unit 801 is configured to send a first request to the server, where the first request is used to request the address of the video that the terminal device needs to play.
  • the receiving unit 802 is configured to receive a first response sent by the server, where the first response includes the video address and the hitting information of the video, and the hitting information includes the hitting positions of the video and the storage addresses of the video clips corresponding to those positions.
  • the loading unit 803 is configured to obtain the video according to the video address, and load the corresponding video segment at the hitting location according to the storage address of the video segment corresponding to the hitting location.
  • the playing unit 804 is configured to play the video and the video clip.
  • when the loading unit 803 loads the clip at a hitting position according to the clip's storage address, the terminal device may interact with the server; specifically, the sending unit 801 first sends the server a second request, which requests the clip corresponding to the hitting position and carries the clip's storage address; the receiving unit 802 then receives a second response from the server that includes the clip; after the second response is received, the loading unit 803 loads the corresponding clip at the hitting position according to the second response.
  • while the video is playing, the playing unit 804 may display the clip(s) of at least one hitting position closest to the current playback position.
  • alternatively, the playing unit 804 may play the clip of a hitting position after receiving a trigger operation on that position.
  • based on the same inventive concept, an embodiment of the present invention provides a server, specifically used to implement the method executed by the server in the method embodiment shown in FIG. 5; as shown in FIG. 9, the server includes a dividing unit 901, a determining unit 902, a selecting unit 903, and a synthesis unit 904, where:
  • the dividing unit 901 is used to divide the video into multiple video segments.
  • the determining unit 902 is configured to determine the highlight degree of each video clip based on a preset neural network model.
  • the selecting unit 903 is configured to select N video clips from the multiple video clips according to the highlight degree of each clip.
  • the synthesis unit 904 is configured to perform video synthesis on the N video segments.
  • the server may further include a storage unit, where the storage unit is used to store the video after video synthesis.
  • based on the same inventive concept, an embodiment of the present invention provides a server, specifically used to implement the method executed by the server in the method embodiment shown in FIG. 6; as shown in FIG. 10, the server includes a dividing unit 1001, a determining unit 1002, a selecting unit 1003, and a storage unit 1004, where:
  • the dividing unit 1001 is used to divide the video into multiple video segments.
  • the determining unit 1002 is configured to determine the highlight degree of each video clip based on a preset neural network model.
  • the selecting unit 1003 is configured to select N video clips from the multiple video clips according to the highlight degree of each clip.
  • the storage unit 1004 is configured to save the N video clips.
  • the division of the units in the embodiments of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, may exist alone physically, or two or more units may be integrated into one module.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional module.
  • if the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • based on such an understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for enabling a terminal device (which may be a personal computer, a mobile phone, a network device, or the like) or a processor to execute all or part of the steps of the methods in the embodiments of this application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • in the embodiments of this application, both the server and the terminal device may be presented with their functional modules divided in an integrated manner.
  • a "module" here may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the above functions.
  • in a simple embodiment, a person skilled in the art will appreciate that the terminal device may take the form shown in FIG. 11.
  • the terminal device 1100 shown in FIG. 11 includes at least one processor 1101, and optionally, may also include a transceiver 1102 and a memory 1103.
  • the terminal device 1100 may further include a display 1104.
  • the memory 1103 may be a volatile memory such as a random access memory; it may also be a non-volatile memory such as a read-only memory, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 1103 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 1103 may be a combination of the aforementioned memories.
  • the embodiment of the present application does not limit the specific connection medium between the processor 1101 and the memory 1103.
  • the memory 1103 and the processor 1101 are connected by a bus 1105 in the figure.
  • the bus 1105 is indicated by a thick line in the figure.
  • the connection manner between other components is shown only for illustrative purposes and is not limiting.
  • the bus 1105 can be divided into an address bus, a data bus, and a control bus. For ease of representation, only a thick line is used in FIG. 11, but it does not mean that there is only one bus or one type of bus.
  • the processor 1101 may have a data receiving and sending function and be able to communicate with other devices.
  • for example, the processor 1101 may send a first request or a second request to a server, and may receive a first response or a second response from the server; in the apparatus shown in FIG. 11, an independent data transceiver module, such as the transceiver 1102, may also be provided for sending and receiving data, and when the processor 1101 communicates with other devices, data may be transmitted through the transceiver 1102.
  • the processor 1101 may send the first request or the second request to the server through the transceiver 1102, or may receive the first response or the second response from the server through the transceiver 1102.
  • when the terminal device takes the form shown in FIG. 11, the processor 1101 may call the computer-executable instructions stored in the memory 1103, so that the terminal device can execute the method performed by the terminal device in any of the foregoing method embodiments.
  • specifically, the memory 1103 stores computer-executable instructions for implementing the functions of the sending unit, the receiving unit, the loading unit, and the playing unit in FIG. 8, and the functions/implementation processes of all four units can be realized by the processor 1101 in FIG. 11 calling those instructions.
  • alternatively, the memory 1103 stores computer-executable instructions only for the functions of the loading unit and the playing unit in FIG. 8; in that case, the functions/implementation processes of the loading unit and the playing unit are realized by the processor 1101 calling the instructions stored in the memory 1103, while the functions/implementation processes of the sending unit and the receiving unit in FIG. 8 are realized by the transceiver 1102 in FIG. 11.
  • in addition to storing computer-executable instructions, the memory 1103 can also be used to store the video data or hitting information needed by the sending unit, the receiving unit, the loading unit, and the playing unit in FIG. 8, such as the video address, the video clips, the video, or the hitting information of the video.
  • when the processor 1101 performs the function of the playing unit and the operation involves playing a video or clip, the processor 1101 may display the played video or clip through the display 1104 of the terminal device.
  • optionally, when performing the playing function, the processor 1101 may also display the video or clip through a display in another device, for example by sending that device a playback instruction indicating the video or clip.
  • the server may take the form shown in FIG. 12.
  • the server 1200 shown in FIG. 12 includes at least one processor 1201, and optionally, may also include a memory 1202 and a transceiver 1203.
  • the memory 1202 may be a volatile memory such as a random access memory; it may also be a non-volatile memory such as a read-only memory, a flash memory, a hard disk, or a solid-state drive; or the memory 1202 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 1202 may be a combination of the aforementioned memories.
  • the specific connection medium between the processor 1201 and the memory 1202 is not limited.
  • the memory 1202 and the processor 1201 are connected by a bus 1204 in the figure.
  • the bus 1204 is indicated by a thick line in the figure.
  • the connection manner between other components is shown only as a schematic illustration and is not limiting.
  • the bus 1204 can be divided into an address bus, a data bus, and a control bus. For ease of representation, only a thick line is used in FIG. 12, but it does not mean that there is only one bus or one type of bus.
  • the processor 1201 may have a data transmission and reception function and can communicate with other devices.
  • in the apparatus shown in FIG. 12, an independent data transceiver module, such as the transceiver 1203, may also be provided for sending and receiving data; when the processor 1201 communicates with other devices, data can be transmitted through the transceiver 1203.
  • when the server takes the form shown in FIG. 12, the processor 1201 may call the computer-executable instructions stored in the memory 1202, so that the server can execute the method performed by the server in any of the above method embodiments.
  • specifically, the memory 1202 stores computer-executable instructions for implementing the functions of the sending unit, the receiving unit, and the processing unit in FIG. 7, and the functions/implementation processes of all three units can be realized by the processor 1201 in FIG. 12 calling those instructions.
  • alternatively, the memory 1202 stores computer-executable instructions only for the function of the processing unit 703 in FIG. 7; in that case, the function/implementation process of the processing unit 703 is realized by the processor 1201 calling the instructions stored in the memory 1202, while the functions/implementation processes of the sending unit and the receiving unit in FIG. 7 are realized by the transceiver 1203 in FIG. 12.
  • in addition to storing computer-executable instructions, the memory 1202 can also be used to store the video data or hitting information needed by the sending unit, the receiving unit, and the processing unit in FIG. 7, such as the video address, the video clips, the video, or the hitting information of the video.
  • specifically, the memory 1202 stores computer-executable instructions for implementing the functions of the dividing unit, the determining unit, the selecting unit, and the synthesis unit in FIG. 9, and the functions/implementation processes of these units can be realized by the processor 1201 in FIG. 12 calling the instructions stored in the memory 1202.
  • optionally, the processor 1201 can also send the first video to other devices through the transceiver 1203.
  • in addition to storing computer-executable instructions, the memory 1202 can also be used to store the video data needed by the dividing unit, the determining unit, the selecting unit, and the synthesis unit in FIG. 9, such as the video clips, the video, and the first video.
  • specifically, the memory 1202 stores computer-executable instructions for implementing the functions of the dividing unit, the determining unit, the selecting unit, and the storage unit in FIG. 10, and the functions/implementation processes of these units can be realized by the processor 1201 in FIG. 12 calling the instructions stored in the memory 1202.
  • the processor 1201 may also send the saved video clip to other devices through the transceiver 1203.
  • in addition, the memory 1202 can also be used to store the video data needed by the dividing unit, the determining unit, the selecting unit, and the storage unit in FIG. 10, such as the video clips, the video, or the dynamic images.
  • a person skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product; therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware; moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • these computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.


Abstract

A video playing method and apparatus, used to solve the prior-art problem that displayed text information cannot summarize video content well, which degrades user experience. In this application, a terminal device sends a server a first request for the address of the video it needs to play, and receives a first response from the server; the first response includes the video address and the hitting (dotting) information of the video, where the hitting information includes the hitting positions of the video and the storage addresses of the video clips corresponding to those positions. The terminal device obtains the video according to the video address and, according to each clip's storage address, loads the corresponding clip at its hitting position. Having loaded the clips, the terminal device can, while playing the video, also display the clip corresponding to a hitting position; compared with text information, a video clip is more intuitive and lets the user understand the video content in a more convenient, faster, and more objective way.

Description

一种视频播放方法及装置
本申请要求于2018年11月28日提交中国国家知识产权局、申请号为201811434790.9、发明名称为“一种视频播放方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及通信技术领域,尤其涉及一种视频播放方法及装置。
背景技术
随着多媒体技术与网络技术的快速发展,数字视频迅速增长;各个视频平台之间的竞争日益激烈,视频平台为了能够提升用户体验,吸引更多用户关注视频内容,通常会在视频中较为精彩在位置上进行标注,例如,在视频的进度条上进行打点,形成多个打点位置,用户在触动或点击某一个打点位置时,会在该打点位置处显示该打点位置处视频内容的文字信息,这样有利于用户可以在较短的时间内切换到想要观看的位置,也可以保证用户能够较快的发现视频中较为精彩的部分。
但由于视频界面的限制,为了不影响用户正常观看视频,显示的文字信息通常是较为简短的语句,对于一些情节复杂的电影、电视剧,这些简短的语句所能表述的内容有限,不能使用户较好对视频内容有所了解,文字信息并不能很好的对视频内容进行概括,影响用户体验。
发明内容
本申请提供一种视频播放方法及装置,用以解决现有技术中显示的文字信息并不能很好的对视频内容进行概括,影响用户体验的问题。
第一方面,本申请实施例提供了一种视频播放方法,所述方法包括:首先,服务器会接收到终端设备的第一请求,所述第一请求用于请求所述终端设备需要播放的视频地址;所述服务器在接收到所述第一请求后,可以确定所述视频地址和视频的打点信息,之后将所述视频地址和所述打点信息携带在所述第一响应中,之后向所述终端设备发送第一响应,所述打点信息包括所述视频的打点位置和所述打点位置对应的视频片段的存储地址。
通过上述方法,所述终端设备在需要播放视频时,可以与所述服务器进行交互获取视频地址,同时也获取视频的打点信息,且所述打点信息中包括所述视频的打点位置对应的视频片段的存储地址,所述终端设备可以在播放视频时,播放所述打点位置处的视频片段,能够更直观的显示所述打点位置处的视频内容,可以有效改善用户体验。
在一种可能的设计中，所述服务器向所述终端设备发送第一响应，所述终端设备可与所述服务器交互，获取所述打点位置对应的视频片段，具体的，所述服务器可以先接收所述终端设备发送第二请求，所述第二请求用于请求所述打点位置对应的视频片段，且所述第二请求包括所述打点位置对应的视频片段的存储地址；之后，所述服务器根据所述打点位置对应的视频片段的存储地址获取所述打点位置对应的视频片段，再向所述终端设备发送第二响应，所述第二响应包括所述打点位置对应的视频片段。
通过上述方法,所述终端设备在需要获取视频片段时,可以通过与所述服务器交互的方式,根据所述打点位置对应的视频片段的存储地址可以较为灵活、便捷的获取所述视频片段。
在一种可能的设计中,所述服务器向所述终端设备发送第一响应之前,所述服务器需要先确定所述打点信息,也就是确定所述打点位置和所述打点位置对应的视频片段。具体过程如下:所述服务器先将所述视频分割为多个视频片段;之后,所述服务器基于预设的神经网络模型,确定每个视频片段的精彩度;再根据所述每个视频片段的精彩度选取N个视频片段;选取了N个视频片段后,所述服务器根据所述N个视频片段在所述视频中的位置确定所述视频的N个打点位置,如此,其中一个所述打点位置是与所述N个视频片段中的一个视频片段对应的。
通过上述方法,确定的打点位置为所述视频中与精彩度相关的视频片段所在的位置,使得所述终端设备可以显示较为精彩的视频片段,可以方便用户较快的浏览到所述视频中的精彩片段。
在一种可能的设计中,所述服务器基于预设的神经网络模型,确定每个视频片段的精彩度时,可以先基于预设的神经网络模型,提取所述每个视频片段的第一特征,所述第一特征包括下列的部分或全部:帧序列的时间特征,帧序列的空间特征;也就是说,所述第一特征是所述视频片段的本身属性。之后,所述服务器根据所述每个视频片段的第一特征确定所述每个视频片段的精彩度。
通过上述方法,所述服务器可以从所述视频片段的本身属性确定精彩度,能够较为客观的反映所述视频片段的精彩程度(可以简称为精彩度)。
在一种可能的设计中,所述服务器将视频分割为多个视频片段的方式有许多种,下面列举其中一种:
所述服务器先对所述视频进行镜头分割,进行了镜头分割之后,可以获得多组图像帧,其中,每组图像帧包括连续的多帧图像;之后,所述服务器将所述多组图像帧合成为一个或多个预设长度的视频片段。
通过上述方法,先进行镜头分割后再合成视频片段,镜头分割可以确保切割后不会影响后续合成的视频片段内容的完整性,可以较好的将视频片段提供给用户。
在一种可能的设计中,所述服务器将视频分割为多个视频片段的方式有许多种,下面列举其中另一种:
所述服务器先对所述视频进行镜头分割,获得多组图像帧,其中,每组图像帧包括连续的多帧图像;之后,将多组图像帧中显示的场景或人物相似的图像帧合成为一个或多个视频片段,也就是说,一个所述视频片段中的任意相邻两帧图像的相似度在预设范围内。
通过上述方法,所述服务器可以提取将相似的场景或人物的图像帧合成为视频片段,使得视频片段显示的内容相关性较强,可以为用户提供更好的观看体验。
第二方面，本申请实施例提供了一种视频播放方法，所述方法包括：首先，终端设备确定需要播放视频，或确定之后会有播放视频的需求后，向服务器发送第一请求，之后，再接收所述服务器发送的第一响应，所述第一请求用于请求所述终端设备需要播放的视频地址，所述第一响应中包括所述视频地址和视频的打点信息，所述打点信息包括所述视频的打点位置和所述打点位置对应的视频片段的存储地址；在获取了所述视频地址和所述打点信息后，所述终端设备可以根据所述视频地址获取所述视频，还可以根据所述打点位置对应的视频片段的存储地址在所述打点位置上加载对应的视频片段；在加载了所述视频片段后，可以播放所述视频和所述视频片段。
通过上述方法,所述终端设备在播放所述视频时,也可以显示所述视频的打点位置对应的视频片段,视频片段相较于文字信息更加直观,可以更加方便、快捷且客观的使用户了解视频的内容。
在一种可能的设计中,所述终端设备根据所述打点位置对应的视频片段的存储地址在所述打点位置上加载对应的视频片段时,可以通过与所述服务器的交互,获取所述视频片段,并进行加载,首先,所述终端设备可以向所述服务器发送第二请求,所述第二请求用于请求所述打点位置对应的视频片段,所述第二请求包括所述打点位置对应的视频片段的存储地址;之后,所述终端设备会接收到所述服务器发送的第二响应,所述第二响应包括所述打点位置对应的视频片段;在接收到所述第二响应后,所述终端设备再根据所述第二响应在所述打点位置上加载对应的视频片段。
通过上述方法,所述终端设备在需要获取视频片段时,可以通过所述服务器进行交互,能够更加灵活、便捷从所述服务器获取所述视频片段。
在一种可能的设计中,所述终端设备播放所述视频过程中,显示距离当前播放位置最近的至少一个所述打点位置对应的视频片段。
通过上述方法,可以较方便的向用户显示即将要播放的视频内容,且显示视频片段的方式可以使用户更加方便的了解视频内容。
在一种可能的设计中,所述终端设备播放所述视频时,可以在接收到所述打点位置的触发操作后,播放所述打点位置对应的视频片段。
通过上述方法,可以较方便和直观的向用户显示想要获知的视频内容,可以较好的提高用户体验。
第三方面,本申请实施例提供了一种视频片段生成方法,所述方法包括:首先,服务器可以先将视频分割为多个视频片段;之后,所述服务器再基于预设的神经网络模型,确定每个视频片段的精彩度;并且所述服务器还可以根据所述每个视频片段的精彩度从所述多个视频片段选取N个视频片段;再对所述N个视频片段进行视频合成,例如可以合成一个短视频,所述服务器还可以向所述终端设备提供所述短视频。
通过上述方法,所述预设的神经网络模型确定的视频片段的精彩度更加客观,使得所述服务器可以将所述视频中较为精彩的部分提取并进行视频合成,还可以通过所述终端设备播放给用户观看,可以使得用户较快、较直观的获知视频内容。
第四方面，本申请实施例提供了一种视频片段生成方法，所述方法包括：首先，服务器可以先将视频分割为多个视频片段；之后，所述服务器再基于预设的神经网络模型，确定每个视频片段的精彩度；且所述服务器可以根据所述每个视频片段的精彩度从所述多个视频片段选取N个视频片段；之后，所述服务器可以保存所述N个视频片段。所述服务器还可以将保存的所述N个视频片段分享给所述终端设备。
通过上述方法,所述预设的神经网络模型确定的视频片段的精彩度可以更直接的反映出视频片段的精彩程度,所述服务器中可以保存所述视频中较为精彩的视频片段,并可以通过终端设备向用户展示,可以提高所述视频对用户的吸引力。
第五方面,本申请实施例还提供了一种服务器,有益效果可以参见第一方面的描述此处不再赘述。所述服务器具有实现上述第一方面的方法实例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,所述服务器的结构中包括接收单元和发送单元,还可以包括处理单元,这些单元可以执行上述第一方面方法示例中的相应功能,具体参见方法示例中的详细描述,此处不做赘述。
第六方面,本申请实施例还提供了一种终端设备,有益效果可以参见第二方面的描述此处不再赘述。所述终端设备具有实现上述第二方面的方法实例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,所述终端设备的结构中包括发送单元、接收单元、加载单元和播放单元,这些单元可以执行上述第二方面方法示例中的相应功能,具体参见方法示例中的详细描述,此处不做赘述。
第七方面,本申请实施例还提供了一种服务器,有益效果可以参见第三方面的描述此处不再赘述。所述服务器具有实现上述第三方面的方法实例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,所述服务器的结构中包括分割单元、确定单元、选取单元以及合成单元,还可以包括存储单元,这些单元可以执行上述第三方面方法示例中的相应功能,具体参见方法示例中的详细描述,此处不做赘述。
第八方面,本申请实施例还提供了一种服务器,有益效果可以参见第四方面的描述此处不再赘述。所述服务器具有实现上述第四方面的方法实例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,所述服务器的结构中包括分割单元、确定单元、选取单元以及存储单元,这些单元可以执行上述第四方面方法示例中的相应功能,具体参见方法示例中的详细描述,此处不做赘述。
第九方面,本申请实施例还提供了一种服务器,有益效果可以参见第一方面的描述此处不再赘述。所述服务器的结构中包括处理器,还可以包括收发器或存储器,所述处理器被配置为支持所述终端设备执行上述第一方面方法中相应的功能。所述存储器与所述处理器耦合,其保存所述服务器必要的程序指令和数据。所述收发器,用于与其他设备进行通信。
第十方面，本申请实施例还提供了一种终端设备，有益效果可以参见第二方面的描述此处不再赘述。所述终端设备的结构中包括处理器和收发器，还可以包括存储器，所述处理器被配置为支持所述终端设备执行上述第二方面方法中相应的功能。所述存储器与所述处理器耦合，其保存所述终端设备必要的程序指令和数据。所述收发器，用于与其他设备进行通信，还可以包括显示器，用于接收所述处理器的指示，显示视频或视频片段。
第十一方面,本申请实施例还提供了一种服务器,有益效果可以参见第三方面的描述此处不再赘述。所述服务器的结构中包括处理器,还可以包括收发器或存储器,所述处理器被配置为支持所述终端设备执行上述第三方面方法中相应的功能。所述存储器与所述处理器耦合,其保存所述服务器必要的程序指令和数据。所述收发器,用于与其他设备进行通信。
第十二方面,本申请实施例还提供了一种服务器,有益效果可以参见第四方面的描述此处不再赘述。所述服务器的结构中包括处理器,还可以包括收发器或存储器,所述处理器被配置为支持所述终端设备执行上述第四方面方法中相应的功能。所述存储器与所述处理器耦合,其保存所述服务器必要的程序指令和数据。所述收发器,用于与其他设备进行通信。
第十三方面,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述各方面所述的方法。
第十四方面,本申请还提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述各方面所述的方法。
第十五方面,本申请还提供一种计算机芯片,所述芯片与存储器相连,所述芯片用于读取并执行所述存储器中存储的软件程序,执行上述各方面所述的方法。
附图说明
图1为现有技术中打点位置显示文字信息的示意图;
图2为本申请提供的一种网络系统的架构示意图；
图3为本申请提供的一种视频播放方法示意图;
图4为本申请提供的神经网络结构示意图;
图5为本申请提供的一种视频片段生成的方法示意图;
图6为本申请提供的另一种视频片段生成的方法示意图;
图7为本申请提供的一种服务器的结构示意图;
图8为本申请提供的一种终端设备的结构示意图;
图9为本申请提供的一种服务器的结构示意图;
图10为本申请提供的一种服务器的结构示意图;
图11为本申请提供的一种终端设备的结构示意图;
图12为本申请提供的一种服务器的结构示意图。
具体实施方式
本申请提供了一种视频播放方法及装置,用以解决现有技术中显示的文字信息并不能很好的对视频内容进行概括,影响用户体验的问题。
为了使本申请的目的、技术方案和优点更加清楚，下面将结合附图对本申请作进一步地详细描述。方法实施例中的具体操作方法也可以应用于装置实施例或系统实施例中。
视频打点(video dotting)是指对视频中的关键帧进行描述,以对视频内容进行内容概括的过程,如图1所示,在现有的视频打点方案中,终端设备通常是在视频的进度条上设置一个或多个打点位置,打点位置上标注了该位置处视频内容的文字信息,用户可以通过滑动鼠标将光标移动至打点位置,此时,会在该打点位置处显示视频内容的文字信息,但通常视频内容的文字信息较为简洁,不能直观的反映出视频内容的精彩程度,容易使得用户错过一些较为精彩的画面,不能有效提高用户体验。
如图2所示,为本申请实施例提供的一种网络架构示意图,该网络架构中包括终端设备和服务器。
所述服务器是部署在云端的远程服务器,也可以是网络中设置的可以提供业务的服务器,所述服务器具有视频处理功能以及数据计算功能,例如所述服务器可以执行视频分割、视频片段的精彩度确定等功能;所述服务器可以是超多核服务器、部署有图形处理器(graphics processing unit,GPU)集群的计算机、大型的分布式计算机、硬件资源池化的集群计算机等等,在本申请实施例中所述服务器可以生成打点信息,并在所述终端设备向所述服务器请求视频地址后,向所述终端设备发送视频地址和打点信息。
所述服务器还可以对视频进行分割,然后确定分割后得到一个或多个视频片段的精彩度,还可以从分割后的视频片段中选取较为精彩的多个视频片段进行视频合成,也可以保存较为精彩的视频片段。
所述服务器可以保存有所述终端设备所需的视频数据,其中包括视频的源数据,分割后的视频片段,每个视频片段的精彩度,视频合成后的视频(对应本申请实施例中的第一视频),以及动态图。
所述终端设备可以向所述服务器发起请求(在本申请实施例中对应第一请求以及第二请求),以便从所述服务器获取相关数据(如视频地址、打点信息、视频片段的存储地址、视频片段、视频合成后的视频、动态图等)。
所述终端设备在获得相关数据后,进行加载或显示等操作,例如,所述终端设备在获取所述视频地址后,会根据所述视频地址获取所述视频;当所述终端设备获取所述打点信息后,可以根据所述打点信息在所述视频的打点位置上加载视频片段,所述终端设备在加载了所述视频片段后,还可以播放所述视频片段;当所述终端设备获取所述视频合成后的视频或动态图,可以向用户显示视频合成后的视频或动态图。
本申请中的终端设备，又可称之为用户设备（user equipment，UE），可以部署在陆地上，包括室内或室外、手持或车载；也可以部署在水面上（如轮船等）；还可以部署在空中（例如飞机、气球和卫星上等）。所述终端设备可以是手机（mobile phone）、平板电脑（pad）、带无线收发功能的电脑、虚拟现实（virtual reality，VR）设备、增强现实（augmented reality，AR）设备、工业控制（industrial control）中的无线设备、无人驾驶（self driving）中的无线设备、远程医疗（remote medical）中的无线设备、智能电网（smart grid）中的无线设备、运输安全（transportation safety）中的无线设备、智慧城市（smart city）中的无线设备、智慧家庭（smart home）中的无线设备等等。
在本申请实施例中，服务器可以向所述终端设备提供需要播放的视频地址以及所述视频的打点信息，且所述打点信息中包括所述视频的打点位置和所述打点位置对应的视频片段的存储地址，使得所述终端设备可以基于所述打点信息在所述视频的打点位置上加载对应的视频片段，并进行播放，显然，相对于现有的打点方案中单纯的显示文字信息，在所述视频的打点位置显示视频片段，可以更加直观的反映出视频内容，能够有效的提升用户体验。下面基于如图2所示的网络系统，对本申请实施例提供的一种视频播放方法进行介绍，如图3所示，所述方法包括：
步骤301:终端设备向服务器发送第一请求,所述第一请求用于请求所述终端设备需要播放的视频地址。
所述终端设备在确定需要播放视频时,若在本地未存储所述视频,可以向所述服务器发送第一请求,以请求所述终端设备需要播放的视频地址,其中请求所述终端设备需要播放的视频地址的方式有多种,例如,可以在所述第一请求中携带有所述视频的标识信息,以及指示获取所述视频地址的信元,上述方式仅是举例,凡是可以请求所述终端设备需要播放的视频地址的方式均适用于本申请实施例。
步骤302:所述服务器接收所述终端设备的第一请求后,所述服务器向所述终端设备发送第一响应,所述第一响应包括视频地址和视频的打点信息,所述打点信息包括所述视频的打点位置和所述打点位置对应的视频片段的存储地址。
步骤303:所述终端设备根据所述视频地址获取所述视频,根据所述打点位置对应的视频片段的存储地址在所述打点位置上加载对应的视频片段。
步骤304:所述终端设备播放所述视频和所述视频片段。
所述服务器在接收到所述第一请求后,为了可以将所述视频地址和视频的打点信息发送给所述终端设备,所述服务器需要先对所述视频进行解析,生成打点信息,也就是说需要确定所述视频的打点位置和所述打点位置对应的视频片段。
在实际应用中,所述服务器生成打点信息的方式有许多种,本申请不做限定,凡是生成的打点信息中包括所述视频的打点位置和每个打点位置对应的视频片段的存储地址方式均适用于本申请实施例。
本申请实施例提供了一种生成打点信息的方式,下面对这种方式进行介绍:
首先,所述服务器可以先将所述视频分割为多个视频片段。本申请实施例中,并不限定所述服务器分割视频的方式,所述服务器可以将所述视频按预设的长度分割为多个视频片段;所述服务器也可以根据所述视频的所展示的内容,将所述视频分割为多个视频片段,每个视频片段所述展示的内容不同;例如,所述服务器可以将所述视频中所述显示的内容包括某一个特定场景或人物的片段分割出来,合成为一个视频片段;若所述视频中包括多个不同的场景或人物,则可以为分割为多个视频片段。
以下列举两种所述服务器分割视频获得视频片段的方式:
方式一、将所述视频进行分割获得多个预设长度的视频片段。
在这种方式中,所述服务器首先对所述视频进行镜头分割,获得多组图像帧,其中,每组图像帧包括连续的多帧图像。
通常,视频中的镜头类型分为两种:突变镜头、渐变镜头。突变镜头是指视频中的一组连续的相关联的图像帧直接切换到下一组连续的相关联的图像帧,中间没有过渡;渐变镜头是指视频中的一组图像帧通过色差或空间的效果,逐渐过渡到下一组图像帧。
所述服务器在对所述视频进行镜头分割时,根据视频的镜头类型对所述视频进行视频分割;在镜头分割的过程中,针对突变镜头,确定突变镜头中一组图像帧切换到下一组图像帧的节点,作为切割点,进行分割,将切割点的前一个图像帧作为该组图像帧的结束帧,切割点的下一个图像帧作为下一组图像帧的起始帧;针对渐变镜头,确定渐变镜头在一组图像帧切换到下一组图帧像的渐变区间中,将渐变区间的前一个图像帧作为该组图像帧的结束帧,渐变区间的后一个图像帧的作为下一组图像帧的起始帧。所述服务器在进行镜头分割时,可以采用使用直方图特征的差异性分割视频镜头,如利用快速镜头分割(fast shot segmentation,FAST)算法确定切割点实现镜头分割,也可以采用其他方法,如采用三维全卷积神经网络(3dimension fully convolutional networks,3D-FCN)进行镜头分割。
所述服务器在对所述视频进行镜头分割的过程中,在所述视频中确定多个切割点,进而获得了多组图像帧,每组图像帧中包括连续的多帧图像。
为区分不同组的图像帧,所述服务器可以通过镜头分割获取每组图像帧的起始帧的帧号(起始图像帧的帧号)和终止帧的帧号(结束图像帧的帧号),还可以确定出每组图像帧的起始时间戳和终止时间戳。
作为一种可选的实施方式,所述服务器在进行镜头分割后,还可以去除每组图像帧中的过渡图像,过渡图像包括下列图像帧的部分或全部:全黑或全白的图像帧、显示的场景或人物较为模糊的图像帧以及融合帧。
所述融合帧是指视频中两个不同的图像帧融合形成的图像帧,例如视频中,前一个图像画面逐渐消失或变暗,后一个图像画面逐渐明显或变亮,中间存在前后两个图像画面重叠的图像帧,该种图像帧即为融合帧。
在确定了多组图像帧后,所述服务器可以将所述多组图像帧合成为一个或多个预设长度的视频片段。
对于多组图像帧中的任一组图像帧,按照播放每组图像帧所需的时间长度(每组图像帧播放的时间跨度)与预设长度的关系可以将多组图像帧分为三种类型;类型一、时间长度大于预设长度的一组图像帧,也就是说该组图像帧所包括的图像帧的个数较多,该组图像帧可以展示多个情景,该组图像帧构成一个长镜头;类型二、时间长度小于预设长度,且图像帧的个数小于设定数值的一组图像帧,也就是说该组图像帧所包括的图像帧的个数较少,该组图像帧可能不足以展示一个完整的情景,该组图像帧形成短镜头;类型三、时间长度等于预设长度,或时间长度与预设长度的差值较小,属于预设范围内;在这种情况下,可以认为该组图像帧的时间长度约等于预设的长度,也就是说该组图像帧所包括的图像帧的个数也介于长镜头和短镜头包括的图像帧的个数之间,该组图像帧可以展示一个或几个情景,该组图像帧构成单个镜头。
在本申请实施例中，视频片段的预设长度指示的是播放所述视频片段的时长；且预设长度可以根据具体场景进行设置，本申请实施例并不限定设置的方式。
对于不同类型的任一组图像帧,所述服务器可以采用对应的方式合成一个或多个预设长度的视频片段;下面分别就不同类型的任一组图像帧合成视频片段的方法进行说明:
一、对于类型一的任一组图像帧。
由于类型一的任一组图像帧所包含的图像帧较多,该组图像帧的时间长度也较长,若该组图像帧的时间长度大于预设长度,则所述服务器可以将该组图像帧切分为一个或多个预设长度的视频片段。
将第i组图像帧切分后任一个视频片段f i+m的起始帧、结束帧、起始时间以及结束时间可由如下公式确定:
Figure PCTCN2019115889-appb-000001
Figure PCTCN2019115889-appb-000002
其中，L表示预设的长度，在合成视频片段时，视频片段的长度并不一定可以严格的达到L，可以是比L大一些，或小一些，故设置δ表示长度增益，视频片段的长度则介于L-δ和L+δ之间；f_i表示第i个视频片段，s_k表示视频中的第k组图像帧，(s_(k+n)_end_time - s_k_start_time) > L+δ，也就是说，该组图像帧大于预设的长度，其中，m=0,1,2,…,m′-2，m′=INT((s_k_end_time - s_k_start_time)/L)，m′表示第i组图像帧可以切分获得的视频片段的数量；v_fps表示视频帧率。f_(i+m)_start_frame_number表示第i+m个视频片段的起始帧的帧号，s_k_start_frame_number表示视频中的第k组图像帧的起始帧的帧号。f_(i+m)_end_frame_number表示第i+m个视频片段的结束帧的帧号，s_k_end_time表示视频中的第k组图像帧的结束时间。
二、对于类型二的任一组图像帧。
由于类型二的任一组图像帧所包含的图像帧较少,该组图像帧的时间长度也较短,若该组图像帧的时间长度小于预设长度,则所述服务器可以将连续的多组图像帧合成为一个或多个预设长度的视频片段。
将多组图像帧合成的任一视频片段f i的起始帧、结束帧、起始时间以及结束时间可由如下公式确定:
Figure PCTCN2019115889-appb-000003
其中，L-δ ≤ s_(k+n)_end_time - s_k_start_time ≤ L+δ，也就是说，连续的多组图像帧的总时间长度处于预设范围内，且与预设的长度相差较小，连续的多组图像帧中任一组图像帧的时间长度小于预设的长度。s_k表示视频中的第k组图像帧，s_(k+n)表示视频中的第k+n组图像帧，关于公式中的参数说明可参见前述说明，此处不再赘述。
三、对于类型三的任一组图像帧。
由于类型三的任一组图像帧所包含的图像帧数介于类型一的任一组图像帧数和类型二的任一组图像帧包括的图像帧数之间,该组图像帧的时间长度较长镜头来说较短,若该组图像帧的时间长度等于预设长度,或在一误差范围内可以看做等于预设长度,则所述服务器可以将该组图像帧合成为一个预设长度的视频片段。
将多组图像帧合成的任一个视频片段f i的起始帧、结束帧、起始时间以及结束时间可由如下公式确定:
Figure PCTCN2019115889-appb-000004
其中，L-δ ≤ s_k_end_time - s_k_start_time ≤ L+δ，该组图像帧的时间长度与预设长度可看做是相等的，关于公式中的参数说明可参见前述说明，此处不再赘述。
方式二、对视频进行分割获得一个或多个视频片段,其中任一所述视频片段中的任意相邻两帧图像的相似度在预设范围内,也就是说,一个所述视频片段所述展示的是一类场景或相似的人物。
在这种方式中,所述服务器也可以对所述视频进行镜头分割,获得多组图像帧,其中,每组图像帧包括连续的多帧图像。关于镜头分割的方式可参见方式一中关于镜头分割的相关描述,此处不再赘述。
之后,针对镜头分割形成的多组图像帧,可以将所述每组图像帧中展示的场景相似的图像帧合成为一个视频片段,若存在不同的场景,则针对不同的场景分别合成对应的一个视频片段,所述服务器可以将所述多组图像帧合成为多个视频片段。
所述服务器在合成视频片段时，需要确定多组图像帧中各个图像帧所展示的场景是否相似或相同，确定的方式有许多种，本申请实施例并不限定；例如所述服务器可以先提取镜头（如长镜头、短镜头以及单个镜头）的关键帧的视觉特征，利用预设的相似性判断标准，把时间相近、语义内容相关的镜头聚类成一个场景，之后合成为对应的视频片段；所述服务器可以采用3D-FCN提取镜头的关键帧的视觉特征，也可以采用视频帧颜色直方图方法提取镜头的关键帧的视觉特征等；所述服务器可以采用树形支持向量机（support vector machine，SVM）进行镜头聚类。
在所述服务器在分割获得多个视频片段之后,所述服务器可以直接将每个视频片段在所述视频的起始位置作为所述视频的打点位置,此时每个打点位置对应一个视频片段;所述服务器也可以去掉所述多个视频片段中的部分视频片段,将剩余的视频片段中每个视频片段的起始位置作为所述视频的打点位置;所述服务器去除部分视频片段的方式本申请实施例并不限定,可以是随机去除,也可以去除所述多个视频片段中存在较多过渡镜头,且过渡镜头时间较长的视频片段,也可以根据实际应用场景去除视频片段。
分割获得多个视频片段后,直接确定所述视频的打点位置的方式仅是举例说明,事实上,所述服务器还可以采用其他方式更加精确的确定所述视频的打点位置,所述服务器可以先对每个视频片段的精彩程度进行评价,也就是先确定每个视频片段的精彩度,之后根据每个视频片段的精彩度选取视频片段,再确定所述视频的打点位置。
需要说明的是,本申请实施例中并不限定所述打点信息中包括的打点位置的数量,可以是一个也可以是多个。
针对一个视频片段的精彩程度，衡量的标准有许多种，例如，所述服务器在衡量一个视频片段的精彩程度时，可以依据该视频片段的观看次数，次数越多，该视频片段的精彩程度越高，也就是精彩度越高。在具体实施中，所述服务器可以调取各个视频片段的播放次数，将播放次数作为视频的精彩度，也可以通过预设的函数，将播放次数转换为一个十分制或百分制下的数值，作为该视频片段的精彩度，数值越大表征视频片段越精彩；又例如，所述服务器也可以根据该视频片段中用户发布评论（如弹幕）的数量，用户发布评论的数量越大，该视频片段的精彩程度越高，也就是精彩度越高，在具体实施中，所述服务器可以调取各个视频片段的用户发布评论（如弹幕）的数量，将评论的数量作为视频的精彩度，也可以通过预设的函数，将评论的数量转换为一个十分制或百分制下的数值，作为该视频片段的精彩度，数值越大表征视频片段越精彩；又例如，以电影为例，在一个电影中较为精彩的部分通常是人物情感爆发或人物之间有剧烈打斗的场面，人物情感爆发会伴随有人物的声音音调或频率提升，人物之间有剧烈打斗的场面会伴随有一些剧烈的声响，所述服务器在确定各个视频片段的精彩程度时，可以根据各个视频片段在相同的播放声音下，声音的频率或音调确定视频片段的精彩程度。
在一种可能的实施方式中,所述服务器也可以通过各个视频片段的一些特征(如视频片段中每帧图像的亮度、颜色、纹理等图像特征)确定各个视频片段的精彩度,这就需要对每个视频片段进行分析。
在对每个视频片段进行分析时,所述服务器可以基于预设的神经网络模型,确定每个视频片段的精彩度。
首先,所述服务器可以基于预设的神经网络模型,提取所述每个视频片段的第一特征,所述第一特征包括下列的部分或全部:帧序列的时间特征,帧序列的空间特征。
其中,每个视频片段是由图像帧序列构成的,每个视频片段的空间特征对应的是帧序列的空间特征,是所述预设的神经网络模型提取的图像帧的外观特征,表征图像帧的色彩、亮度、对比度、清晰度、纹理等信息的丰富程度。
每个视频片段的时间特征对应的是帧序列的时间特征,是所述预设的神经网络模型提取的连续多个图像帧的外观特征,表征连续多个图像帧的色彩、亮度、对比度、清晰度、纹理等信息的相互关联性,以及多帧图像中物体的运动剧烈程度。
所述预设的神经网络模型是预先通过样本数据训练好的,可以输出视频片段的第一特征的模型,所述样本数据是已经标注了精彩程度的视频片段,通过训练,使得所述预设神经网络模型可以提取视频片段的第一特征。
所述预设的神经网络模型可以只提取视频片段的空间特征,也可以只提取视频片段的时间特征,也可以同时提取视频片段的空间特征和时间特征。
本申请实施例中并不限定所述预设的神经网络模型的中包括的网络层的数量及类型,凡是可以提取视频片段的空间特征的神经网络模型均适用于本申请实施例,下面介绍其中一种神经网络模型以及该神经网络模型的提取第一特征的过程。
如图4所示,为本申请实施例提供的一种神经网络模型的结构,该神经网络模型包括输入层、N个卷积层(为了区分每个卷积层,从输入到输出的方向进行命名,分别为第一卷积层、第二卷积层…第N卷积层)、全连接层、输出层。
对于任一个视频片段有多个图像帧构成,输入到如图4所示的神经网络模型的输入层,输入层将该视频片段中的多个图像帧分组,每个分组包括T个图像帧,之后输入到N个卷积层,每个卷积层对每个分组中的图像帧进行卷积操作(如3D卷积操作)和池化操作(如max-pooling池化),每进行一次卷积操作,每个分组中的图像帧减少两帧,直至到第N卷积层中,进行了卷积操作和池化操作后得到1个图像帧;之后将得到的1个图像帧输入到所述全连接层进行处理,所述全连接层将处理后的数据输入到所述输出层,所述输出层输出视频片段的第一特征(在图4中用hi表示)。
之后,所述服务器根据每个视频片段的第一特征确定每个视频片段的精彩度。
通常，所述预设的神经网络模型提取的第一特征为向量或者形式较为复杂的数据，并不能很直观的反映出视频片段的精彩程度，所述服务器可以将提取的视频片段的第一特征转化为较为直观的视频片段精彩度，例如，通过预设的函数，将视频片段的第一特征转换为精彩度，本申请实施例并不限定函数的表征方式，凡是可以将视频片段的第一特征转换为精彩度的函数均适用于本申请实施例。
作为一种可能的实现方式,所述服务器可以通过softmax函数对第一特征进行转换:
H_i = e^(w_i) / ∑_(j=1..N) e^(w_j)
其中,H i表示第i个视频片段的精彩度,H i∈(0,1),H i越接近1对应视频片段精彩度越高;w i表示第i个视频片段的第一特征,N表示视频片段的总数。
所述预设的神经网络模型也可以同时具有提取视频片段的第一特征的功能以及将所述视频片段的第一特征转换为精彩度的功能,使得所述预设的神经网络模型可以直接输出视频片段的精彩度。
所述服务器衡量一个视频片段的精彩程度,并确定一个视频片段的精彩度的方法有许多种,上述方式仅是举例说明,凡是可以确定视频片段精彩度的方式均适用于本申请实施例。
所述服务器在确定了每个视频片段的精彩度后，可以根据所述每个视频片段的精彩度选取N个视频片段，例如所述服务器可以选择精彩度从大到小排序之后的前N个视频片段，也可以设置预设的精彩度范围，选择精彩度属于所述预设的精彩度范围的N个视频片段。
在选取了N个视频片段之后,所述服务器可以根据所述N个视频片段在所述视频中的位置确定所述视频的N个打点位置,其中,一个所述打点位置与所述N个视频片段中的一个视频片段对应。
所述服务器在确定了打点位置和每个打点位置对应的视频片段后,可以将打点位置的信息和对应的视频片段存储在本地,也可以存储在其他服务器中。
其中,所述打点位置的信息是可以标识所述打点位置的信息,包括下列信息的部分或全部:
所述打点位置的标识、所述打点位置在所述视频的进度条上位置。
在一个视频中存在超过一个打点位置的情况下,为了区分不同的打点位置,可以对每个打点位置设置标识,如可以进行编号,也可以通过字母进行区分,也就是说,所述打点位置的标识可以是数字,也可以是字母,还可以是具体的时间点。凡是可以标识不同的所述打点位置的方式均适用于本申请实施例。
针对任一个视频片段,所述服务器可以根据该视频片段在所述视频中的位置确定所述视频的一个打点位置,所述服务器可以将该视频片段在所述视频中的起始位置作为该打点位置,在这种情况下,该视频片段在所述视频中的进度条上的位置为所述视频的一个打点位置,该打点位置与该视频片段存在对应关系。
所述服务器保存的所述打点位置对应的视频片段可以是带有音频,也可以不带有音频的,如动态图。
当所述终端设备向所述服务器发送所述第一请求后,所述服务器可以将携带有所述打点信息的第一响应发送给所述终端设备。
其中,所述第一响应还包括所述视频地址,所述视频地址为所述视频的存储地址。
在本申请实施例中,为了能够使得传输的信息占用的资源较少,所述服务器在所述第一响应中只携带了所述视频地址和所述视频的打点信息,所述终端设备可以在需要展示视频或视频片段时,在根据所述视频地址或所述视频的打点信息获取所述视频或视频片段。
可选的，所述终端设备为了能够更加灵活的播放所述视频或视频片段，也可以向所述服务器发送用于请求所述视频的请求，所述服务器可以反馈携带有所述视频、所述视频的打点位置、所述视频的打点位置对应的视频片段的响应消息，接收到所述响应消息后，所述终端设备可以灵活的选择展示所述视频和所述视频片段的时间和方式。
所述终端设备在接收到所述第一响应之后,为了获取所述视频,所述终端设备可以向所述服务器或存储有所述视频的设备发送获取所述视频的请求,所述请求中可以携带有所述视频地址。
所述终端设备在获取所述视频后,可以预先加载每个打点位置对应的视频片段,也可以预先加载部分打点位置对应的视频片段,如可以只加载在进度条上排序靠前(播放位置靠前)的打点位置对应的视频片段,当所述终端设备播放所述视频的播放到进度条靠后的位置时,再加载剩余的打点位置对应的视频片段。
作为一种可能的实施方式,所述终端设备也可以边播放所述视频,边在所述打点位置上加载对应的视频片段,例如,所述终端设备可以在播放所述视频时,加载距离当前播放位置最近的至少一个或多个所述打点位置对应的视频片段;所述终端设备也可以在播放所述视频时,按照所述打点位置在进度条上的顺序,依次在每个所述打点位置上加载对应的视频片段。
所述终端设备根据所述打点位置对应的视频片段的存储地址在所述打点位置上加载对应的视频片段具体可以包括:所述终端设备可以先向所述服务器发送第二请求,所述第二请求用于请求所述打点位置对应的视频片段,所述第二请求包括所述打点位置对应的视频片段的存储地址,所述终端设备随后接收所述服务器返回的视频片段。
作为一种可能的实施方式,所述第二请求中可以包括所述打点位置对应的视频片段的标识,便于服务器根据该标识获取对应的视频片段并返回给终端设备。
具体的,当所述终端设备只加载所述视频的部分打点位置对应的视频片段,所述第二请求可以用于请求所述视频的部分打点位置对应的视频片段,所述第二请求中包括所述视频的部分打点位置对应的视频片段的存储地址;当所述终端设备需要加载所述视频的所有打点位置对应的视频片段,所述第二请求用于请求所述视频的所有打点位置对应的视频片段,所述第二请求中包括所述视频的所有打点位置对应的视频片段的存储地址。
所述服务器在接收到所述第二请求后,根据所述第二请求,也就是说根据所述所述打点位置对应的视频片段的存储地址获取所述打点位置对应的视频片段,将所述打点位置对应的视频片段携带在第二响应中发送给所述终端设备,所述终端设备接收所述服务器发送的第二响应,即可播放对应的视频片段。
当所述终端设备播放所述视频和所述视频片段时,具体可以分为如下两种情况:
情况一、所述终端设备播放所述视频时,主动播放距离当前播放位置最近的至少一个所述打点位置对应的视频片段。
所述终端设备可以在距离当前播放位置最近的至少一个所述打点位置处显示小窗口，分别播放对应的视频片段；也可以在所述视频的显示界面进行分屏显示，也就是所述视频的显示界面分为两部分，一部分播放当前的所述视频，另一部分播放距离当前播放位置最近的至少一个所述打点位置对应的视频片段。
情况二、所述终端设备在接收到所述打点位置的触发操作后,播放所述打点位置对应的视频片段。
所述打点位置的触发操作可以是检测到光标停留在所述打点位置处,也可以用户通过鼠标在所述打点位置处进行了单击或双击操作;也可以是检测到用户在所述打点位置处进行了触屏。
在接收到所述打点位置的触发操作后,可以在所述打点位置处显示小窗口,播放对应的视频片段。
需要说明的是，所述终端设备在播放所述视频和所述视频片段时，所述视频和所述视频片段可以是同时播放，如在情况一中，所述终端设备可以在大窗口播放所述视频，在小窗口播放所述视频片段，为了不影响用户体验，在播放所述视频片段时，可以只播放所述视频片段的图像，不播放声音；所述终端设备也可以先暂停播放所述视频，而是只播放所述视频片段，如在情况二中，所述终端设备在接收到所述打点位置的触发操作之后，可以暂停播放所述视频，在所述打点位置处，展示小窗口，播放所述打点位置对应的视频片段（显示图像以及声音）。
如图5所示,本申请实施例还提供了一种视频片段生成方法,所述方法包括:
步骤501:服务器将视频分割为多个视频片段。
步骤502:所述服务器基于预设的神经网络模型,确定每个视频片段的精彩度。
所述服务器分割视频以及确定每个视频片段的精彩度的方式可以参见如图4所示的实施例中的相关说明,此处不再赘述。
步骤503:所述服务器根据所述每个视频片段的精彩度从所述多个视频片段选取N个视频片段。
所述服务器在选取N个视频片段时，选取的方式有许多种，例如可以选择精彩度从大到小排序之后的前N个视频片段，也可以设置预设的精彩度范围，选择精彩度属于所述预设的精彩度范围的N个视频片段。
步骤504:所述服务器对所述N个视频片段进行视频合成。
所述服务器在选取了所述N个视频片段可以将N个视频片段合成为一个视频(为方便说明,用第一视频表示视频合成后的视频)。所述服务器在进行视频合成之后,可以存储视频合成后的视频,例如可以将所述第一视频保存在本地,也可以保存在其他服务器。
之后,所述终端设备可以向所述服务器发送用于请求所述第一视频的地址的请求,所述服务器可以将所述第一视频的地址发送给所述终端设备。
当所述终端设备确定需要所述第一视频,如所述终端设备确定需要显示所述第一视频后,再向所述服务器发送携带有所述第一视频的地址的请求,所述服务器在接收到所述请求后,再将所述第一视频发送给所述终端设备。
作为一种可能的实施方式,所述服务器在对所述N个视频片段进行视频合成后,也可以直接将所述第一视频发送给所述终端设备;所述服务器和所述终端设备之间也可以省去获取所述第一视频的地址的交互,所述终端设备直接向所述服务器发送用于请求所述第一视频的请求,服务器直接将所述第一视频发送给所述终端设备。
如图6所示,本申请实施例还提供了一种视频片段生成方法,所述方法包括:
步骤601:服务器将视频分割为多个视频片段。
步骤602:所述服务器基于预设的神经网络模型,确定每个视频片段的精彩度。
步骤603:所述服务器根据所述每个视频片段的精彩度从所述多个视频片段选取N个视频片段。
所述服务器分割视频、确定每个视频片段的精彩度的方式以及选取视频片段的步骤可以参见如图5所示的实施例中的相关说明,此处不再赘述。
步骤604:所述服务器保存所述N个视频片段。
所述服务器保存的所述N个视频片段可以是带有音频,也可以不带有音频的,如动态图。
所述服务器在保存了所述N个视频片段后,可以将所述N个视频片段发送给所述终端设备,所述服务器可以直接发送,也可以在接收到所述终端设备的请求后发送,下面以所述服务器保存的所述N个视频片段为N个动态图为例进行说明:
所述终端设备可以向所述服务器发送用于请求所述动态图的地址的请求,所述服务器可以将所述动态图的地址发送给所述终端设备。
当所述终端设备确定需要所述动态图,如所述终端设备确定需要显示所述动态图后,再向所述服务器发送携带有所述动态图的地址的请求,所述服务器在接收到所述请求后,再将所述动态图发送给所述终端设备。
作为一种可能的实施方式,所述服务器也可以直接将所述动态图发送给所述终端设备;所述服务器和所述终端设备之间也可以省去获取所述动态图的地址的交互,所述终端设备直接向所述服务器发送用于请求所述动态图的请求,所述服务器直接将所述动态图发送给所述终端设备。
需要说明的是,所述终端设备和所述服务器交互中可以只获取所述N个动态图中的部分动态图,也可以获取全部所述N个动态图,本申请实施例中并不限定。
基于与方法实施例的同一发明构思,本发明实施例提供一种服务器,具体用于实现如图3所示的方法实施例中所述服务器执行的方法,该服务器的结构如图7所示,包括接收单元701和发送单元702;
所述接收单元701,用于接收终端设备的第一请求,所述第一请求用于请求所述终端设备需要播放的视频地址。
所述发送单元702,用于向所述终端设备发送第一响应,所述第一响应包括所述视频地址和视频的打点信息,所述打点信息包括所述视频的打点位置和所述打点位置对应的视频片段的存储地址。
在一种可能的实施方式中,所述服务器还可以将所述视频片段发送给所述终端设备。
具体的，所述服务器700还包括处理单元703，所述接收单元701会接收到所述终端设备发送第二请求，所述第二请求用于请求所述打点位置对应的视频片段，所述第二请求包括所述打点位置对应的视频片段的存储地址；在所述接收单元701接收到所述第二请求后，所述处理单元703根据所述打点位置对应的视频片段的存储地址获取所述打点位置对应的视频片段，之后所述发送单元702可以向所述终端设备发送第二响应，所述第二响应包括所述打点位置对应的视频片段。
为了可以将所述打点信息发送给所述终端设备,所述处理单元703在所述发送单元702向所述终端设备发送第一响应之前,可以用于确定所述打点位置和所述打点位置对应的视频片段;具体的,所述处理单元703先将所述视频分割为多个视频片段;之后,基于预设的神经网络模型,确定每个视频片段的精彩度;并根据所述每个视频片段的精彩度选取N个视频片段;选取了所述N个视频片段后,所述处理单元703可以根据所述N个视频片段在所述视频中的位置确定所述视频的N个打点位置,其中,一个所述打点位置与所述N个视频片段中的一个视频片段对应。
在一种可能的实施方式中,在所述处理单元703基于预设的神经网络模型,确定每个视频片段的精彩度的过程中,所述处理单元703可以基于预设的神经网络模型,提取所述每个视频片段的第一特征,所述第一特征包括下列的部分或全部:帧序列的时间特征,帧序列的空间特征;所述处理单元703之后再根据所述每个视频片段的第一特征确定所述每个视频片段的精彩度。
所述处理单元703将视频分割为多个视频片段的方式有许多种,下面列举其中两种:
方式一、分割获得的视频片段的长度统一,为预设长度。
所述处理单元703先根据视频的镜头类型对所述视频进行镜头分割,获得多组图像帧,其中,每组图像帧包括连续的多帧图像;之后将所述多组图像帧合成为一个或多个预设长度的视频片段。
方式二、分割获得的视频片段显示的是某种特定场景或某个特定人物。
所述处理单元703先根据对所述视频进行镜头分割,获得多组图像帧,其中,每组图像帧包括连续的多帧图像;之后,将所述多组图像帧合成为一个或多个视频片段,其中,一个所述视频片段中的任意相邻两帧图像的相似度在预设范围内。
基于与方法实施例的同一发明构思,本发明实施例提供一种终端设备,具体用于实现如图3所示的方法实施例中所述终端设备执行的方法,该终端设备的结构如图8所示,包括发送单元801、接收单元802、加载单元803和播放单元804,其中:
所述发送单元801,用于向服务器发送第一请求,所述第一请求用于请求所述终端设备需要播放的视频地址。
所述接收单元802,用于接收所述服务器发送的第一响应,所述第一响应中包括所述视频地址和视频的打点信息,所述打点信息包括所述视频的打点位置和所述打点位置对应的视频片段的存储地址。
所述加载单元803,用于根据所述视频地址获取所述视频,根据所述打点位置对应的视频片段的存储地址在所述打点位置上加载对应的视频片段。
所述播放单元804,用于播放所述视频和所述视频片段。
所述加载单元803根据所述打点位置对应的视频片段的存储地址在所述打点位置上加载对应的视频片段时,所述终端设备可以与所述服务器进行交互,具体的,所述发送单元801先向所述服务器发送第二请求,所述第二请求用于请求所述打点位置对应的视频片段,所述第二请求包括所述打点位置对应的视频片段的存储地址;之后,所述接收单元802再接收所述服务器发送的第二响应,所述第二响应包括所述打点位置对应的视频片段;在接收到所述第二响应后,所述加载单元803根据所述第二响应在所述打点位置上加载对应的视频片段。
作为一种可能的实施方式,所述播放单元804在播放所述视频片段时,可以在播放所述视频的过程中,显示距离当前播放位置最近的至少一个所述打点位置对应的视频片段。
作为另一种可能的实施方式,所述播放单元804在播放所述视频片段时,可以在接收到所述打点位置的触发操作后,播放所述打点位置对应的视频片段。
基于与方法实施例的同一发明构思,本发明实施例提供一种服务器,具体用于实现如图5所示的方法实施例中所述服务器执行的方法,该服务器的结构如图9所示,包括分割单元901、确定单元902、选取单元903以及合成单元904,其中:
所述分割单元901,用于将视频分割为多个视频片段。
所述确定单元902,用于基于预设的神经网络模型,确定每个视频片段的精彩度。
所述选取单元903,用于根据所述每个视频片段的精彩度从所述多个视频片段选取N个视频片段。
所述合成单元904,用于对所述N个视频片段进行视频合成。
可选的,所述服务器还可以包括存储单元,所述存储单元用于存储进行视频合成后的视频。
基于与方法实施例的同一发明构思,本发明实施例提供一种服务器,具体用于实现如图6所示的方法实施例中所述服务器执行的方法,该服务器的结构如图10所示,包括分割单元1001、确定单元1002、选取单元1003以及存储单元1004,其中:
所述分割单元1001,用于将视频分割为多个视频片段。
所述确定单元1002,用于基于预设的神经网络模型,确定每个视频片段的精彩度。
所述确定单元1003,用于根据所述每个视频片段的精彩度从所述多个视频片段选取N个视频片段。
所述存储单元1004,用于保存所述N个视频片段。
本申请实施例中对单元的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,另外,在本申请各个实施例中的各功能单元可以集成在一个处理器中,也可以是单独物理存在,也可以两个或两个以上单元集成在一个模块中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
该集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台终端设备（可以是个人计算机，手机，或者网络设备等）或处理器（processor）执行本申请各个实施例该方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器（read-only memory，ROM）、随机存取存储器（random access memory，RAM）、磁碟或者光盘等各种可以存储程序代码的介质。
在本申请实施例中,所述服务器和所述终端设备均可以采用集成的方式划分各个功能模块的形式来呈现。这里的“模块”可以指特定ASIC,电路,执行一个或多个软件或固件程序的处理器和存储器,集成逻辑电路,和/或其他可以提供上述功能的器件。
在一个简单的实施例中,本领域的技术人员可以想到所述终端设备可采用图8所示的形式。
如图11所示的终端设备1100,包括至少一个处理器1101,可选的,还可以包括收发器1102、存储器1103。
一种可能的实现方式中,所述终端设备1100还可以包括显示器1104。
存储器1103可以是易失性存储器,例如随机存取存储器;存储器也可以是非易失性存储器,例如只读存储器,快闪存储器,硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD)、或者存储器1103是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器1103可以是上述存储器的组合。
本申请实施例中不限定上述处理器1101以及存储器1103之间的具体连接介质。本申请实施例在图中以存储器1103和处理器1101之间通过总线1105连接,总线1105在图中以粗线表示,其它部件之间的连接方式,仅是进行示意性说明,并不引以为限。该总线1105可以分为地址总线、数据总线、控制总线等。为便于表示,图11中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
处理器1101可以具有数据收发功能,能够与其他设备进行通信,如在本申请实施例中,处理器1101可以向服务器发送第一请求或第二请求,也可以接收来自所述服务器的第一响应或第二响应,在如图11装置中,也可以设置独立的数据收发模块,例如收发器1102,用于收发数据;处理器1101在与其他设备进行通信时,可以通过收发器1102进行数据传输,如在本申请实施例中,处理器1101可以通过收发器1102向服务器发送第一请求或第二请求,也可以通过收发器1102接收来自所述服务器的第一响应或第二响应。
当所述终端设备采用图11所示的形式时,图11中的处理器1101可以通过调用存储器1103中存储的计算机执行指令,使得所述终端设备可以执行上述任一方法实施例中的所述终端设备执行的方法。
具体的,存储器1103中存储有用于实现图8中的发送单元、接收单元、加载单元和播放单元的功能的计算机执行指令,图8中的发送单元、接收单元、加载单元和显示单元的功能/实现过程均可以通过图11中的处理器1101调用存储器1103中存储的计算机执行指令来实现。
或者，存储器1103中存储有用于实现图8中的加载单元和播放单元的功能的计算机执行指令，图8中的加载单元和播放单元的功能/实现过程可以通过图11中的处理器1101调用存储器1103中存储的计算机执行指令来实现，图8的发送单元和接收单元的功能/实现过程可以通过图11中的收发器1102来实现。
存储器1103除用于存储计算机执行指令外,还可以用于存储图8中的发送单元、接收单元、加载单元和播放单元所需要的视频数据或打点信息,如可以存储所述视频地址、所述视频片段、所述视频或所述视频的打点信息。
其中,处理器1101在执行播放单元的功能时,如涉及播放视频或视频片段的操作,处理器1101可以通过终端设备中的显示器1104显示播放的视频或视频片段。
可选的,处理器1101在执行显示单元的功能时,也可以通过其他设备中的显示器显示视频或视频片段,如向其他设备发送播放指令,指示视频或视频片段。
在一个简单的实施例中,本领域的技术人员可以想到所述服务器可采用图12所示的形式。
如图12所示的服务器1200,包括至少一个处理器1201,可选的,还可以包括存储器1202、收发器1203。
存储器1202可以是易失性存储器,例如随机存取存储器;存储器也可以是非易失性存储器,例如只读存储器,快闪存储器,硬盘或固态硬盘、或者存储器1202是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器1202可以是上述存储器的组合。
本申请实施例中不限定上述处理器1201以及存储器1202之间的具体连接介质。本申请实施例在图中以存储器1202和处理器1201之间通过总线1204连接,总线1204在图中以粗线表示,其它部件之间的连接方式,仅是进行示意性说明,并不引以为限。该总线1204可以分为地址总线、数据总线、控制总线等。为便于表示,图12中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
处理器1201可以具有数据收发功能,能够与其他设备进行通信,在如图12装置中,也可以设置独立的数据收发模块,例如收发器1203,用于收发数据;处理器1201在与其他设备进行通信时,可以通过收发器1203进行数据传输。
当所述服务器采用图12所示的形式时,图12中的处理器1201可以通过调用存储器1202中存储的计算机执行指令,使得所述服务器可以执行上述任一方法实施例中的所述服务器执行的方法。
具体的,存储器1202中存储有用于实现图7中的发送单元、接收单元和处理单元的功能的计算机执行指令,图7中的发送单元、接收单元和处理单元703的功能/实现过程均可以通过图12中的处理器1201调用存储器1202中存储的计算机执行指令来实现。或者,存储器1202中存储有用于实现图7中处理单元703的功能的计算机执行指令,图7中的处理单元703的功能/实现过程可以通过图12中的处理器1201调用存储器1202中存储的计算机执行指令来实现,图7的发送单元和接收单元的功能/实现过程可以通过图12中的收发器1203来实现。
存储器1202除用于存储计算机执行指令外,还可以用于存储图7中的发送单元、接收单元和处理单元所需要的视频数据或打点信息,如可以存储所述视频地址、所述视频片段、所述视频或所述视频的打点信息。
具体的,存储器1202中存储有用于实现图9中的分割单元、确定单元、选取单元以及合成单元的功能的计算机执行指令,图9中的分割单元、确定单元、选取单元以及合成单元的功能/实现过程均可以通过图12中的处理器1201调用存储器1202中存储的计算机执行指令来实现,可选的,所述处理器1201还可以通过所述收发机1203将所述第一视频发送给其他设备。
存储器1202除用于存储计算机执行指令外,还可以用于存储图9中的分割单元、确定单元、选取单元以及合成单元所需要的视频数据,如可以存储所述视频片段、所述视频、所述第一视频。
具体的,存储器1202中存储有用于实现图10中的分割单元、确定单元、选取单元以及存储单元的功能的计算机执行指令,图10中的分割单元、确定单元、选取单元以及存储单元的功能/实现过程均可以通过图12中的处理器1201调用存储器1202中存储的计算机执行指令来实现。可选的,所述处理器1201还可以通过所述收发机1203将所述保存的视频片段发送给其他设备。
存储器1202除用于存储计算机执行指令外,还可以用于存储图10中的分割单元、确定单元、选取单元以及存储单元所需要的视频数据,如可以存储所述视频片段、所述视频或所述动态图。
本领域内的技术人员应明白,本申请的实施例可提供为方法、***、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请的方法、设备(***)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (20)

  1. 一种视频播放方法,其特征在于,所述方法包括:
    服务器接收终端设备的第一请求,所述第一请求用于请求所述终端设备需要播放的视频地址;
    所述服务器向所述终端设备发送第一响应,所述第一响应包括所述视频地址和视频的打点信息,所述打点信息包括所述视频的打点位置和所述打点位置对应的视频片段的存储地址。
  2. 如权利要求1所述的方法,其特征在于,所述服务器向所述终端设备发送第一响应之后,还包括:
    所述服务器接收所述终端设备发送的第二请求,所述第二请求用于请求所述打点位置对应的视频片段,所述第二请求包括所述打点位置对应的视频片段的存储地址;
    所述服务器根据所述打点位置对应的视频片段的存储地址获取所述打点位置对应的视频片段;
    所述服务器向所述终端设备发送第二响应,所述第二响应包括所述打点位置对应的视频片段。
  3. 如权利要求1所述的方法,其特征在于,所述服务器向所述终端设备发送第一响应之前,还包括:
    所述服务器将所述视频分割为多个视频片段;
    所述服务器基于预设的神经网络模型,确定每个视频片段的精彩度;
    所述服务器根据所述每个视频片段的精彩度选取N个视频片段;
    所述服务器根据所述N个视频片段在所述视频中的位置确定所述视频的N个打点位置,其中,一个所述打点位置与所述N个视频片段中的一个视频片段对应。
  4. 如权利要求3所述的方法,其特征在于,所述服务器基于预设的神经网络模型,确定每个视频片段的精彩度,包括:
    所述服务器基于预设的神经网络模型,提取所述每个视频片段的第一特征,所述第一特征包括下列的部分或全部:帧序列的时间特征,帧序列的空间特征;
    所述服务器根据所述每个视频片段的第一特征确定所述每个视频片段的精彩度。
  5. 如权利要求3所述的方法,其特征在于,所述服务器将视频分割为多个视频片段,包括:
    所述服务器对所述视频进行镜头分割,获得多组图像帧,其中,每组图像帧包括连续的多帧图像;
    所述服务器将所述多组图像帧合成为一个或多个预设长度的视频片段。
  6. 如权利要求3所述的方法,其特征在于,所述服务器将视频分割为多个视频片段,包括:
    所述服务器根据视频的镜头类型对所述视频进行镜头分割,获得多组图像帧,其中,每组图像帧包括连续的多帧图像;
    所述服务器将所述多组图像帧合成为一个或多个视频片段,其中,一个所述视频片段中的任意相邻两帧图像的相似度在预设范围内。
  7. A video playing method, wherein the method comprises:
    receiving, by a terminal device after sending a first request to a server, a first response sent by the server, wherein the first request is used to request an address of a video to be played by the terminal device, the first response comprises the video address and hitting information of the video, and the hitting information comprises a hitting position of the video and a storage address of a video segment corresponding to the hitting position;
    obtaining, by the terminal device, the video according to the video address, and loading the corresponding video segment at the hitting position according to the storage address of the video segment corresponding to the hitting position; and
    playing, by the terminal device, the video and the video segment.
  8. The method according to claim 7, wherein the loading, by the terminal device, the corresponding video segment at the hitting position according to the storage address of the video segment corresponding to the hitting position comprises:
    sending, by the terminal device, a second request to the server, wherein the second request is used to request the video segment corresponding to the hitting position, and the second request comprises the storage address of the video segment corresponding to the hitting position; and
    receiving, by the terminal device, a second response sent by the server, wherein the second response comprises the video segment corresponding to the hitting position.
  9. The method according to claim 7 or 8, wherein the playing, by the terminal device, the video segment comprises:
    displaying, when the video is played, the video segment corresponding to at least one hitting position nearest to a current playback position.
  10. The method according to claim 7 or 8, wherein the playing, by the terminal device, the video segment comprises:
    playing, after a trigger operation on the hitting position is received, the video segment corresponding to the hitting position.
  11. A server, wherein the server comprises a receiving unit and a sending unit:
    the receiving unit is configured to receive a first request from a terminal device, wherein the first request is used to request an address of a video to be played by the terminal device; and
    the sending unit is configured to send a first response to the terminal device, wherein the first response comprises the video address and hitting information of the video, and the hitting information comprises a hitting position of the video and a storage address of a video segment corresponding to the hitting position.
  12. The server according to claim 11, wherein the server further comprises a processing unit;
    the receiving unit is further configured to receive a second request sent by the terminal device, wherein the second request is used to request the video segment corresponding to the hitting position, and the second request comprises the storage address of the video segment corresponding to the hitting position;
    the processing unit is configured to obtain the video segment corresponding to the hitting position according to the storage address of the video segment corresponding to the hitting position; and
    the sending unit is further configured to send a second response to the terminal device, wherein the second response comprises the video segment corresponding to the hitting position.
  13. The server according to claim 11, wherein the server further comprises a processing unit, and before the sending unit sends the first response to the terminal device, the processing unit is configured to:
    split the video into a plurality of video segments;
    determine a highlight degree of each video segment based on a preset neural network model;
    select N video segments according to the highlight degree of each video segment; and
    determine N hitting positions of the video according to positions of the N video segments in the video, wherein one hitting position corresponds to one video segment among the N video segments.
  14. The server according to claim 13, wherein in determining the highlight degree of each video segment based on the preset neural network model, the processing unit is specifically configured to:
    extract a first feature of each video segment based on the preset neural network model, wherein the first feature comprises some or all of the following: a temporal feature of a frame sequence and a spatial feature of the frame sequence; and
    determine the highlight degree of each video segment according to the first feature of each video segment.
  15. The server according to claim 13, wherein in splitting the video into a plurality of video segments, the processing unit is specifically configured to:
    perform shot segmentation on the video according to a shot type of the video, to obtain a plurality of groups of image frames, wherein each group of image frames comprises a plurality of consecutive frames; and
    synthesize the plurality of groups of image frames into one or more video segments of a preset length.
  16. The server according to claim 13, wherein in splitting the video into a plurality of video segments, the processing unit is specifically configured to:
    perform shot segmentation on the video to obtain a plurality of groups of image frames, wherein each group of image frames comprises a plurality of consecutive frames; and
    synthesize the plurality of groups of image frames into one or more video segments, wherein a similarity between any two adjacent frames in one video segment is within a preset range.
  17. A terminal device, wherein the terminal device comprises a sending unit, a receiving unit, a loading unit, and a playing unit:
    the sending unit is configured to send a first request to a server;
    the receiving unit is configured to receive a first response sent by the server, wherein the first request is used to request an address of a video to be played by the terminal device, the first response comprises the video address and hitting information of the video, and the hitting information comprises a hitting position of the video and a storage address of a video segment corresponding to the hitting position;
    the loading unit is configured to obtain the video according to the video address, and load the corresponding video segment at the hitting position according to the storage address of the video segment corresponding to the hitting position; and
    the playing unit is configured to play the video and the video segment.
  18. The terminal device according to claim 17, wherein when the loading unit loads the corresponding video segment at the hitting position according to the storage address of the video segment corresponding to the hitting position:
    the sending unit is configured to send a second request to the server, wherein the second request is used to request the video segment corresponding to the hitting position, and the second request comprises the storage address of the video segment corresponding to the hitting position;
    the receiving unit is configured to receive a second response sent by the server, wherein the second response comprises the video segment corresponding to the hitting position; and
    the loading unit is configured to load the corresponding video segment at the hitting position according to the second response.
  19. The terminal device according to claim 17 or 18, wherein in playing the video segment, the playing unit is specifically configured to:
    display, when the video is played, the video segment corresponding to at least one hitting position nearest to a current playback position.
  20. The terminal device according to claim 17 or 18, wherein in playing the video segment, the playing unit is specifically configured to:
    play, after a trigger operation on the hitting position is received, the video segment corresponding to the hitting position.
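As an illustration of the segment-scoring step in claims 3 and 4 (and claims 13 and 14), the sketch below shows one possible form of a preset neural network model that maps a frame sequence to a highlight degree. The claims only require that the model extract temporal and/or spatial features of the frame sequence; this particular 3D-convolutional architecture, its layer sizes, and the clip shape are assumptions of the sketch.
    # Hedged sketch for claims 3-4: a small 3D-convolutional network mapping a
    # clip (frame sequence) to a highlight degree in [0, 1]. 3D kernels span the
    # temporal axis and the spatial axes, so the extracted first feature mixes
    # temporal and spatial characteristics of the frame sequence.
    import torch
    import torch.nn as nn

    class HighlightNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool3d(2),
                nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1),  # global spatio-temporal pooling
            )
            self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1), nn.Sigmoid())

        def forward(self, clip):
            # clip: (batch, channels, frames, height, width)
            return self.head(self.features(clip))

    model = HighlightNet().eval()
    with torch.no_grad():
        degree = model(torch.rand(1, 3, 16, 64, 64))  # one 16-frame RGB clip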
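Claims 5 and 6 (and claims 15 and 16) describe obtaining groups of consecutive image frames by shot segmentation and synthesizing them into video segments. A minimal sketch of similarity-based shot segmentation follows; the similarity measure (mean absolute pixel difference) and the threshold are stand-ins chosen for illustration, not the specific segmentation of the embodiments.
    # Hedged sketch for claims 5-6: start a new group of image frames wherever
    # two adjacent frames are too dissimilar, so that within each group the
    # similarity of any two adjacent frames stays within a preset range.
    import numpy as np

    def shot_boundaries(frames, threshold=30.0):
        # frames: (T, H, W, C); returns indices where a new shot begins.
        diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
        per_frame = diffs.mean(axis=(1, 2, 3))
        return [0] + [i + 1 for i, d in enumerate(per_frame) if d > threshold]

    def split_into_shots(frames, threshold=30.0):
        starts = shot_boundaries(frames, threshold)
        ends = starts[1:] + [len(frames)]
        return [frames[a:b] for a, b in zip(starts, ends)]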
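On the terminal side (claims 7 to 10), the device fetches the video address and hitting information and then loads segments by their storage addresses, for example the segment whose hitting position is nearest the current playback position. The sketch below mirrors the toy server sketched earlier in the description; its URL layout and field names are assumptions, not the application's protocol.
    # Hedged sketch for claims 7-10: terminal-side loading of hitting segments.
    import requests

    SERVER = "http://localhost:5000"  # assumed address of the toy server above

    def fetch_playback_info(video_id):
        # First request / first response: video address + hitting information.
        return requests.get(f"{SERVER}/videos/{video_id}/address", timeout=5).json()

    def load_nearest_segment(info, position):
        # Choose the hitting position closest to the current playback position
        # (claim 9), then fetch its segment by storage address (claim 8).
        point = min(info["hitting_info"], key=lambda p: abs(p["position"] - position))
        return requests.get(SERVER + point["segment_url"], timeout=5).content

    info = fetch_playback_info("v001")
    segment_bytes = load_nearest_segment(info, position=80.0)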
PCT/CN2019/115889 2018-11-28 2019-11-06 Video playing method and apparatus WO2020108248A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19891244.6A EP3876543A4 (en) 2018-11-28 2019-11-06 VIDEO PLAYBACK PROCESS AND APPARATUS
US17/333,015 US20210289266A1 (en) 2018-11-28 2021-05-28 Video playing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811434790.9 2018-11-28
CN201811434790.9A CN111246246A (zh) Video playing method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/333,015 Continuation US20210289266A1 (en) 2018-11-28 2021-05-28 Video playing method and apparatus

Publications (1)

Publication Number Publication Date
WO2020108248A1 true WO2020108248A1 (zh) 2020-06-04

Family

ID=70852639

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/115889 WO2020108248A1 (zh) Video playing method and apparatus

Country Status (4)

Country Link
US (1) US20210289266A1 (zh)
EP (1) EP3876543A4 (zh)
CN (1) CN111246246A (zh)
WO (1) WO2020108248A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113949828B (zh) * 2021-10-18 2024-04-30 北京达佳互联信息技术有限公司 Video editing method and apparatus, electronic device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100316359A1 (en) * 2009-06-11 2010-12-16 James Mally ENHANCING DVDs BY SHOWING LOOPING VIDEO CLIPS
CN103067386A * 2012-12-28 2013-04-24 苏州汉辰数字科技有限公司 Quantization processing method for multimedia stream files
CN104240741A * 2013-06-07 2014-12-24 杭州海康威视数字技术股份有限公司 Method for video hitting and searching in video recordings, and video recording device
CN104754415A * 2015-03-30 2015-07-01 北京奇艺世纪科技有限公司 Video playing method and apparatus
CN105872806A * 2016-05-05 2016-08-17 苏州花坞信息科技有限公司 Online video playing method
CN106375860A * 2016-09-30 2017-02-01 腾讯科技(深圳)有限公司 Video playing method and apparatus, terminal, and server

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1968358A * 2006-09-14 2007-05-23 浙江大学 Automatic video summary generation method based on time-constrained frequent shot patterns
CN102572523A * 2010-12-30 2012-07-11 新奥特(北京)视频技术有限公司 Method and system for online hitting and downloading of videos
US9288511B2 (en) * 2011-02-18 2016-03-15 Futurewei Technologies, Inc. Methods and apparatus for media navigation
US9727215B2 (en) * 2013-11-11 2017-08-08 Htc Corporation Method for performing multimedia management utilizing tags, and associated apparatus and associated computer program product
CN104113789B * 2014-07-10 2017-04-12 杭州电子科技大学 Online video summary generation method based on deep learning
US9451307B2 (en) * 2014-12-08 2016-09-20 Microsoft Technology Licensing, Llc Generating recommendations based on processing content item metadata tags
CN104410920B * 2014-12-31 2015-12-30 合一网络技术(北京)有限公司 Method for annotating highlight segments based on view counts of video segments
CN104768083B * 2015-04-07 2018-03-09 无锡天脉聚源传媒科技有限公司 Video playing method and apparatus for displaying chapter content
CN104822092B * 2015-04-30 2018-08-24 无锡天脉聚源传媒科技有限公司 Method and apparatus for combined processing of video hitting, indexing, and subtitles
CN105847998A * 2016-03-28 2016-08-10 乐视控股(北京)有限公司 Video playing method, playback terminal, and media server
CN107241622A * 2016-03-29 2017-10-10 北京三星通信技术研究有限公司 Video positioning processing method, terminal device, and cloud server
CN105812959A * 2016-04-28 2016-07-27 武汉斗鱼网络科技有限公司 Method and apparatus for annotating highlight points in a video player
US10681391B2 (en) * 2016-07-13 2020-06-09 Oath Inc. Computerized system and method for automatic highlight detection from live streaming media and rendering within a specialized media player
CN107222795B * 2017-06-23 2020-07-31 南京理工大学 Video summary generation method based on multi-feature fusion
CN107730528A * 2017-10-28 2018-02-23 天津大学 Interactive image segmentation and fusion method based on the GrabCut algorithm
CN108537139B * 2018-03-20 2021-02-19 校宝在线(杭州)科技股份有限公司 Online video highlight segment analysis method based on bullet-screen comments

Also Published As

Publication number Publication date
EP3876543A1 (en) 2021-09-08
US20210289266A1 (en) 2021-09-16
CN111246246A (zh) 2020-06-05
EP3876543A4 (en) 2021-10-20

Similar Documents

Publication Publication Date Title
EP3195601B1 (en) Method of providing visual sound image and electronic device implementing the same
CN109168026A Instant video display method and apparatus, terminal device, and storage medium
CN109474850B Method and apparatus for adding motion-pixel video special effects, terminal device, and storage medium
WO2023051185A1 Image processing method and apparatus, electronic device, and storage medium
CN108845741B AR expression generation method, client, terminal, and storage medium
US11928152B2 (en) Search result display method, readable medium, and terminal device
CN112035046B Ranking list information display method and apparatus, electronic device, and storage medium
CN114331820A Image processing method and apparatus, electronic device, and storage medium
CN113163230A Video message generation method and apparatus, electronic device, and storage medium
KR102673676B1 Inserting advertisements into a video within a messaging system
CN114245028B Image display method and apparatus, electronic device, and storage medium
US11856255B2 (en) Selecting ads for a video within a messaging system
CN109034242B Scoring method, apparatus, and system for image processing algorithms
WO2023193521A1 Video inpainting method, related apparatus, device, and storage medium
CN114419213A Image processing method, apparatus, device, and storage medium
CN115002359A Video processing method and apparatus, electronic device, and storage medium
WO2020108248A1 Video playing method and apparatus
US20210195134A1 (en) Method and device for generating dynamic image, mobile platform, and storage medium
US9036921B2 (en) Face and expression aligned movies
US20230043683A1 (en) Determining a change in position of displayed digital content in subsequent frames via graphics processing circuitry
CN115499577B Image processing method and terminal device
CN114697741B Playback control method for multimedia information and related device
CN111367598B Action instruction processing method and apparatus, electronic device, and computer-readable storage medium
CN109905766A Dynamic video poster generation method, system, apparatus, and storage medium
US20230326094A1 (en) Integrating overlaid content into displayed data via graphics processing circuitry and processing circuitry using a computing memory and an operating system memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19891244

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019891244

Country of ref document: EP

Effective date: 20210603