WO2020220968A1 - Video data processing method and related apparatus - Google Patents

Video data processing method and related apparatus Download PDF

Info

Publication number
WO2020220968A1
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
target
pixel
matrix
video
Prior art date
Application number
PCT/CN2020/084112
Other languages
English (en)
French (fr)
Inventor
郑远力
殷泽龙
谢年华
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to KR1020217022717A (patent KR102562208B1)
Priority to SG11202105410RA (patent SG11202105410RA)
Priority to EP20799151.4A (patent EP3965431A4)
Priority to JP2021531593A (patent JP7258400B6)
Publication of WO2020220968A1
Priority to US17/334,678 (patent US11900614B2)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 - 2D [Two Dimensional] image generation
    • G06T11/60 - Editing figures and text; Combining figures or text
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 - Animation
    • G06T13/80 - 2D [Two Dimensional] animation, e.g. using sprites
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/238 - Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • H04N21/2387 - Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 - End-user applications
    • H04N21/472 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 - End-user applications
    • H04N21/488 - Data services, e.g. news ticker
    • H04N21/4884 - Data services, e.g. news ticker for displaying subtitles
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 - Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 - Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455 - Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 - Assembly of content; Generation of multimedia applications
    • H04N21/854 - Content authoring
    • H04N21/8547 - Content authoring involving timestamps for synchronizing content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20092 - Interactive image processing based on input by user
    • G06T2207/20101 - Interactive definition of point of interest, landmark or seed
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30241 - Trajectory

Definitions

  • This application relates to the field of Internet technology, and in particular to a video data processing method and related devices.
  • in existing video applications, a user can see user text or user comments posted by that user or by other users on the video playback interface.
  • the user text output to the video playback interface is usually displayed along a fixed text display track in the video playback interface.
  • the embodiments of the present application provide a video data processing method and related devices.
  • the embodiments of the present application provide a method for processing video data.
  • the method is applied to a computer device and includes:
  • in response to a trigger operation on the target video, a target pixel is determined from a key video frame of the target video, and multimedia information associated with the target pixel is acquired, wherein the key video frame is the video frame where the trigger operation is located, and the target pixel is the pixel corresponding to the trigger operation in the key video frame;
  • a trajectory acquisition request corresponding to the target pixel is determined based on the position information of the target pixel in the key video frame, and target trajectory information associated with that position information is acquired based on the trajectory acquisition request, wherein the target trajectory information includes the position information of the target pixel in the next video frame of the key video frame, obtained by tracking the target pixel;
  • when the next video frame of the key video frame is played, the multimedia information is displayed based on the position information of the target pixel, in the target trajectory information, in that next video frame.
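  • As a hedged illustration of the terminal-side flow summarized above, the following minimal Python sketch shows the sequence of steps (determine the target pixel in the key video frame, request its trajectory, display the comment at the tracked positions); all names such as fetch_trajectory and play_with_tracked_comment are assumptions for illustration, not part of this application.

```python
# A minimal, self-contained sketch (hypothetical names, not the patent's actual code)
# of the client-side steps: pick a target pixel in the key video frame, request its
# trajectory, then display the comment at the tracked position in later frames.

def fetch_trajectory(trajectories, key_frame, pixel):
    """Stand-in for the trajectory acquisition request sent to the service server."""
    return trajectories.get((key_frame, pixel), {})

def play_with_tracked_comment(trajectories, key_frame, pixel, comment):
    trajectory = fetch_trajectory(trajectories, key_frame, pixel)
    for frame_index, (x, y) in sorted(trajectory.items()):
        # In a real player this would draw the comment near (x, y) while frame_index plays.
        print(f"frame {frame_index}: show {comment!r} at ({x}, {y})")

if __name__ == "__main__":
    # Pre-computed positions of the clicked pixel in the frames after the key frame.
    demo = {(10, (120, 80)): {11: (122, 81), 12: (125, 83), 13: (129, 86)}}
    play_with_tracked_comment(demo, key_frame=10, pixel=(120, 80), comment="text A")
```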
  • the embodiments of the present application provide a video data processing method.
  • the method is applied to a service server and includes:
  • in response to a trajectory acquisition request for a target pixel in a key video frame, trajectory information associated with the target video is obtained, wherein the key video frame is a video frame in the target video, the target pixel is a pixel in the key video frame, and the trajectory information is determined by the position information of the pixels in each video frame of the target video;
  • target trajectory information associated with the position information of the target pixel in the key video frame is filtered from the trajectory information associated with the target video and returned, wherein the target trajectory information includes target position information, and the target position information is used to trigger the display of multimedia information associated with the target pixel in the next video frame of the key video frame.
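  • The server-side screening step can be illustrated with the following hedged sketch; the data layout (a map from a pixel id to its per-frame positions) and the function name filter_target_trajectory are assumptions for illustration only.

```python
# A sketch (assumed data layout, not from the patent text) of the server-side
# screening step: given the key-frame number and the target pixel's position,
# return only that pixel's positions in the key frame and the frames after it.

def filter_target_trajectory(all_trajectories, key_frame, pixel_xy):
    """all_trajectories maps a pixel id to {frame_number: (x, y)} for every frame."""
    for pixel_id, positions in all_trajectories.items():
        if positions.get(key_frame) == pixel_xy:      # trajectory passes through the clicked point
            return {f: xy for f, xy in positions.items() if f >= key_frame}
    return {}

# Example: pixel "p7" sits at (50, 40) in frame 3, so its later positions are returned.
trajectories = {"p7": {3: (50, 40), 4: (52, 41), 5: (55, 43)}}
print(filter_target_trajectory(trajectories, key_frame=3, pixel_xy=(50, 40)))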
  • One aspect of the embodiments of the present application provides a video data processing method, and the method includes:
  • adjacent first and second video frames are acquired from the target video; an average displacement matrix corresponding to the first video frame is determined based on an optical flow tracking rule corresponding to the target video, the pixels in the first video frame, and the pixels in the second video frame; the position information of the pixels in the first video frame is tracked based on the average displacement matrix to determine the position information of the tracked pixels in the second video frame; and, based on the position information of the pixels in the first video frame and the position information of the tracked pixels in the second video frame, trajectory information associated with the target video is generated, wherein the trajectory information includes target trajectory information used to track and display the multimedia information associated with the target pixel in the target video.
  • One aspect of the embodiments of the present application provides a video data processing device.
  • the device is applied to a computer device and includes:
  • the object determination module is configured to determine a target pixel from a key video frame of the target video in response to a trigger operation on the target video, and obtain multimedia information associated with the target pixel, wherein the key video frame is the video frame where the trigger operation is located, and the target pixel is the pixel in the key video frame corresponding to the trigger operation;
  • a request determination module configured to determine a trajectory acquisition request corresponding to the target pixel based on the position information of the target pixel in the key video frame;
  • the trajectory acquisition module is configured to acquire, based on the trajectory acquisition request, target trajectory information associated with the position information of the target pixel in the key video frame, wherein the target trajectory information includes the position information of the target pixel in the video frame next to the key video frame, and the position information of the target pixel in that next video frame is obtained by tracking the target pixel;
  • the text display module is configured to display the multimedia information, when the next video frame of the key video frame is played, based on the position information of the target pixel in the target trajectory information in that next video frame.
  • One aspect of the embodiments of the present application provides a video data processing device, which is applied to a service server and includes:
  • the request response module is configured to obtain trajectory information associated with the target video in response to the trajectory acquisition request for the target pixel in the key video frame, wherein the key video frame is a video frame in the target video, the target pixel is a pixel in the key video frame, and the trajectory information is determined by the position information of the pixels in each video frame of the target video;
  • the trajectory screening module is configured to filter the target trajectory information associated with the position information of the target pixel in the key video frame from the trajectory information associated with the target video, and return the target trajectory information, wherein the target trajectory information includes target position information, and the target position information is used to trigger the display of multimedia information associated with the target pixel in the next video frame of the key video frame.
  • One aspect of the embodiments of the present application provides a video data processing device, and the device includes:
  • the first acquisition module is configured to acquire a first video frame and a second video frame that are adjacent from the target video;
  • the matrix acquisition module is configured to determine the average displacement matrix corresponding to the first video frame based on the optical flow tracking rule corresponding to the target video, the pixels in the first video frame, and the pixels in the second video frame;
  • the position tracking module is configured to track the position information of the pixels in the first video frame based on the average displacement matrix, and determine the position information of the tracked pixels in the second video frame;
  • the trajectory generation module is configured to generate trajectory information associated with the target video based on the position information of the pixels in the first video frame and the position information of the tracked pixels in the second video frame, wherein the trajectory information includes target trajectory information used to track and display the multimedia information associated with the target pixel in the target video.
  • One aspect of the embodiments of the present application provides a computer device, including a processor, a memory, and a network interface;
  • the processor is connected to the memory and the network interface, wherein the network interface is used to provide data communication functions, the memory is used to store a computer program, and the processor is used to call the computer program to execute the method in one aspect of the embodiments of the present application.
  • One aspect of the embodiments of the present application provides a computer storage medium that stores a computer program.
  • the computer program includes program instructions which, when executed by a processor, execute the method in one aspect of the embodiments of the present application.
  • FIG. 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of multiple video frames in a target video provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a scene for acquiring a target video provided by an embodiment of the present application
  • FIG. 4 is a schematic flowchart of a video data processing method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of acquiring multimedia information according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a full-image pixel tracking provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of tracking barrage data in consecutive multiple video frames according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another video data processing method provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a method for determining effective pixels provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of displaying barrage data based on trajectory information according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a video data processing device provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of another video data processing device provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of another computer device provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of another video data processing device provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of another computer device provided by an embodiment of the present application.
  • the user text displayed in the video playback interface is independent of the video content played on the video playback interface, so that the displayed user text lacks a certain correlation with the video content.
  • the user terminal outputs the acquired user text through a predetermined text display track. Therefore, the user text sent by each user is output through the same text display track, so that it is impossible to comment on the video content in a targeted manner.
  • the network architecture may include a service server 2000 (or application server 2000) and a user terminal cluster.
  • the service server 2000 may be a server cluster composed of a large number of servers, for example, a cloud server, or simply called the cloud.
  • the user terminal cluster may include a plurality of user terminals, as shown in FIG. 1, and specifically may include a user terminal 3000a, a user terminal 3000b, a user terminal 3000c, ..., a user terminal 3000n.
  • the user terminal 3000a, the user terminal 3000b, the user terminal 3000c, ..., and the user terminal 3000n can each be connected to the service server 2000, so that each user terminal can exchange data with the service server 2000 through the network.
  • each user terminal in the user terminal cluster can be integrated and installed with a target application.
  • when the target application runs in each user terminal, it can perform data interaction with the service server 2000 shown in FIG. 1 above.
  • the target application may include multimedia applications, social applications, entertainment applications, and other applications with video playback functions.
  • the embodiment of the present application takes one of the plurality of user terminals as the target user terminal as an example, to illustrate the specific process by which the target user terminal integrated with the target application exchanges data with the service server 2000 through the service data display platform.
  • the target user terminal in the embodiment of the present application may include a personal computer, a tablet computer, a notebook computer, a smart phone, and other mobile terminals integrated with the above-mentioned target application.
  • the service server 2000 may be a background server of the target application, and the business database corresponding to the background server may be used to store each item of business data information displayed on the business data display platform, where the business data information may include Internet information such as video data.
  • multiple videos can be displayed on the business data display platform. When the target user triggers one of the multiple videos through the business data display platform in the target user terminal, the video data corresponding to that video can be obtained and then played in the target user terminal, and the video data currently being played in the target user terminal can further be referred to as the target video.
  • the target video is the video data returned by the service server 2000 based on the data loading instruction sent by the target user terminal.
  • the target video may include multiple video frames, each video frame may be referred to as image data, and each video frame corresponds to a playback timestamp (that is, a playback moment) within the playback duration of the target video, so that when the target user terminal subsequently loads and plays the target video, it can display the corresponding video frame in the playback display interface based on the playback timestamp corresponding to each video frame in the target video.
  • in addition, the service server 2000 can split each video in the video set stored in the service database into frames during the video preprocessing stage, so that the multiple video frames contained in each video can each be split out as a separate picture.
  • FIG. 2 is a schematic diagram of multiple video frames in a target video provided by an embodiment of the present application.
  • the target video may be video A in the aforementioned service database.
  • the video A may include n video frames, and n is a positive integer greater than zero.
  • the service server 2000 may split the n video frames in video A into n pictures in advance, and every two adjacent pictures among the n pictures can be called an image pair.
  • for example, as shown in FIG. 2, the embodiment of the present application may refer to the video frame corresponding to the first moment and the video frame corresponding to the second moment as an image pair, may refer to the video frame corresponding to the second moment and the video frame corresponding to the third moment as an image pair, ..., and may refer to the video frame corresponding to the (n-1)th moment and the video frame corresponding to the nth moment as an image pair.
  • in other words, multiple image pairs can be determined from the multiple video frames of the target video, and each image pair contains two consecutively adjacent video frames.
  • the embodiment of the present application takes the first image pair of the multiple image pairs as an example.
  • the embodiment of the present application may call one video frame in the first image pair (for example, the video frame corresponding to the first moment shown in FIG. 2) the first video frame, and may call the other video frame in the image pair (that is, the video frame corresponding to the second moment) the second video frame; then, based on the optical flow tracking rule, the position information of all pixels in the first video frame of the image pair can be tracked to obtain the position information of each of those pixels in the second video frame.
  • since each image pair contains two adjacent video frames, the position information, in the next video frame, of the pixels in the first video frame of each image pair can be calculated in this way, so that the service server 2000 finally obtains the position information of the pixels in every video frame.
  • since the service server 2000 can pre-calculate the trajectory information of all pixels in video A, when the target user plays video A in the target user terminal, the currently played video A can be called the target video.
  • a trigger operation can be performed on the object that needs to be tracked (ie, the target object) in the target user terminal.
  • the pixel corresponding to the trigger operation is called the target pixel; that is, the target pixel is determined by the trigger operation performed by the target user on the target object in the currently played video frame, and the trigger operation can be used to select, in the current video frame, the target object to be tracked.
  • the embodiment of the present application may call the video frame corresponding to the trigger operation a key video frame; that is, the video frame in the target video that currently contains the target pixel may be called the key video frame.
  • the key video frame may be the video frame corresponding to the first moment in the embodiment corresponding to FIG. 2 above; optionally, the key video frame may also be the video frame corresponding to the second moment in the embodiment corresponding to FIG. 2 above, and the other possibilities will not be listed here one by one.
  • the embodiment of the present application may send the key video frame, the target pixel in the key video frame, and the position information of the target pixel to the service server 2000 in the embodiment corresponding to FIG. 1, so that the service server 2000 may filter, based on the position information of the target pixel in the key video frame, the trajectory information that matches that position information from the pre-calculated trajectory information of all the pixels in the target video, as the target trajectory information.
  • the target track information may include the position coordinates of the target pixel in the video frame after the key video frame.
  • the service server 2000 may return the target trajectory information to the target user terminal, so that, when playing the next video frame of the key video frame, the target user terminal can further determine, according to the target trajectory information, the position information of the target pixel in that next video frame, thereby obtaining the target position information of the target pixel, and then the multimedia information corresponding to the target object can be displayed based on the target position information.
  • FIG. 3 is a schematic diagram of a scene for acquiring a target video provided by an embodiment of the present application.
  • the target user terminal shown in FIG. 3 may be the user terminal 3000a in the embodiment corresponding to FIG. 1 above.
  • after entering the target application, the target user can display the business data display platform of the target application in the target user terminal (for example, a smartphone), and the business data display platform can display the video 10a, video 20a, video 30a, and video 40a shown in FIG. 3.
  • when the target user needs to play the video 30a shown in FIG. 3 in the target user terminal (the video 30a may be the video A in the embodiment corresponding to FIG. 2 above), a playback operation can be performed on the video 30a, so that the target user terminal sends a data loading instruction carrying the target identification information of the video 30a to an application server having a network connection relationship with it.
  • the application server having the network connection relationship may be the service server 2000 in the embodiment corresponding to FIG. 1 above. It is understandable that, when the application server obtains the data loading instruction, it can search for the video data corresponding to the target identification information in the service database, collectively refer to the found video data as target data, and return the target data to the target user terminal shown in FIG. 3.
  • the target user terminal can then play the video data in the video playback interface shown in FIG. 3.
  • the target user terminal can call the video 30a selected and played by the target user the target video; that is, at this time, the target user terminal can play each video frame in video A according to the playback timestamps shown in FIG. 3 above.
  • for the specific process by which the above-mentioned service server 2000 obtains the position information of the pixels in the second video frame, and the specific process of screening the target trajectory information corresponding to the target pixel, refer to the implementation manners provided in the embodiments corresponding to FIG. 8 to FIG. 10 below.
  • FIG. 4 is a schematic flowchart of a video data processing method provided by an embodiment of the present application. As shown in FIG. 4, the method can be applied to the target user terminal in the embodiment corresponding to FIG. 1, and the method may include:
  • Step S101: in response to a trigger operation on the target video, determine a target pixel from a key video frame of the target video, and obtain multimedia information associated with the target pixel, wherein the key video frame is the video frame where the trigger operation is located, and the target pixel is a pixel in the key video frame corresponding to the trigger operation.
  • the target user terminal may display, on the display interface of the target application, a business data display platform for carrying multiple items of business data information; for example, each item of business data information on the business data display platform can be a video.
  • the business data information displayed on the business data display platform may be determined by screening performed by an application server that has a network connection relationship with the target user terminal, based on the user portrait data of the target user (for example, the historical behavior data of the target user).
  • when the target user triggers a video on the business data display platform, the video data corresponding to that video can be loaded from the business database corresponding to the application server, and the loaded video data can then be played on the video playback interface of the target user terminal.
  • the target user terminal may obtain the trigger operation performed by the target user on the target object (that is, the object to be tracked) in the video playback interface during the process of playing the video data.
  • the trigger operation is, for example, clicking with a mouse on, or touching, a certain point of the target object in the video frame displayed on the display screen of the target user terminal.
  • the video frame corresponding to the trigger operation may be called a key video frame, and the pixel point corresponding to the trigger operation in the key video frame may be called a target pixel point.
  • Pixels are points in an image (for example, a video frame). If the image is a 640 ⁇ 480 resolution picture, 640 ⁇ 480 pixels are distributed on it. Generally, pixels in an image have spatial location and color (or grayscale) attributes.
  • the target user terminal can also create a text box in a sub-window independent of the video playback interface, so that the target user can input multimedia information associated with the target object in the text box. After the target user enters multimedia information in the text box, the target user terminal can obtain the multimedia information associated with the target object; that is, the multimedia information associated with the target object can be collectively referred to as the user text, or user comments, input by the target user.
  • the target user terminal may be a terminal device with a video data playback function
  • the target user terminal may be the user terminal 3000a in the embodiment corresponding to FIG. 1
  • the target user terminal may be understood as a mobile terminal.
  • the application server may be the business server 2000 in the embodiment corresponding to FIG. 1 above.
  • FIG. 5 is a schematic diagram of acquiring multimedia information according to an embodiment of the present application.
  • when the target user terminal is playing the video 30a in the embodiment corresponding to FIG. 3, it may use the currently played video 30a as the target video. It is understandable that the target user can perform a trigger operation on any one of the multiple video frames included in the video 30a at any time while the video 30a is played.
  • the target user terminal may use the video frame corresponding to the trigger operation as the key video frame.
  • for example, the target user may select object A as the target object in the video playback interface 100a shown in FIG. 5.
  • the target user terminal may call the video frame currently played by the video playback interface 100a as the key video frame.
  • the target user terminal may use the video frame corresponding to the selection operation (ie, trigger operation) as the key video frame, and may use the pixel corresponding to the selection operation in the key video frame as the target pixel.
  • the target pixel is a pixel in a key video frame in the target video acquired by the target user terminal.
  • the text box shown in FIG. 5 may pop up in the video playback interface 200a shown in FIG. 5.
  • the text box can also be called a dialog box.
  • the text box shown in FIG. 5 can be understood as a floating window independent of the video playback interface 200a, and the text box shown in FIG. 5 may have an association relationship with the object A shown in FIG. 5 (for example, there may be a relative relationship in display position between the text box and the target pixel in object A, so as to construct the correlation between the target pixel of the target object in the video 30a and the multimedia information associated with the target object).
  • optionally, the floating window may be implemented in a manner similar to, or the same as, that of the video playback interface.
  • in the embodiment of the present application, the multimedia information entered in the dialog box may include user text, user pictures, user expressions and other data, and the user text (i.e., text information), user pictures (i.e., picture information) and user expressions (i.e., expression information) entered by the target user in the dialog box are collectively referred to as barrage data.
  • the display of the barrage data can be similar to subtitles.
  • after the target user enters the text information A in the text box, the input text information A can be displayed in the video playback interface 300a shown in FIG. 5.
  • as shown in FIG. 5, the input text information A may be displayed with a certain position separation distance from the target pixel in object A.
  • the text information A displayed in the video playing interface 300a can be referred to as barrage data associated with the target object.
  • Step S102: Determine a trajectory acquisition request corresponding to the target pixel based on the position information of the target pixel in the key video frame.
  • the target user terminal may determine the position information of the target pixel in the key video frame, and may generate, based on the frame number of the key video frame in the target video and the position information of the target pixel in the key video frame, a trajectory acquisition request corresponding to the target pixel, so that step S103 can be further executed.
  • the trajectory acquisition request may be used to instruct the application server to filter the trajectory information that matches the target pixel from the pre-calculated trajectory information corresponding to all pixels in the target video.
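  • A possible shape for such a trajectory acquisition request is sketched below; the field names (video_id, key_frame_number, target_pixel) are purely illustrative assumptions, since the application does not specify a wire format.

```python
# A possible shape (purely illustrative; field names are assumptions) for the
# trajectory acquisition request built from the key frame's number and the
# target pixel's position in that frame.
import json

trajectory_request = {
    "video_id": "video_30a",                  # which target video the key frame belongs to
    "key_frame_number": 10,                   # frame number of the key video frame in the target video
    "target_pixel": {"x": 120, "y": 80},      # position of the clicked pixel in the key frame
}
print(json.dumps(trajectory_request))         # serialized and sent to the application server
```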
  • Step S103: based on the trajectory acquisition request, acquire target trajectory information associated with the position information of the target pixel in the key video frame.
  • the target track information includes the position information of the target pixel in the next video frame of the key video frame, and the position information of the target pixel in the next video frame of the key video frame is Obtained by tracking the target pixel.
  • the target user terminal may, based on the motion trajectories of all pixels in the target video across all video frames calculated in advance by the application server (the motion trajectory of each pixel may be collectively referred to as one piece of trajectory information), select, from the motion trajectories corresponding to these pixels, the motion trajectory of the pixel matching the target pixel as the target trajectory information associated with the position information of the target pixel in the key video frame.
  • once the target user terminal obtains the target trajectory information, it can quickly determine the position information of the target pixel, contained in the target trajectory information, in the next video frame of the key video frame, and can thereby determine the position information at which the multimedia information appears in the next video frame of the key video frame.
  • the position separation distance can be understood as the relative position separation distance between the target pixel in the key video frame and the corresponding barrage data; that is, the position separation distance can include a relative position separation distance in the horizontal (i.e., lateral) direction and/or a relative position separation distance in the vertical (i.e., longitudinal) direction, so as to ensure that, once the target user terminal obtains the position information of the target pixel in the video frames following the key video frame, the position information of the text information A in the next video frame of the key video frame can be quickly calculated based on the relative position separation distance. That is, at this time, the position information of the text information A displayed in the video playback interface 300a in the embodiment corresponding to FIG. 5 is determined by the position information of the target pixel together with the position separation distance.
  • in the embodiment of the present application, the target user terminal can obtain, from an application server having a network connection relationship with it, trajectory information that matches the position information of the target pixel in the key video frame as the target trajectory information; thus, when the target user terminal obtains the target trajectory information of the target pixel pre-calculated by the application server, it can, based on the position information at which the target pixel in the target trajectory information appears in the next video frame of the key video frame, quickly and accurately track the barrage data within the effective duration. In this way, the calculation amount of the target user terminal is effectively reduced, so that the barrage data can still be tracked quickly even when the computing performance of the target user terminal is relatively ordinary.
  • the effective duration may be the display duration corresponding to the barrage data, that is, the target user terminal can track the barrage data associated with the target object within the display duration.
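  • The offset-based placement and the effective duration described above can be illustrated with the hedged sketch below; barrage_positions and the frame-count bound display_frames are illustrative assumptions, not the actual terminal implementation.

```python
# Sketch of how the barrage position could follow the pixel (assumed helper names):
# keep the relative separation between the clicked pixel and the comment fixed,
# and only display it within the comment's display duration (the "effective duration").

def barrage_positions(trajectory, offset_xy, key_frame, display_frames):
    """trajectory: {frame_number: (x, y)} of the target pixel; offset_xy: (dx, dy)."""
    dx, dy = offset_xy
    last_frame = key_frame + display_frames
    return {
        frame: (x + dx, y + dy)                  # same horizontal/vertical separation
        for frame, (x, y) in trajectory.items()
        if key_frame <= frame <= last_frame      # only within the effective duration
    }

trajectory = {10: (120, 80), 11: (122, 81), 12: (125, 83)}
print(barrage_positions(trajectory, offset_xy=(15, -10), key_frame=10, display_frames=2))
```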
  • the motion track of each pixel in the target video (that is, the track information of each pixel) is determined by the position information of each pixel in each video frame of the target video.
  • the embodiment of the present application may determine any two adjacent video frames among the multiple video frames as an image pair. It should be understood that one of the two video frames contained in each image pair determined from the plurality of video frames may be called the first video frame, and the other video frame may be called the second video frame.
  • for the image pair 1 formed by the video frame corresponding to the first moment and the video frame corresponding to the second moment in the embodiment corresponding to FIG. 2 above, the video frame corresponding to the first moment may be referred to as the first video frame and the video frame corresponding to the second moment may be referred to as the second video frame; then, based on the pre-calculated average displacement matrix between the two video frames in image pair 1, all pixels in the first video frame can be tracked to determine the position information at which those pixels appear in the second video frame.
  • similarly, for the image pair 2 formed by the video frame corresponding to the second moment and the video frame corresponding to the third moment, the video frame corresponding to the second moment may be called the first video frame and the video frame corresponding to the third moment may be called the second video frame, and the tracking can likewise be performed based on the pre-calculated average displacement matrix between the two video frames in image pair 2.
  • the embodiment of the present application can obtain the average displacement matrix corresponding to each image pair, and the average displacement matrix corresponding to each image pair can be called the average displacement matrix corresponding to the first video frame in each image pair.
  • the average displacement matrix corresponding to the first video frame can be used to map all the pixels in the first video frame to the second video frame, so as to accurately obtain the position information of the mapped pixels in the second video frame.
  • the average displacement matrix in the embodiment of the present application may include a longitudinal average displacement matrix and a horizontal average displacement matrix.
  • the first longitudinal coordinate value (for example, the y value) of each pixel in the first video frame can be transformed by the longitudinal average displacement matrix to obtain the second longitudinal coordinate of the corresponding pixel mapped into the second video frame; in the same way, the first horizontal coordinate value (for example, the x value) of each pixel in the first video frame can be transformed by the horizontal average displacement matrix to obtain the second horizontal coordinate of the corresponding pixel mapped into the second video frame.
  • the first horizontal coordinate and the first vertical coordinate value of each pixel in the first video frame may be referred to as the first position information of each pixel in the first video frame.
  • the second horizontal coordinate and the second vertical coordinate of each pixel mapped from the first video frame can be referred to as the second position information of each mapped pixel in the second video frame. Since each image pair corresponds to an average displacement matrix, the corresponding second position information can be calculated based on the first position information of the pixels in the first video frame, the calculated second position information of the mapped pixels in each second video frame can be retained, and the position information of the same pixel in each video frame can then be integrated to obtain the motion trajectories of all pixels, thereby realizing the tracking of all pixels in all video frames of the target video.
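  • The coordinate-mapping step can be sketched as follows. The application derives horizontal and vertical average displacement matrices from optical flow; as an assumption for illustration, dense Farneback optical flow (OpenCV) stands in for those matrices here, with flow[..., 0] shifting x values and flow[..., 1] shifting y values.

```python
# A sketch of mapping every pixel of the first video frame into the second one.
# Assumption: the per-pixel dense optical flow plays the role of the horizontal
# and vertical average displacement matrices described in the text.
import cv2
import numpy as np

def map_pixels_to_next_frame(first_frame_gray, second_frame_gray):
    # args: prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
    flow = cv2.calcOpticalFlowFarneback(first_frame_gray, second_frame_gray,
                                        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = first_frame_gray.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))   # first position information (x1, y1)
    x2 = xs + flow[..., 0]                             # horizontal displacement -> second x coordinate
    y2 = ys + flow[..., 1]                             # vertical displacement  -> second y coordinate
    return x2, y2                                      # second position information per pixel

# Usage (frames f1 and f2 read from the target video, converted to grayscale):
# x2, y2 = map_pixels_to_next_frame(cv2.cvtColor(f1, cv2.COLOR_BGR2GRAY),
#                                   cv2.cvtColor(f2, cv2.COLOR_BGR2GRAY))
```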
  • the multiple video frames in the target video shown in the embodiment corresponding to FIG. 2 above may be multiple consecutive image frames; therefore, after the target video shown in FIG. 2 is split, the image frames (i.e., video frames) obtained by splitting can be assigned corresponding video frame numbers according to the playback order.
  • for example, the video frame number of the video frame obtained at the first moment can be 1, and the video frame number 1 can be used to indicate that the video frame obtained at the first moment is the first frame in the target video; similarly, the video frame number of the video frame obtained at the second moment can be 2, and the video frame number 2 can be used to indicate that the video frame obtained at the second moment is the second frame in the target video.
  • by analogy, the video frame number of the video frame obtained at the (n-1)th moment can be n-1, indicating that it is the (n-1)th frame in the target video; the video frame number of the video frame obtained at the nth moment can be n, indicating that it is the nth frame, i.e., the last frame, in the target video.
  • the embodiment of the present application may refer to the image pair formed by the first frame and the second frame of the multiple video frames shown in FIG. 2 as the first image pair, in order to illustrate the specific process in which the average displacement matrix is used to translate the pixels in the first frame into the second frame so as to realize pixel tracking.
  • the first frame in the first image pair is the video frame corresponding to the first moment in the embodiment corresponding to FIG. 2, and the second frame in the first image pair is the video frame corresponding to the second moment in the embodiment corresponding to FIG. 2.
  • FIG. 6 is a schematic diagram of full-image pixel tracking provided by an embodiment of the present application.
  • the image pair (1, 2) shown in FIG. 6 may be the first image pair described above.
  • the first video frame in the first image pair may be the video frame corresponding to the aforementioned first moment (i.e., the first frame), and the second video frame in the first image pair may be the aforementioned video frame corresponding to the second moment (i.e., the second frame).
  • the value 1 in the image pair (1, 2) is the video frame number of the first frame
  • the value 2 is the video frame number of the second frame. Therefore, the video frame number of each video frame in the target video can be used to characterize any two consecutively adjacent video frames in the target video.
  • the pixel point display area 600a shown in FIG. 6 may include all the pixels extracted from the first video frame of the image pair.
  • each pixel point in the pixel point display area 600a may correspond to an area identifier.
  • the pixel point display area 600a in FIG. 6 is only an example, and the pixel point display area 600a may also be referred to as a pixel point area or the like. It should be understood that the embodiment of the present application only takes 20 pixels obtained from the first video frame as an example; in actual situations, the number of pixels obtained from the first video frame will be far more than the 20 listed in this example. It should also be understood that, since multiple video frames in the same video are obtained by the same terminal after image collection, the number of pixels in each video frame contained in the same video is the same.
  • the average displacement matrix shown in FIG. 6 can be used to track all pixels in the pixel point display area 600a, and the position information of the mapped pixels can be determined in the pixel point display area 700a corresponding to the second video frame.
  • for example, the position information of the pixel point A in the pixel point display area 600a shown in FIG. 6 may be the coordinate position information at the area identifier 5, and the average displacement matrix can be used to map the pixel point A into the pixel point display area 700a shown in FIG. 6; the position information of the pixel point A in the pixel point display area 700a may then be the coordinate position information at the area identifier 10.
  • after the position information of the pixel point A in the second video frame is obtained by calculation, it may be stored. Since each image pair in the target video can correspond to an average displacement matrix, the position information of each pixel in the first video frame mapped into the second video frame can be calculated. By integrating the position information of the same pixel across the consecutive video frames of each image pair, the position information of the pixel point A in each video frame of the target video can be obtained, and the motion trajectory of the pixel point A can then be obtained based on that position information.
  • in this way, based on the average displacement matrix corresponding to each image pair (i.e., the average displacement matrix corresponding to the first video frame in each image pair), the position information of the pixels in every video frame can be obtained.
  • the trajectory information corresponding to all pixels in the target video may be collectively referred to as trajectory information corresponding to the pixels in the embodiment of the present application.
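  • A simplified sketch of how the per-pair mappings could be chained into trajectory information is given below; the integer rounding and the absence of occlusion handling are assumptions made only to keep the example short.

```python
# A simplified sketch (assumptions: integer rounding, no occlusion handling) of how
# per-pair mappings can be chained into trajectory information: start from each
# pixel's position in frame 1 and carry it through every (frame k, frame k+1) pair.
import numpy as np

def build_trajectories(displacements, height, width):
    """displacements: list of (dx, dy) arrays of shape (height, width), one per image pair."""
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    x, y = xs.astype(float), ys.astype(float)
    trajectories = [np.stack([x, y], axis=-1)]           # positions in frame 1
    for dx, dy in displacements:                          # one entry per adjacent frame pair
        xi = np.clip(np.round(x).astype(int), 0, width - 1)
        yi = np.clip(np.round(y).astype(int), 0, height - 1)
        x = x + dx[yi, xi]                                # shift by the displacement sampled
        y = y + dy[yi, xi]                                # at the pixel's current position
        trajectories.append(np.stack([x, y], axis=-1))    # positions in the next frame
    return trajectories                                   # trajectories[k][y0, x0] = position in frame k+1

# Two dummy 4x4 pairs with a uniform shift of (+1, 0) per frame:
d = [(np.ones((4, 4)), np.zeros((4, 4)))] * 2
print(build_trajectories(d, 4, 4)[2][0, 0])               # pixel starting at (0, 0) ends near (2, 0)
```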
  • in the embodiment of the present application, the motion trajectories of all pixels in the target video can be pre-calculated by an application server that has a network connection relationship with the target user terminal. Therefore, when the target user terminal actually plays the target video, the application server receives the position information of the target pixel in the key video frame sent by the target user terminal, filters, from the pre-calculated trajectory information, the trajectory information that matches the target pixel as the target trajectory information, and further returns the target trajectory information to the target user terminal, so that the target user terminal can execute step S104 based on the acquired target trajectory information.
  • the target pixel is a pixel in the key video frame selected by the target user.
  • optionally, the motion trajectories of all pixels in the target video can also be pre-calculated in the target user terminal itself, so that, when the target video is actually played on the target user terminal, the trajectory information matching the target pixel can be selected locally from the trajectory information corresponding to these pixels as the target trajectory information, and step S104 can then be further executed.
  • Step S104: when the next video frame of the key video frame is played, based on the position information of the target pixel in the target track information in the next video frame of the key video frame, display the multimedia information.
  • FIG. 7 is a schematic diagram of tracking barrage data in consecutive multiple video frames provided by an embodiment of the present application.
  • the multiple consecutive video frames used for barrage tracking in the embodiment of the present application may include the key video frame currently being played and the video frame located after the key video frame in the target video that has not yet been played.
  • when the video frame 10 shown in FIG. 7 is used as the key video frame, barrage tracking is performed, in each video frame following the key video frame (for example, video frame 20, video frame 30, and so on), on the barrage data that appears in video frame 10.
  • the video frame 10 shown in FIG. 7 may be the video frame displayed in the video playback interface 300a in the embodiment corresponding to FIG. 5 above.
  • the video frame 10 shown in FIG. 7 may be currently being played in the target user terminal.
  • the key video frame in the embodiment of the present application can be understood as the video frame corresponding to the trigger operation performed when the target user selects the target object.
  • the target objects in the embodiments of the present application may include objects such as characters, animals, plants, etc., selected by the target user through a click operation in the video frame being played.
  • the target user terminal may call the object selected by the target user the target object, may use the pixel corresponding to the trigger operation on the target object in the key video frame as the target pixel, and may then obtain, from the trajectory information pre-calculated by the application server, the target trajectory information corresponding to the target pixel.
  • the target trajectory information may include the position information of the target pixel in the key video frame, and may also include the position information of the target pixel in each video frame after the key video frame (for example, the next video frame of the key video frame). It should be understood that, based on the position information of the target pixel in each video frame after the key video frame, the position information, in each of those video frames, of the multimedia information associated with the target object (and also with the target pixel), such as the text information A shown in FIG. 5 above, can be quickly calculated, thereby achieving fast tracking of the barrage data associated with the target object. In this way, when playing the next video frame of the key video frame, the target user terminal can display the barrage data in that next video frame in real time based on the calculated position information of the barrage data.
  • the display of the barrage data can be similar to the display of subtitles.
  • in this way, the barrage data can follow the target object like a shadow; that is, the barrage input by the user can effectively track the target object and move with it. For example, in the target video, if the target object is present in multiple consecutive video frames after the key video frame, the barrage data associated with the target object (that is, the aforementioned text information A) can be displayed at the position information of the target pixel of the target object in each of these consecutive video frames.
  • the target user terminal may also transmit the barrage data (multimedia information) input by the user and the calculated position information of the barrage data in each video frame of the target video to the server.
  • alternatively, the server may receive, from the target user terminal, the frame number of the key video frame clicked by the user in the target video, the coordinates of the target pixel, and the input barrage data (multimedia information); calculate the target trajectory information of the target pixel in each video frame of the target video; calculate, according to the target trajectory information, the position information of the barrage data in each video frame of the target video; and save the position information of the barrage data.
  • the server may also receive information such as the identifier of the target user terminal and/or the user identifier of the user logged in to the target application on the target user terminal. Then, when other user terminals play the target video, the server can send the barrage data, its position information in each video frame of the target video, and the user identifier to those user terminals, and the other user terminals will display the barrage data in each video frame of the target video according to the position information of the barrage data.
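  • What the server stores and later distributes to other terminals could look like the record below; every field name here is an illustrative assumption, since the application does not define a storage schema.

```python
# An illustrative record layout (field names are assumptions) for what the server
# could store after computing the barrage positions, and how it might be looked up
# by other terminals that later play the same target video.

barrage_record = {
    "video_id": "video_30a",
    "user_id": "user_123",                 # user who posted the comment
    "key_frame_number": 10,
    "content": "text information A",       # the barrage data (multimedia information)
    "positions": {                         # barrage position per video frame number
        "10": [135, 70],
        "11": [137, 71],
        "12": [140, 73],
    },
}

def positions_for_playback(record, frame_number):
    """Other terminals look up where to draw the comment while playing this frame."""
    return record["positions"].get(str(frame_number))

print(positions_for_playback(barrage_record, 11))   # -> [137, 71]
```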
  • the target user can select the object that the target user thinks needs to be tracked from the currently played video frame when the current time is T1.
  • the selected object can be referred to as a target object.
  • the target user terminal can filter out, from the pre-calculated trajectory information corresponding to all the pixels in the video, the trajectory information associated with the target pixel in the target object, and thereby quickly obtain the target trajectory information corresponding to the target pixel in the target object. It should be understood that the pre-calculated trajectory information corresponding to any one of the pixels in the video can be used to describe the position information of that pixel in each video frame of the video.
  • when the target user terminal uses the video frame played at time T1 as the key video frame, the target pixel in the target object can be obtained from the key video frame, and the target trajectory information corresponding to the target pixel can then be obtained, so as to quickly obtain the position information of the target pixel in each video frame after the key video frame and display the multimedia information associated with the target object based on the target trajectory information. It can be understood that, if the trajectory formed by the target pixel across the video frames is a circle, the multimedia information associated with the target object can synchronously move along that circular trajectory.
  • in other words, the trajectory information corresponding to each pixel can be obtained in advance, so that, when the target video is played in the target user terminal, the pixel in the target object corresponding to the trigger operation executed by the target user can be used as the target pixel, the trajectory information associated with the target pixel can be obtained as the target trajectory information, and accurate tracking of the multimedia information associated with the target object can then be quickly realized based on the acquired target trajectory information.
  • the corresponding motion trajectories of the target pixels in the different objects can be obtained, so that the barrage data associated with different target objects can move along different trajectories. The correlation between the barrage data and the object it is aimed at is therefore stronger, which enriches the visual display effect of the barrage data and improves the flexibility of the barrage data display mode.
  • when the trigger operation of the target user on the target video is acquired, the embodiment of the present application may use the video frame corresponding to the trigger operation in the target video as the key video frame, so that the target pixel can be determined from the key video frame and the multimedia information associated with the target pixel and the target object where the target pixel is located can be obtained (for example, the multimedia information may be barrage data such as user text, pictures, and expressions in the target video). Further, the trajectory acquisition request corresponding to the target pixel is determined based on the position information of the target pixel in the key video frame, and then, based on the trajectory acquisition request, the target trajectory information associated with the position information of the target pixel in the key video frame can be acquired, so that when the next video frame of the key video frame is played, the barrage data associated with the target pixel and the target object where the target pixel is located can be displayed based on the target trajectory information.
  • when the key video frame is determined, the embodiment of the present application can further filter out the trajectory information of the target pixel from the trajectory information of all pixels in the key video frame, and use the filtered trajectory information of the target pixel as the target trajectory information, so that the display effect of the barrage data can be enriched based on the obtained target trajectory information. For example, for target pixels in different target objects, the obtained target trajectory information may be different, which in turn makes the display effect of the barrage data different.
  • the position information of the barrage data in each video frame after the key video frame can be quickly determined.
  • in other words, the barrage data will be displayed in the target video following the target object as it changes in the video, which can enrich the visual display effect of the user text in the video and can make the barrage data more closely related to the target object or the object commented on in the video.
  • FIG. 8 is a schematic diagram of another video data processing method according to an embodiment of the present application. This method is mainly used to illustrate the data interaction process between the target user terminal and the application server. The method may include the following steps:
  • Step S201 Acquire an adjacent first video frame and second video frame from the target video.
  • the application server may determine multiple image pairs from the multiple video frames contained in the target video, and each of the multiple image pairs is composed of two adjacent video frames in the target video.
  • when the application server performs video preprocessing on the target video, the target video can first be divided into frames, so that the multiple video frames in the target video are split into pictures according to the playback time sequence; that is, multiple video frames arranged based on the playback time sequence shown in FIG. 2 can be obtained.
  • a picture corresponding to each video frame can be obtained, that is, one image can be regarded as one image frame.
  • the application server can use the forward-backward optical flow method to track the pixels in the two video frames in each image pair. For example, for a target video containing n video frames, the application server may determine two video frames with adjacent frame numbers as an image pair according to the video frame number of each video frame in the target video.
  • the application server may determine a video frame with a video frame number of 1 and a video frame with a video frame number of 2 as an image pair. Similarly, the application server may determine the video frame with the video frame number of 2 and the video frame with the video frame number of 3 as an image pair. By analogy, the application server may determine a video frame with a video frame number of n-1 and a video frame with a video frame number of n as an image pair.
  • n-1 image pairs can be obtained.
  • n-1 image pairs can be expressed as: (1, 2), (2, 3), (3, 4),...,(n-1,n).
  • the video frame with the video frame number of 1 in the image pair can be called the first frame of the target video
  • the video frame with the video frame number of 2 can be called the second frame of the target video, and so on.
  • the video frame with the video frame number of n-1 in the image pair can be called the n-1th frame of the target video, and the video frame with the video frame number of n can be called the nth frame of the target video.
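To make the pairing concrete, a minimal sketch (not part of the patent text) of splitting a target video into frames and forming the n-1 adjacent image pairs (1, 2), (2, 3), ..., (n-1, n) follows; the use of OpenCV and all helper names are illustrative assumptions.

```python
# Sketch: split a target video into frames and build the n-1 adjacent image pairs.
import cv2

def build_image_pairs(video_path):
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)          # frames[i] is the (i+1)-th video frame
    cap.release()
    # each pair holds (first video frame, second video frame) with adjacent frame numbers
    return [(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
```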
  • the application server can track the pixel points in each image pair of the target video through the cloud forward and backward optical flow method.
  • the cloud forward and backward optical flow method can be collectively referred to as the optical flow method, and the optical flow method can be used to calculate the pixel point displacement between two video frames in each image pair.
  • each image pair is composed of two adjacent video frames
  • one video frame in each image pair can be called the first video frame, and the other video frame in each image pair can be called the second video frame.
  • the embodiment of the present application may collectively refer to the two video frames in each image pair acquired from the target video as the first video frame and the second video frame; that is, the application server may acquire an adjacent first video frame and second video frame from the target video.
  • Step S202 Determine an average displacement matrix corresponding to the first video frame based on the optical flow tracking rule corresponding to the target video, the pixels in the first video frame, and the pixels in the second video frame.
  • the application server may extract all pixels of the first video frame. All the extracted pixels can be collectively referred to as pixels.
  • the optical flow tracking rule corresponding to the target video may include the aforementioned cloud forward and backward optical flow method, and may also include the cloud displacement integration method and the cloud displacement difference method. It should be understood that, through this optical flow tracking rule, optical flow calculation can be performed on the pixels in the first video frame and the pixels in the second video frame in each image pair to obtain the optical flow tracking result corresponding to each image pair, so that the target state matrix and target displacement matrix corresponding to each image pair can be determined based on the optical flow tracking result.
  • for each pixel in the first video frame, the application server may select a block around the pixel (including the pixel and the pixels surrounding it) and calculate the average displacement of all pixels in the block as the displacement of that pixel.
  • the computational complexity of this processing method may be relatively large.
  • through the optical flow tracking rule, a displacement integral operation can further be performed on the target state matrix and the target displacement matrix corresponding to each image pair to obtain the state integral matrix and the displacement integral matrix corresponding to each image pair. Further, through the optical flow tracking rule, a displacement difference operation can be performed on the state integral matrix and the displacement integral matrix corresponding to each image pair to obtain the average displacement matrix corresponding to each image pair. In other words, the optical flow tracking rule makes it possible to obtain an average displacement matrix that can be used to accurately track the position information of the pixels in the first video frame in each image pair.
  • the application server can calculate the average displacement of the pixels in the first video frame and the pixels in the second video frame in batches, thereby increasing the speed of calculation and improving the resolution of pixels and video frames. Processing efficiency.
  • the cloud forward and backward optical flow method can be used to synchronously perform forward and reverse optical flow calculations on the first video frame and the second video frame in each image pair to obtain the optical flow corresponding to each image pair Track the results.
  • the optical flow tracking result obtained by the application server may include the forward displacement matrix corresponding to the first video frame in each image pair, and may also include the reverse displacement matrix corresponding to the second video frame in each image pair.
  • each matrix element in the forward displacement matrix and the reverse displacement matrix may include a displacement in two dimensions (for example, (Δx, Δy)).
  • the displacement in these two dimensions can be understood as the displacement of the same pixel in the horizontal direction (ie, Δx) and the displacement in the vertical direction (ie, Δy). It should be understood that, for each image pair in the target video, after calculation by the optical flow method, a forward horizontal displacement matrix, a forward vertical displacement matrix, a reverse horizontal displacement matrix, and a reverse vertical displacement matrix can be obtained, and the four obtained matrices can be called the optical flow result. Further, the application server can set an initial state matrix for the first video frame in each image pair, and then determine, based on the forward displacement matrix and the reverse displacement matrix obtained above, whether the pixels in the first video frame in each image pair meet the target filtering conditions.
  • the application server can determine the pixels that meet the target screening conditions as effective pixels, and can then modify, according to the determined effective pixels, the initial state matrix and the forward displacement matrix corresponding to the first video frame, to obtain the target state matrix and the target displacement matrix corresponding to the first video frame in each image pair.
  • the application server can determine and obtain the average displacement matrix corresponding to the first video frame in each image pair through the cloud displacement integration method and the cloud displacement difference method, as well as the obtained target state matrix and target displacement matrix.
  • the forward horizontal displacement matrix and the forward vertical displacement matrix may be collectively referred to as the forward displacement matrix
  • the reverse horizontal displacement matrix and the reverse vertical displacement matrix may be collectively referred to as the reverse displacement matrix.
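As a rough illustration of the forward and reverse optical flow calculation described above, the sketch below uses OpenCV's Farneback dense optical flow as a stand-in for the optical flow method named in the text and returns the four displacement matrices; the function name, parameter choices, and variable names are assumptions.

```python
# Sketch: compute forward and reverse dense optical flow between the two frames of an
# image pair. Q12_x/Q12_y are the forward horizontal/vertical displacement matrices,
# Q21_x/Q21_y the reverse ones.
import cv2

def forward_backward_flow(first_frame, second_frame):
    g1 = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(second_frame, cv2.COLOR_BGR2GRAY)
    fwd = cv2.calcOpticalFlowFarneback(g1, g2, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    bwd = cv2.calcOpticalFlowFarneback(g2, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    Q12_x, Q12_y = fwd[..., 0], fwd[..., 1]   # forward displacement matrices
    Q21_x, Q21_y = bwd[..., 0], bwd[..., 1]   # reverse displacement matrices
    return Q12_x, Q12_y, Q21_x, Q21_y
```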
  • the embodiment of the present application takes an image pair of multiple image pairs as an example to illustrate the process of obtaining the average displacement matrix corresponding to the image pair through the first video frame and the second video frame in the image pair.
  • the first video frame in the image pair may be a video frame with a video frame number of 1
  • the second video frame may be a video frame with a video frame number of 2. Therefore, an image pair composed of a video frame with a video frame number of 1 and a video frame with a video frame number of 2 is called image pair 1, and the image pair 1 can be expressed as (1,2).
  • the positive displacement matrix corresponding to the image pair 1 obtained after calculation by the optical flow method may include a positive horizontal displacement matrix (for example, the positive horizontal displacement matrix may be a matrix Q 1,2,x ) and a positive vertical Displacement matrix (for example, the positive vertical displacement matrix may be matrix Q 1,2,y ).
  • each matrix element in the matrix Q 1, 2, x can be understood as the horizontal displacement of the pixel in the first video frame in the second video frame. That is, each matrix element in the forward horizontal displacement matrix can be referred to as the first lateral displacement corresponding to the pixel in the first video frame.
  • each matrix element in the matrix Q 1, 2, y can be understood as the vertical displacement of the pixel in the first video frame in the second video frame.
  • each matrix element in the forward vertical displacement matrix can be referred to as the first longitudinal displacement corresponding to the pixel in the first video frame.
  • the matrix size of the two matrices (ie, matrix Q 1,2,x and matrix Q 1,2,y ) obtained by the optical flow calculation method is the same as the size of the first video frame, that is, one matrix element can correspond to a pixel in the first video frame.
  • the reverse displacement matrix corresponding to the image pair 1 obtained by the optical flow method can include a reverse horizontal displacement matrix (that is, the reverse horizontal displacement matrix can be the matrix Q 2,1,x ) and a reverse vertical displacement matrix (that is, the reverse vertical displacement matrix may be the matrix Q 2,1,y ).
  • each matrix element in the matrix Q 2, 1, x can be understood as the horizontal displacement of the pixel in the second video frame in the first video frame. That is, each matrix element in the reverse horizontal displacement matrix can be referred to as the second lateral displacement corresponding to the pixel in the second video frame.
  • each matrix element in the matrix Q 2, 1, y can be understood as the vertical displacement of the pixel in the second video frame in the first video frame. That is, each matrix element in the reverse vertical displacement matrix can be referred to as the second longitudinal displacement corresponding to the pixel in the second video frame.
  • the matrix size of the two matrices (ie matrix Q 2,1,x and matrix Q 2,1,y ) obtained by the optical flow calculation method is the same as the size of the second video frame, that is, one matrix element can correspond to A pixel in the second video frame.
  • the matrix sizes of the four matrices are the same. For example, if the number of pixels in each video frame is m × n, the matrix size of each of the obtained four matrices can be m × n. It can be seen that each matrix element in the forward horizontal displacement matrix and the forward vertical displacement matrix can correspond to a pixel in the first video frame.
  • each matrix element in the forward displacement matrix corresponding to the image pair 1 can represent the displacement of the pixel in the first video frame in two dimensions in the second video frame.
  • the forward displacement matrix corresponding to the image pair 1 may be collectively referred to as the forward displacement matrix corresponding to the first video frame.
  • each matrix element in the reverse displacement matrix corresponding to image pair 1 may represent the displacement of the pixel in the second video frame in two dimensions in the first video frame.
  • the inverse displacement matrix corresponding to the image pair 1 may be collectively referred to as the inverse displacement matrix corresponding to the second video frame.
  • the application server can forward-map the pixels in the first video frame to the second video frame based on the first position information of the pixels in the first video frame and the optical flow tracking rule, determine in the second video frame the second position information of the first mapping points obtained by the mapping, and further determine the forward displacement matrix corresponding to the first video frame based on the first position information of the pixels and the second position information of the first mapping points.
  • the application server may reversely map the pixels in the second video frame to the first video based on the second position information of the pixels in the second video frame and the optical flow tracking rule Frame, and determine in the first video frame the third location information of the second mapping point obtained by mapping, and further based on the second location information of the first mapping point and the third location of the second mapping point Information to determine the reverse displacement matrix corresponding to the second video frame.
  • the first mapping point and the second mapping point are both pixel points obtained by mapping a pixel point in one video frame of the image pair to another video frame by an optical flow method.
  • the application server may, based on the first position information of the pixels in the first video frame, the forward displacement matrix, and the reverse displacement matrix, determine the pixels among them that meet the target filtering condition as effective pixels.
  • the specific process of determining effective pixels by the application server can be described as:
  • the application server may obtain a first pixel from the pixels in the first video frame, determine the first position information of the first pixel in the first video frame, and determine, from the forward displacement matrix, the first lateral displacement and the first longitudinal displacement corresponding to the first pixel; further, the application server may forward-map the first pixel to the second video frame based on the first position information of the first pixel and the first lateral displacement and first longitudinal displacement corresponding to the first pixel, and determine the second position information of the mapped second pixel in the second video frame; the application server may then determine, from the reverse displacement matrix, the second lateral displacement and the second longitudinal displacement corresponding to the second pixel, reversely map the second pixel to the first video frame based on the second position information of the second pixel and the second lateral displacement and second longitudinal displacement corresponding to the second pixel, and determine the third position information of the mapped third pixel in the first video frame; further, the application server may determine the error distance between the first pixel and the third pixel based on the first position information of the first pixel and the third position information of the third pixel, and determine, according to the first position information of the first pixel and the second position information of the second pixel, the correlation coefficient between the image block containing the first pixel and the image block containing the second pixel; finally, the application server may determine the pixels whose error distance is less than the error distance threshold and whose correlation coefficient is greater than the correlation coefficient threshold as effective pixels.
  • the embodiment of the present application may screen the matrix elements in the four displacement matrices through matrix transformation; that is, through changes of the matrix elements at the positions of the corresponding pixels in the constructed initial state matrix, the matrix elements with large displacement errors at the corresponding pixel positions can be removed from these four matrices, so that the effective pixels can be determined from the pixels of the first video frame.
  • FIG. 9 is a schematic diagram of a method for determining effective pixels provided by an embodiment of the present application.
  • before screening the matrix elements in the four matrices, the application server may first initialize a state matrix S 1 with the same size as the first video frame.
  • the application server may call the state matrix S 1 an initial state matrix.
  • the value of the matrix element corresponding to each pixel point may be referred to as the first value.
  • the first values in the initial state matrix are all zero.
  • the change of the value of a matrix element in the initial state matrix can be used to indicate whether the corresponding pixel in the first video frame meets the target filtering condition, so that the pixels that meet the target filtering condition can be used as effective tracking pixels (that is, effective pixels).
  • the first video frame shown in FIG. 9 may be the video frame whose video frame number is 1 in the aforementioned image pair 1.
  • the pixels in the first video frame may include the first pixel p1 shown in FIG. 9, that is, the first pixel p1 may be one of all the pixels in the first video frame, and the position information of the first pixel p1 in the first video frame may be called the first position information.
  • the application server can find the first lateral displacement corresponding to the first pixel p1 from the forward horizontal displacement matrix in the forward displacement matrix, and find the first longitudinal displacement corresponding to the first pixel p1 from the forward vertical displacement matrix in the forward displacement matrix; further, based on the first position information of the first pixel p1 and the first lateral displacement and first longitudinal displacement corresponding to the first pixel p1, the first pixel p1 is forward-mapped to the second video frame shown in FIG. 9, and the second position information of the second pixel p2 obtained by the mapping is determined in the second video frame. It can be understood that, at this time, the second pixel p2 is a pixel obtained by matrix transformation of the first pixel p1.
  • the application server may determine the second lateral displacement and the second longitudinal displacement corresponding to the second pixel p2 from the above-mentioned reverse displacement matrix, and may, based on the second position information of the second pixel p2 and the second lateral displacement and second longitudinal displacement corresponding to the second pixel p2, reversely map the second pixel p2 back to the first video frame shown in FIG. 9 and determine in the first video frame the third position information of the mapped third pixel p1'. It can be understood that, at this time, the third pixel p1' is a pixel obtained by matrix transformation of the second pixel p2, which in turn was obtained by mapping the first pixel p1.
  • the application server may determine, in the first video frame, the error distance t 1,1' between the first position information of the first pixel p1 and the third position information of the third pixel p1' obtained after the matrix transformation. Further, the application server may select an image block 10 with a size of k*k pixels (for example, 8*8 pixels) in the first video frame shown in FIG. 9, with the first position information of the first pixel p1 as the center. In addition, as shown in FIG. 9, the application server can also select an image block 20 with a size of k*k pixels in the second video frame shown in FIG. 9, with the second position information of the second pixel p2 as the center, and can then calculate the correlation coefficient between the two image blocks through formula (1) (the correlation coefficient can be N 1,2 ).
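The image of formula (1) is not reproduced in this text. A plausible reconstruction, assuming formula (1) is the standard normalized cross-correlation between the two k*k image blocks, is:

$$N_{1,2}=\frac{\sum_{a=1}^{k}\sum_{b=1}^{k}\big(\mathrm{patch}_1(a,b)-E(\mathrm{patch}_1)\big)\big(\mathrm{patch}_2(a,b)-E(\mathrm{patch}_2)\big)}{\sqrt{\sum_{a=1}^{k}\sum_{b=1}^{k}\big(\mathrm{patch}_1(a,b)-E(\mathrm{patch}_1)\big)^2}\;\sqrt{\sum_{a=1}^{k}\sum_{b=1}^{k}\big(\mathrm{patch}_2(a,b)-E(\mathrm{patch}_2)\big)^2}}$$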
  • patch 1 (a, b) in formula (1) can represent the pixel value of the pixel at the a-th row and b-th column of the image block 10 shown in FIG. 9.
  • the pixel value may be the gray value of the pixel, which is between 0 and 255.
  • E(patch 1 ) represents the average pixel value of the image block 10 shown in FIG. 9.
  • patch 2 (a, b) represents the pixel value of the pixel at the a-th row and b-th column of the image block 20 shown in FIG. 9.
  • E(patch 2 ) represents the average pixel value of the image block 20 shown in FIG. 9.
  • if the error distance is less than the error distance threshold and the correlation coefficient is greater than the correlation coefficient threshold, the application server may set the matrix element at the position corresponding to the first pixel p1 in the initial state matrix S 1 to the second value, that is, switch the value of the matrix element corresponding to the first pixel p1 in the initial state matrix S 1 from 0 to 1, to indicate that the first pixel p1 in the first video frame is an effective pixel.
  • otherwise, the application server can determine that the first pixel p1 shown in FIG. 9 is an invalid tracking pixel; that is, the application server may set the values of the matrix elements at the position corresponding to the first pixel p1 in the above-mentioned forward displacement matrix (ie, the above-mentioned matrix Q 1,2,x and matrix Q 1,2,y ) to 0, so that the forward displacement matrix containing the first value can be determined as the target displacement matrix (for example, the target horizontal displacement matrix Q x,1 and the target vertical displacement matrix Q y,1 ). In other words, the target displacement matrix can be understood as the matrix obtained after screening the above-mentioned forward displacement matrix and filtering out the mistracked displacements with larger errors.
  • the pixels can then be selected in turn from the first video frame shown in FIG. 9 as the first pixel, so as to repeat the above steps of determining effective pixels, until all pixels in the first video frame have been taken as the first pixel and all effective pixels in the first video frame have been determined. Thereby, the matrix elements in the initial state matrix can be updated based on the position information of the effective pixels, the initial state matrix containing the second value can be determined as the target state matrix S 1 corresponding to the first video frame, and the target displacement matrix corresponding to the first video frame (that is, the target horizontal displacement matrix Q x,1 and the target vertical displacement matrix Q y,1 ) can be obtained.
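A minimal sketch of this screening step follows, under stated assumptions: the forward-backward error check is vectorized, the patch correlation check is shown per pixel for clarity, and the thresholds, block size, and helper names (err_thresh, ncc_thresh, patch size k) are illustrative, not values prescribed by the patent.

```python
# Sketch: screen effective pixels and build the target state matrix S1 and the
# target displacement matrices Qx1/Qy1 from forward/reverse displacement matrices.
import numpy as np

def ncc(patch1, patch2):
    a = patch1.astype(np.float64) - patch1.mean()
    b = patch2.astype(np.float64) - patch2.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return (a * b).sum() / denom

def screen_effective_pixels(gray1, gray2, Q12_x, Q12_y, Q21_x, Q21_y,
                            err_thresh=1.0, ncc_thresh=0.8, k=8):
    h, w = gray1.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # forward mapping p1 -> p2, then reverse mapping p2 -> p1'
    x2 = np.clip(np.rint(xs + Q12_x).astype(int), 0, w - 1)
    y2 = np.clip(np.rint(ys + Q12_y).astype(int), 0, h - 1)
    x1b = x2 + Q21_x[y2, x2]
    y1b = y2 + Q21_y[y2, x2]
    err = np.sqrt((x1b - xs) ** 2 + (y1b - ys) ** 2)   # error distance per pixel

    S1 = np.zeros((h, w), dtype=np.uint8)              # initial state matrix, first value 0
    half = k // 2
    for y in range(half, h - half):
        for x in range(half, w - half):
            if err[y, x] >= err_thresh:
                continue
            cx, cy = x2[y, x], y2[y, x]
            if not (half <= cx < w - half and half <= cy < h - half):
                continue
            p1 = gray1[y - half:y + half, x - half:x + half]
            p2 = gray2[cy - half:cy + half, cx - half:cx + half]
            if ncc(p1, p2) > ncc_thresh:
                S1[y, x] = 1                            # second value: effective pixel
    Qx1 = np.where(S1 == 1, Q12_x, 0.0)                 # target displacement matrices:
    Qy1 = np.where(S1 == 1, Q12_y, 0.0)                 # mistracked displacements zeroed out
    return S1, Qx1, Qy1
```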
  • the composed image pairs can be expressed as (1, 2) , (2, 3), (3, 4), ..., (n-1, n).
  • through the above-mentioned effective pixel determination method, the target state matrix S 1 corresponding to the image pair (1, 2) and the target displacement matrix Q 1 corresponding to the image pair (1, 2) (that is, the aforementioned target horizontal displacement matrix Q x,1 and the target vertical displacement matrix Q y,1 ) can finally be obtained. By analogy, the target state matrix S n-1 corresponding to the image pair (n-1, n) and the target displacement matrix Q n-1 corresponding to the image pair (n-1, n) (that is, the aforementioned target horizontal displacement matrix Q x,n-1 and the target vertical displacement matrix Q y,n-1 ) can be obtained.
  • the application server can perform a displacement integral operation, through the cloud displacement integration method, on the target state matrix S 1 and the target displacement matrix Q 1 corresponding to the image pair, to obtain the state integral matrix S in (x, y) and the displacement integral matrix Q in (x, y) corresponding to the pixels in the first video frame.
  • the displacement integral matrix Q in (x, y) may include a lateral displacement integral matrix Q x,in (x,y) and a longitudinal displacement integral matrix Q y,in (x,y).
  • the state integral matrix S in (x, y), the lateral displacement integral matrix Q x, in (x, y) and the longitudinal displacement integral matrix Q y, in (x, y) can be obtained by the following matrix integral formula:
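The images of formulas (2), (3), and (4) are not reproduced here. Given the descriptions of x, y, x', and y' that follow, a plausible reconstruction of them as standard summed-area (integral) definitions is:

$$S_{in}(x,y)=\sum_{x'\le x}\sum_{y'\le y}S(x',y'),\qquad Q_{x,in}(x,y)=\sum_{x'\le x}\sum_{y'\le y}Q_{x,1}(x',y'),\qquad Q_{y,in}(x,y)=\sum_{x'\le x}\sum_{y'\le y}Q_{y,1}(x',y')$$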
  • the x and y in formula (2), formula (3), and formula (4) can be used to represent the coordinates of the matrix elements in the state integral matrix and the displacement integral matrices corresponding to the first video frame; for example, S in (x, y) can represent the value of the matrix element in the x-th row and y-th column of the state integral matrix.
  • x' and y' in formula (2), formula (3), and formula (4) can represent the coordinates of the matrix elements in the target state matrix and the target displacement matrix; for example, S(x', y') represents the value of the matrix element in the x'-th row and y'-th column of the target state matrix.
  • the application server can select a target frame with a height of M and a width of N in the first video frame through the cloud displacement difference method, and can then perform a displacement difference operation, within the target frame, on the three integral matrices obtained by formula (2), formula (3), and formula (4), to obtain the state difference matrix S dif (x, y) and the displacement difference matrix Q dif (x, y), respectively.
  • the target frame is used to select all the pixels in a certain area around a pixel in order to calculate the average displacement; for example, its size may be 80 × 80 pixels.
  • the displacement difference matrix Q dif (x, y) may include a lateral displacement difference matrix Q x,dif (x, y) and a longitudinal displacement difference matrix Q y,dif (x, y).
  • the state difference matrix S dif (x, y), the lateral displacement difference matrix Q x,dif (x, y), and the longitudinal displacement difference matrix Q y,dif (x, y) can be obtained by the following matrix difference formula (5).
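The image of formula (5) is likewise not reproduced. Assuming the differential area is the M × N target frame centered on (x, y), a plausible reconstruction of the difference operation (shown for the state matrix; the two displacement matrices are handled the same way) is:

$$S_{dif}(x,y)=S_{in}\!\left(x+\tfrac{M}{2},\,y+\tfrac{N}{2}\right)-S_{in}\!\left(x-\tfrac{M}{2},\,y+\tfrac{N}{2}\right)-S_{in}\!\left(x+\tfrac{M}{2},\,y-\tfrac{N}{2}\right)+S_{in}\!\left(x-\tfrac{M}{2},\,y-\tfrac{N}{2}\right)$$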
  • the area where the target frame is located in the first video frame may be referred to as a differential area, and the average displacement matrix corresponding to the first video frame can be determined based on the size information of the differential area, the state integral matrix, the lateral displacement integral matrix, and the longitudinal displacement integral matrix.
  • M and N in the displacement difference calculation formula are the length and width values of the difference area.
  • x and y in the displacement difference calculation formula are respectively the position information of each pixel in the first video frame.
  • for the state integral matrix, the state difference matrix S dif (x, y) corresponding to the state integral matrix S in (x, y) can be obtained; similarly, from the lateral displacement integral matrix Q x,in (x, y) and the longitudinal displacement integral matrix Q y,in (x, y), the lateral displacement difference matrix Q x,dif (x, y) and the longitudinal displacement difference matrix Q y,dif (x, y) can be obtained.
  • the application server may determine the ratio between the lateral displacement difference matrix Q x,dif (x, y) and the state difference matrix S dif (x, y) as the horizontal average displacement matrix Q x,F (x, y), and determine the ratio between the longitudinal displacement difference matrix Q y,dif (x, y) and the state difference matrix S dif (x, y) as the longitudinal average displacement matrix Q y,F (x, y).
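Following this ratio description and the role of e explained next, formulas (6) and (7) can plausibly be reconstructed as (this exact form is an assumption):

$$Q_{x,F}(x,y)=\frac{Q_{x,dif}(x,y)}{S_{dif}(x,y)+e},\qquad Q_{y,F}(x,y)=\frac{Q_{y,dif}(x,y)}{S_{dif}(x,y)+e}$$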
  • the e in formula (6) and formula (7) is used to represent a relatively small, artificially set number, such as 0.001; that is, the e in formula (6) and formula (7) avoids a direct division by 0 when the values of all matrix elements in the state difference matrix S dif (x, y) within the differential area are 0, so that step S203 can be further executed to pre-calculate the position information, in the second video frame, of the pixels in the first video frame.
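Putting the integration and difference steps together, the sketch below computes the horizontal and longitudinal average displacement matrices from a target state matrix and target displacement matrices. The border handling of the window, the default M, N, and e values, and all names are illustrative assumptions rather than the patent's prescribed implementation.

```python
# Sketch: average displacement matrices via integral (summed-area) tables and an
# M x N window difference, divided by the count of effective pixels plus e.
import numpy as np

def integral(m):
    return np.cumsum(np.cumsum(m.astype(np.float64), axis=0), axis=1)

def average_displacement(S1, Qx1, Qy1, M=80, N=80, e=0.001):
    h, w = S1.shape
    S_in, Qx_in, Qy_in = integral(S1), integral(Qx1), integral(Qy1)

    def box_sum(I, y, x):
        # sum of the original matrix over the M x N window centred on (y, x)
        y0, y1 = max(y - M // 2, 0), min(y + M // 2, h - 1)
        x0, x1 = max(x - N // 2, 0), min(x + N // 2, w - 1)
        total = I[y1, x1]
        if y0 > 0:
            total -= I[y0 - 1, x1]
        if x0 > 0:
            total -= I[y1, x0 - 1]
        if y0 > 0 and x0 > 0:
            total += I[y0 - 1, x0 - 1]
        return total

    Qx_F = np.zeros((h, w))
    Qy_F = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            s_dif = box_sum(S_in, y, x)          # number of effective pixels in the window
            Qx_F[y, x] = box_sum(Qx_in, y, x) / (s_dif + e)
            Qy_F[y, x] = box_sum(Qy_in, y, x) / (s_dif + e)
    return Qx_F, Qy_F
```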
  • Step S203 based on the average displacement matrix, track the position information of the pixels in the first video frame, and determine the position information of the pixels obtained by tracking in the second video frame.
  • the application server may, based on the average displacement matrix obtained in step S202 (the average displacement matrix may include the horizontal average displacement matrix Q x,F (x, y) and the longitudinal average displacement matrix Q y,F (x, y)), quickly and accurately track the position information of the pixels in the first video frame appearing in the next video frame (that is, the second video frame in the above image pair 1); that is, by performing a displacement transformation, the position information of the tracked pixels in the first video frame is determined in the second video frame.
  • x in formula (8) is the horizontal position coordinate of a pixel in the first video frame, and Q x,F (x, y) is the horizontal average displacement matrix corresponding to the first video frame; through formula (8), the horizontal position coordinates of the pixels in the first video frame can be coordinate-transformed to obtain the horizontal position coordinates of those pixels in the next video frame.
  • y in formula (9) is the longitudinal position coordinate of a pixel in the first video frame, and Q y,F (x, y) is the longitudinal average displacement matrix corresponding to the first video frame; through formula (9), the longitudinal position coordinates of the pixels in the first video frame can be coordinate-transformed to obtain the longitudinal position coordinates of those pixels in the next video frame.
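The images of formulas (8) and (9) are not reproduced; read with the descriptions above, they can plausibly be reconstructed as simple additive coordinate updates (an assumption), where (x1, y1) is the position of a pixel in the first video frame and (x2, y2) its tracked position in the second video frame:

$$x_{2}=x_{1}+Q_{x,F}(x_{1},y_{1}),\qquad y_{2}=y_{1}+Q_{y,F}(x_{1},y_{1})$$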
  • the pixels in the first video frame in the corresponding image pair can be quickly tracked.
  • the position coordinates of the tracked pixel can be determined in the second video frame of the corresponding image pair, that is, the position information of the tracked pixel can be determined in the second video frame of each image pair.
  • the application server may further store the position information of the pixel points tracked in each image pair, so that step S204 may be further executed.
  • Step S204 Generate track information associated with the target video based on the position information of the pixels in the first video frame and the position information of the pixels obtained by the tracking in the second video frame.
  • the track information includes target track information used to track and display multimedia information associated with the target object in the target video.
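As a rough illustration of how the per-pair tracking results might be integrated into per-pixel trajectory information for the whole target video (step S204), the sketch below chains the additive updates of formulas (8) and (9) across all image pairs. The trajectory array layout, the rounding of positions to integer indices, and the helper names are assumptions, not the patent's prescribed implementation.

```python
# Sketch: build one trajectory per pixel by chaining the average-displacement updates.
# avg_disp[i] is assumed to hold (Qx_F, Qy_F) for the image pair (i+1, i+2).
import numpy as np

def build_trajectories(frame_size, avg_disp):
    h, w = frame_size
    n_pairs = len(avg_disp)
    ys, xs = np.mgrid[0:h, 0:w]
    x_pos = xs.astype(np.float64)
    y_pos = ys.astype(np.float64)
    # traj[f, y, x] = (x, y) position in frame f+1 of the pixel that started at (x, y) in frame 1
    traj = np.zeros((n_pairs + 1, h, w, 2))
    traj[0, ..., 0], traj[0, ..., 1] = x_pos, y_pos
    for f, (Qx_F, Qy_F) in enumerate(avg_disp):
        xi = np.clip(np.rint(x_pos).astype(int), 0, w - 1)
        yi = np.clip(np.rint(y_pos).astype(int), 0, h - 1)
        x_pos = x_pos + Qx_F[yi, xi]        # horizontal update (formula (8))
        y_pos = y_pos + Qy_F[yi, xi]        # longitudinal update (formula (9))
        traj[f + 1, ..., 0], traj[f + 1, ..., 1] = x_pos, y_pos
    return traj
```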
  • Step S205 In response to a trigger operation on the target video, a target pixel is determined from a key video frame of the target video, and multimedia information associated with the target pixel is acquired.
  • Step S206 Determine a track acquisition request corresponding to the target pixel based on the position information of the target pixel in the key video frame.
  • for the specific implementation of step S205 and step S206, reference may be made to the description of the target user terminal in the embodiment corresponding to FIG. 4, which will not be repeated here.
  • Step S207 In response to the trajectory acquisition request for the target pixel in the key video frame, acquire the trajectory information associated with the target video.
  • the application server may receive the trajectory acquisition request sent by the target user terminal based on the target pixel in the key video frame, and may further acquire the trajectory associated with all the pixels in the target video calculated in advance by the application server. Information in order to further execute step S208.
  • Step S208 Filter the target trajectory information associated with the position information of the target pixel in the key video frame from the trajectory information associated with the target video, and return the target trajectory information.
  • the application server can obtain, from the trajectory acquisition request, the video frame number of the key video frame in the target video and the position information of the target pixel in the key video frame, so that it can filter out, from the trajectory information associated with the target video that was calculated in advance by the application server, the trajectory information associated with that position information; the trajectory information obtained by the filtering can be called the target trajectory information, and the target trajectory information can be returned to the target user terminal, so that the target user terminal can, based on the frame number of the key video frame, quickly find from the received target trajectory information the position information of the target pixel in the next video frame of the key video frame, until the position information of the target pixel in each video frame after the key video frame is obtained, and can then use the new trajectory information formed by the position information of the target pixel in each video frame after the key video frame. It should be understood that, when the application server obtains the frame number of the key video frame, it can also itself quickly find from the filtered trajectory information the position information of the target pixel in the next video frame of the key video frame, until the position information of the target pixel in each video frame after the key video frame is obtained; in this case, the application server can call the new trajectory information formed by the position information of the target pixel in each video frame after the key video frame the target trajectory information.
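A minimal sketch of this filtering step under the trajectory layout assumed in the earlier sketch: the trajectory whose key-frame position is closest to the requested target pixel is selected, and only the positions from the key video frame onwards are returned. The function name and the nearest-pixel lookup are illustrative assumptions.

```python
# Sketch: filter the target trajectory information for a trajectory acquisition request
# carrying the key video frame number and the target pixel's coordinates.
def filter_target_trajectory(traj, key_frame_number, target_x, target_y):
    key_idx = key_frame_number - 1                 # frame numbers start at 1
    # find the pixel whose position in the key frame is closest to the clicked point
    d2 = ((traj[key_idx, ..., 0] - target_x) ** 2 +
          (traj[key_idx, ..., 1] - target_y) ** 2)
    y0, x0 = divmod(int(d2.argmin()), d2.shape[1])
    # target trajectory: positions of that pixel in the key frame and every later frame
    return traj[key_idx:, y0, x0, :]
```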
  • the target user terminal may send the trajectory acquisition request to the application server when generating the trajectory acquisition request corresponding to the target pixel, so that the application server can obtain the target trajectory information associated with the position information of the target pixel in the key video frame based on the trajectory acquisition request, and can return the obtained target trajectory information to the target user terminal;
  • optionally, the target user terminal may itself execute the above steps S201 to S204, so as to perform full-image pixel tracking of all pixels in the target video in the target user terminal in advance, obtain in advance the position information of all pixels of the target video in each video frame, and then integrate the position information of each pixel of the target video in each video frame to obtain the trajectory information corresponding to each pixel in the target video.
  • in this way, the target user terminal can directly obtain, in the target user terminal, the target trajectory information associated with the position information of the target pixel in the target object in the key video frame, so that step S209 can be further executed.
  • the target trajectory information includes the position information of the target pixel in the next video frame of the key video frame; the position information in the next video frame of the key video frame is obtained by tracking the target pixel.
  • the application server can pre-process each video frame in the target video in advance; that is, based on the above optical flow tracking rules, it can determine the average displacement matrix corresponding to the image pair composed of every two adjacent video frames in the target video, and then track all pixels of the first video frame through the average displacement matrix corresponding to each image pair (also called the average displacement matrix corresponding to the first video frame in each image pair), to obtain the position information of all pixels in the first video frame in the second video frame. In this way, the position information of all pixels of the target video in each video frame (that is, in the above-mentioned video frame a, video frame b, video frame c, video frame d, video frame e, and video frame f) can be obtained, and the trajectory information corresponding to all pixels of the target video can be obtained based on the position information of all pixels of the target video in each video frame. The trajectory information corresponding to all pixels of the target video is called the trajectory information associated with the target video.
  • the application server may pre-calculate the trajectory information corresponding to a pixel A in the target video (for example, the pixel A may be one of all the pixels in the target video). The trajectory information corresponding to the pixel A includes the position information of the pixel A in each video frame of the target video (that is, in the above-mentioned video frame a, video frame b, video frame c, video frame d, video frame e, and video frame f). When the key video frame corresponding to the target pixel in the target user terminal is the video frame c of the target video, and the pixel A in the target object of the video frame c is used as the target pixel, the trajectory information of the pixel A can be filtered out from the application server, and the position information of the pixel A in each video frame after the key video frame (ie, video frame d, video frame e, and video frame f) can then be obtained based on the filtered trajectory information of the pixel A.
  • the target trajectory information obtained by the target user terminal may be the pre-calculated trajectory information; in this case, the target trajectory information obtained by the target user terminal may include the position information of the target pixel in the aforementioned video frame a, video frame b, video frame c, video frame d, video frame e, and video frame f. Optionally, the target trajectory information obtained by the target user terminal may be composed of partial position information determined from the pre-calculated trajectory information; in this case, the target trajectory information acquired by the target user terminal may include the position information of the target pixel in video frame d, video frame e, and video frame f, and the position information of the target pixel in video frame d, video frame e, and video frame f can be called the partial position information.
  • optionally, the trajectory information found by the application server that contains the position information of the target pixel in the key video frame (that is, the trajectory information corresponding to the pixel A) may also be collectively referred to as the target trajectory information.
  • the target trajectory information can be regarded as the trajectory information, corresponding to the pixel A matching the target pixel, that is found from all the pixels of the target video. Since this trajectory information contains the position information of the pixel A in each video frame of the target video, the position information of the target pixel in each video frame after the key video frame can naturally also be quickly obtained from it.
  • Step S209 When the next video frame of the key video frame is played, display the multimedia information based on the position information of the target pixel in the target track information in the next video frame of the key video frame .
  • the embodiment of the present application can, when the target pixel in the target object selected by the target user is obtained, filter out from the pre-calculated trajectory information corresponding to all pixels the trajectory information associated with the position information of the target pixel in the key video frame, and the filtered trajectory information can be called the target trajectory information. Because the embodiment of the present application can perform pixel tracking on the pixels in each video frame of the video in advance, when the average displacement matrix corresponding to the first video frame in each image pair is obtained, the position information of each pixel of the video in the corresponding video frame can be quickly obtained. The pre-calculated position information of each pixel in the corresponding video frame can be used to characterize the position information, in the corresponding video frame, of each pixel of the video played in the current video playback interface. Therefore, when the target user terminal obtains the target pixel in the target object and the multimedia information associated with the target object, the trajectory information corresponding to the target pixel can be quickly filtered out from the trajectory information corresponding to all pixels and called the target trajectory information, and the target trajectory information can then be returned to the target user terminal, so that the target user terminal can track and display the multimedia information (for example, barrage data) based on the position information of the target pixel, carried in the target trajectory information, in each video frame after the key video frame. For example, if the trajectory is a circle, the barrage data can follow this trajectory in the target user terminal and be displayed in a circle.
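A rough sketch of this tracking display (step S209): for each video frame after the key video frame, the barrage text is drawn at the position the target trajectory information gives for that frame. cv2.putText stands in for whatever rendering path the target user terminal actually uses; all names here are illustrative assumptions.

```python
# Sketch: display barrage data along the target trajectory, frame by frame.
import cv2

def render_barrage(frames_after_key, target_trajectory, barrage_text):
    rendered = []
    # target_trajectory[i] is the (x, y) position of the target pixel in the i-th frame
    # counted from the key video frame (index 0 = key frame itself).
    for frame, (x, y) in zip(frames_after_key, target_trajectory[1:]):
        out = frame.copy()
        cv2.putText(out, barrage_text, (int(x), int(y)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
        rendered.append(out)
    return rendered
```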
  • FIG. 10 is a schematic diagram of displaying barrage data based on trajectory information according to an embodiment of the present application.
  • the video frame 100 shown in FIG. 10 may contain multiple objects, for example, object 1, object 2, and object 3 shown in FIG. 10. If the target user takes the object 1 shown in FIG. 10 as the target object in the target user terminal, the video frame 100 can be called the key video frame, and the pixel in the target object corresponding to the trigger operation executed by the target user can be called the target pixel. If the target user terminal has strong computing performance, the position information of each pixel of the target video in each video frame can be pre-calculated in the target user terminal, so that the trajectory information associated with the target video can be obtained in the target user terminal.
  • the trajectory information 1 shown in FIG. 10 can be obtained by pre-calculation; that is, the position information in the trajectory information 1 is determined by the position information, in each video frame of the target video, of the pixels in the target video. Therefore, the target user terminal can quickly regard the trajectory information 1 shown in FIG. 10 as the target trajectory information based on the position information of the target pixel in the object 1, so that the multimedia information associated with the target object (ie, object 1), that is, the barrage data 1 ("BBBBB") shown in FIG. 10, can be quickly tracked and displayed based on the position information of the object 1, carried in the trajectory information 1, in each video frame after the key video frame (ie, the video frame 200 and the video frame 300 shown in FIG. 10). That is, the barrage data displayed in the video frame 200 and the video frame 300 shown in FIG. 10 are both determined by the position information in the trajectory information 1 shown in FIG. 10.
  • optionally, the trajectory information associated with the target video shown in FIG. 10 may also be pre-calculated by the application server, so that when the application server receives the trajectory acquisition request for the target pixel in the object 1, it can quickly obtain, from the trajectory information associated with the target video shown in FIG. 10, the trajectory information associated with the position information of the target pixel in the key video frame; that is, by executing the full-image pixel tracking of all pixels in the target video in the application server, the amount of calculation of the target user terminal can be effectively reduced, so as to ensure that when the target user terminal obtains the trajectory information 1 shown in FIG. 10, the barrage data 1 shown in FIG. 10 is quickly tracked and displayed, and the flexibility of the barrage data display can thus be improved.
  • it can be seen from this that, in the embodiment of the present application, the video frame corresponding to the trigger operation in the target video may be called the key video frame, so that the target pixel can be determined from the key video frame and the multimedia information associated with the target pixel and the target object where the target pixel is located can be obtained (for example, the multimedia information may be user text, pictures, expressions, and other barrage data in the target video). Based on the position information of the target pixel in the key video frame, the trajectory acquisition request corresponding to the target pixel is determined, and the target trajectory information associated with the position information of the target pixel in the key video frame can then be acquired based on the trajectory acquisition request, so that when the next video frame of the key video frame is played, the barrage data associated with the target pixel and the target object where the target pixel is located can be displayed based on the target trajectory information.
  • when the key video frame is determined, the embodiment of the present application can further filter out the trajectory information of the target pixel from the trajectory information of all pixels in the key video frame, and the filtered trajectory information of the target pixel is called the target trajectory information, so that the display effect of the barrage data can be enriched based on the obtained target trajectory information. For target pixels in different target objects, the obtained target trajectory information may be different, which in turn makes the display effect of the barrage data different.
  • the position information of the barrage data in each video frame after the key video frame can be quickly determined; in other words, the barrage data will be displayed in the target video following the target object as it changes in the video, which can enrich the visual display effect of the user text in the video and can make the barrage data more closely related to the target object or the object commented on in the video.
  • FIG. 11 is a schematic structural diagram of a video data processing apparatus provided by an embodiment of the present application.
  • the video data processing apparatus 1 can be applied to the target user terminal in the embodiment corresponding to FIG. 1 above.
  • the video data processing device 1 may include: an object determination module 1101, a request determination module 1102, a trajectory acquisition module 1103, and a text display module 1104;
  • the object determining module 1101 is configured to determine a target pixel from a key video frame of the target video in response to a trigger operation on the target video, and obtain multimedia information associated with the target pixel, wherein the key A video frame is a video frame where the trigger operation is located, and the target pixel is a pixel in the key video frame corresponding to the trigger operation;
  • a request determination module 1102 configured to determine a trajectory acquisition request corresponding to the target pixel based on the position information of the target pixel in the key video frame;
  • the trajectory acquisition module 1103 is configured to acquire target trajectory information associated with the position information of the target pixel in the key video frame based on the trajectory acquisition request, wherein the target trajectory information includes the target pixel Position information of a point in a video frame next to the key video frame, and the position information of the target pixel in a video frame next to the key video frame is obtained by tracking the target pixel;
  • the text display module 1104 is configured to, when the next video frame of the key video frame is played, based on the position information of the target pixel in the target track information in the next video frame of the key video frame, Display the multimedia information.
  • for the specific execution of the object determination module 1101, the request determination module 1102, the trajectory acquisition module 1103, and the text display module 1104, reference may be made to the description of step S101 to step S104 in the embodiment corresponding to FIG. 4, which will not be repeated here.
  • the embodiment of the present application may use the video frame corresponding to the trigger operation in the target video as the key video frame when the trigger operation of the target user on the target video is acquired, so that the target pixel can be determined from the key video frame and obtained Multimedia information associated with the target pixel and the target object where the target pixel is located (for example, the multimedia information may be barrage data such as user text, pictures, and expressions in the target video).
  • further, the trajectory acquisition request corresponding to the target pixel is determined based on the position information of the target pixel in the key video frame, and the target trajectory information associated with the position information of the target pixel in the key video frame can then be acquired based on the trajectory acquisition request, so that when the next video frame of the key video frame is played, the barrage data associated with the target pixel and the target object where the target pixel is located can be displayed based on the target trajectory information. It can be seen from this that, when the key video frame is determined, the embodiment of the present application can further filter out the trajectory information of the target pixel from the trajectory information of all pixels in the key video frame and use the filtered trajectory information as the target trajectory information, so that the display effect of the barrage data can be enriched based on the obtained target trajectory information. For target pixels in different target objects, the obtained target trajectory information may be different, which in turn makes the display effect of the barrage data different. In addition, the position information of the barrage data in each video frame after the key video frame can be quickly determined; in other words, the barrage data will be displayed in the target video following the target object as it changes in the video, which can enrich the visual display effect of the user text in the video and can make the barrage data more closely related to the target object or the object commented on in the video.
  • FIG. 12 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device 1000 may be the target user terminal in the embodiment corresponding to FIG. 1 above.
  • the foregoing computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005.
  • the aforementioned computer device 1000 may further include: a user interface 1003 and at least one communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display (Display) and a keyboard (Keyboard).
  • the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory.
  • the memory 1005 may also be at least one storage device located far away from the foregoing processor 1001. As shown in FIG. 12, the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 can provide network communication functions; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 can be used to call the device control application program stored in the memory 1005 to implement the following:
  • in response to a trigger operation on the target video, a target pixel is determined from a key video frame of the target video, and multimedia information associated with the target pixel is acquired, wherein the key video frame is the video frame where the trigger operation is located, and the target pixel is the pixel corresponding to the trigger operation in the key video frame;
  • a trajectory acquisition request corresponding to the target pixel is determined based on the position information of the target pixel in the key video frame; based on the trajectory acquisition request, target trajectory information associated with the position information of the target pixel in the key video frame is acquired; the target trajectory information includes the position information of the target pixel in the next video frame of the key video frame; the position information of the target pixel in the next video frame of the key video frame is obtained by tracking the target pixel;
  • the multimedia information is displayed based on the position information of the target pixel in the target track information in the next video frame of the key video frame.
  • it should be understood that the computer device 1000 described in this embodiment of the present application can perform the description of the foregoing video data processing method in the embodiment corresponding to FIG. 4, and can also perform the description of the foregoing video data processing apparatus 1 in the embodiment corresponding to FIG. 11, which will not be repeated here.
  • in addition, the description of the beneficial effects of using the same method will not be repeated.
  • it should be pointed out here that the embodiments of the present application also provide a computer storage medium, and the computer storage medium stores the aforementioned computer program executed by the video data processing apparatus 1; the computer program includes program instructions, and when the processor executes the program instructions, it can execute the description of the video data processing method in the embodiment corresponding to FIG. 4, which therefore will not be repeated here. In addition, the description of the beneficial effects of using the same method will not be repeated.
  • for technical details not disclosed in the embodiments of the computer storage medium involved in this application, please refer to the description of the method embodiments of this application.
  • FIG. 13 is a schematic structural diagram of another video data processing apparatus provided by an embodiment of the present application.
  • the video data processing apparatus 2 may be applied to the application server in the embodiment corresponding to FIG. 8, and the application server may be the business server 2000 in the embodiment corresponding to FIG. 1.
  • the video data processing device 2 may include: a request response module 1301 and a track screening module 1302;
  • the request response module 1301 is configured to obtain trajectory information associated with the target video in response to a trajectory acquisition request for a target pixel in a key video frame, where the key video frame is a video frame in the target video, the target pixel is a pixel in the key video frame, and the trajectory information is determined by the position information of the pixels in each video frame in the target video;
  • the trajectory filtering module 1302 is configured to filter, from the trajectory information associated with the target video, the target trajectory information associated with the position information of the target pixel in the key video frame, and return the target trajectory information;
  • the target trajectory information includes target position information; the target position information is used to trigger the display of multimedia information associated with the target pixel in the next video frame of the key video frame.
  • for the specific implementation of the request response module 1301 and the trajectory filtering module 1302, reference may be made to the description of step S207 and step S208 in the embodiment corresponding to FIG. 8, which will not be repeated here.
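The following sketch illustrates, under assumed data structures, how a service server in the spirit of modules 1301 and 1302 might filter the pre-computed per-pixel trajectories down to the one matching the requested target pixel; the storage layout and function name are hypothetical.

```python
# Hypothetical server-side counterpart of modules 1301/1302: pick, from the
# pre-computed per-pixel trajectories, the one whose position in the key video
# frame matches the requested target pixel.

def filter_target_trajectory(all_trajectories, key_frame, target_xy):
    """all_trajectories: list of {frame_no: (x, y)} dicts, one per tracked pixel.
    Returns the positions from the key frame onward for the matching pixel,
    or None if the pixel was not tracked."""
    for traj in all_trajectories:
        if traj.get(key_frame) == tuple(target_xy):
            return {f: p for f, p in traj.items() if f >= key_frame}
    return None
```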
  • FIG. 14 is a schematic structural diagram of another computer device provided by an embodiment of the present application.
  • the computer device 2000 may be the target service server 2000 in the embodiment corresponding to FIG. 1 above.
  • the aforementioned computer device 2000 may include: a processor 2001, a network interface 2004, and a memory 2005.
  • the aforementioned computer device 2000 may further include: a user interface 2003 and at least one communication bus 2002.
  • the communication bus 2002 is used to realize the connection and communication between these components.
  • the user interface 2003 may include a display (Display) and a keyboard (Keyboard), and the optional user interface 2003 may also include a standard wired interface and a wireless interface.
  • the network interface 2004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 2005 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the memory 2005 may also be at least one storage device located far away from the aforementioned processor 2001.
  • the memory 2005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 2004 can provide network communication functions;
  • the user interface 2003 is mainly used to provide an input interface for the user; and
  • the processor 2001 can be used to call the device control application program stored in the memory 2005 to implement:
  • in response to a trajectory acquisition request for a target pixel in a key video frame, acquire trajectory information associated with the target video, where the key video frame is a video frame in the target video, the target pixel is a pixel in the key video frame, and
  • the trajectory information is determined by the position information of the pixels in each video frame of the target video;
  • filter, from the trajectory information associated with the target video, the target trajectory information associated with the position information of the target pixel in the key video frame, and return the target trajectory information; the target trajectory information includes target position information; the target position information is used to trigger the display of multimedia information associated with the target pixel in the next video frame of the key video frame.
  • it should be understood that the computer device 2000 described in this embodiment of the present application can perform the description of the foregoing video data processing method in the embodiment corresponding to FIG. 8, and can also perform the description of the foregoing video data processing apparatus 2 in the embodiment corresponding to FIG. 13, which will not be repeated here.
  • in addition, the description of the beneficial effects of using the same method will not be repeated.
  • it should be pointed out here that the embodiments of the present application also provide a computer storage medium, and the computer storage medium stores the aforementioned computer program executed by the video data processing apparatus 2; the computer program includes program instructions, and when the processor executes the program instructions, it can execute the description of the video data processing method in the embodiment corresponding to FIG. 8 above, which therefore will not be repeated here. In addition, the description of the beneficial effects of using the same method will not be repeated.
  • FIG. 15 is a schematic structural diagram of another video data processing apparatus provided by an embodiment of the present application.
  • the video data processing device 3 can be applied to the service server 2000 in the embodiment corresponding to FIG. 1 and also can be applied to the target user terminal in the embodiment corresponding to FIG. 1.
  • the video data processing device 3 may include: a first acquisition module 310, a matrix acquisition module 410, a position tracking module 510, and a trajectory generation module 610;
  • the first obtaining module 310 is configured to obtain adjacent first video frames and second video frames from the target video;
  • the matrix acquisition module 410 is configured to determine the average displacement matrix corresponding to the first video frame based on the optical flow tracking rule corresponding to the target video, the pixels in the first video frame, and the pixels in the second video frame;
  • the matrix obtaining module 410 includes: a first determining unit 4001, a matrix determining unit 4002, a pixel point screening unit 4003, a matrix correcting unit 4004, and a second determining unit 4005;
  • the first determining unit 4001 is configured to obtain the optical flow tracking rule corresponding to the target video, determine the position information of the pixels in the first video frame as the first position information, and determine the position information of the pixels in the second video frame as the second position information;
  • the matrix determining unit 4002 is configured to determine, based on the optical flow tracking rule, the first position information of the pixels in the first video frame, and the second position information of the pixels in the second video frame, the forward displacement matrix corresponding to the first video frame and the reverse displacement matrix corresponding to the second video frame;
  • the matrix determining unit 4002 includes: a first tracking subunit 4021 and a second tracking subunit 4022;
  • the first tracking subunit 4021 is configured to forward-map the pixels in the first video frame to the second video frame based on the first position information of the pixels in the first video frame and the optical flow tracking rule, determine, in the second video frame, the second position information of the first mapping points obtained by the mapping, and determine the forward displacement matrix corresponding to the first video frame based on the first position information of the pixels and the second position information of the first mapping points;
  • the second tracking subunit 4022 is configured to reverse-map the first mapping points in the second video frame to the first video frame based on the second position information of the first mapping points in the second video frame and the optical flow tracking rule, determine, in the first video frame, the third position information of the second mapping points obtained by the mapping, and determine the reverse displacement matrix corresponding to the second video frame based on the second position information of the first mapping points and the third position information of the second mapping points.
  • for the specific implementation of the first tracking subunit 4021 and the second tracking subunit 4022, reference may be made to the description of the cloud forward-backward optical flow method in the embodiment corresponding to FIG. 8, and the details will not be repeated here.
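As a hedged illustration of the forward and reverse displacement matrices handled by subunits 4021 and 4022, the sketch below uses OpenCV's Farneback dense optical flow as a stand-in estimator; the embodiment does not prescribe this particular optical flow implementation or these parameters.

```python
import cv2

# Stand-in for the "cloud forward-backward optical flow" step: dense optical
# flow is computed from frame 1 to frame 2 (forward displacement matrix) and
# from frame 2 to frame 1 (reverse displacement matrix).

def forward_backward_flow(frame1_gray, frame2_gray):
    """frame1_gray, frame2_gray: 8-bit grayscale images of the same size.
    Returns (forward, backward), each of shape (H, W, 2), where [..., 0] is the
    lateral (x) displacement and [..., 1] the longitudinal (y) displacement."""
    forward = cv2.calcOpticalFlowFarneback(frame1_gray, frame2_gray, None,
                                           0.5, 3, 15, 3, 5, 1.2, 0)
    backward = cv2.calcOpticalFlowFarneback(frame2_gray, frame1_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
    return forward, backward
```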
  • the pixel point screening unit 4003 is configured to determine, based on the first position information of the pixels in the first video frame, the forward displacement matrix, and the reverse displacement matrix, the pixels that meet the target filtering condition as effective pixels;
  • the pixel point screening unit 4003 includes: a first position determining subunit 4031, a second position determining subunit 4032, a third position determining subunit 4033, an error determining subunit 4034, and an effective screening subunit 4035;
  • the first position determining subunit 4031 is configured to obtain a first pixel from the pixels in the first video frame, determine the first position information of the first pixel in the first video frame, and determine the first lateral displacement and the first longitudinal displacement corresponding to the first pixel from the forward displacement matrix;
  • the second position determining subunit 4032 is configured to forward-map the first pixel to the second video frame based on the first position information of the first pixel and the first lateral displacement and first longitudinal displacement corresponding to the first pixel, and determine, in the second video frame, the second position information of the second pixel obtained by the mapping;
  • the third position determining subunit 4033 is configured to determine the second lateral displacement and the second longitudinal displacement corresponding to the second pixel from the reverse displacement matrix, reverse-map the second pixel to the first video frame based on the second position information of the second pixel and the second lateral displacement and second longitudinal displacement corresponding to the second pixel, and determine, in the first video frame, the third position information of the third pixel obtained by the mapping;
  • the error determination subunit 4034 is configured to determine the error distance between the first pixel and the third pixel based on the first position information of the first pixel and the third position information of the third pixel, and determine, according to the first position information of the first pixel and the second position information of the second pixel, the correlation coefficient between the image block containing the first pixel and the image block containing the second pixel;
  • the effective screening sub-unit 4035 is configured to determine the pixels whose error distance is less than the error distance threshold and the correlation coefficient is greater than or equal to the correlation coefficient threshold among the pixels as effective pixels.
  • for the specific implementation of the first position determining subunit 4031, the second position determining subunit 4032, the third position determining subunit 4033, the error determination subunit 4034, and the effective screening subunit 4035, reference may be made to the embodiment corresponding to FIG. 8 above; the details will not be repeated here.
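A possible realization of the screening performed by subunits 4031 to 4035 is sketched below: a pixel is kept as an effective pixel when the forward-then-backward mapping returns close to its starting position and the image blocks around the original and forward-mapped positions are sufficiently correlated. The thresholds, patch size, and normalized-correlation form are assumptions.

```python
import numpy as np

def is_effective_pixel(p1, frame1, frame2, forward, backward,
                       k=8, err_thresh=1.0, corr_thresh=0.8):
    """p1: integer (x, y) of the first pixel in grayscale frame1.
    forward, backward: (H, W, 2) displacement fields between frame1 and frame2."""
    x1, y1 = p1
    dx, dy = forward[y1, x1]                       # first lateral/longitudinal displacement
    x2, y2 = int(round(x1 + dx)), int(round(y1 + dy))
    h, w = frame2.shape[:2]
    if not (0 <= x2 < w and 0 <= y2 < h):
        return False
    bx, by = backward[y2, x2]                      # second lateral/longitudinal displacement
    x3, y3 = x2 + bx, y2 + by                      # mapped back into the first frame
    error = np.hypot(x3 - x1, y3 - y1)             # error distance between first and third pixel

    half = k // 2
    patch1 = frame1[y1 - half:y1 + half, x1 - half:x1 + half].astype(np.float64)
    patch2 = frame2[y2 - half:y2 + half, x2 - half:x2 + half].astype(np.float64)
    if patch1.shape != (k, k) or patch2.shape != (k, k):
        return False                               # patch falls outside the frame
    a = patch1 - patch1.mean()
    b = patch2 - patch2.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-9
    corr = (a * b).sum() / denom                   # normalized correlation coefficient
    return error < err_thresh and corr >= corr_thresh
```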
  • the matrix correction unit 4004 is configured to correct the initial state matrix and the forward displacement matrix corresponding to the first video frame based on the effective pixel points to obtain the target state matrix and target displacement corresponding to the first video frame matrix;
  • the matrix correction unit 4004 includes: an initial acquisition subunit 4041, a value switching subunit 4042, a displacement setting subunit 4043;
  • the initial acquisition subunit 4041 is configured to acquire the initial state matrix corresponding to the first video frame; the state value of each matrix element in the initial state matrix is the first value, and one matrix element corresponds to one of the pixels;
  • the value switching subunit 4042 is configured to switch, in the initial state matrix, the state value of the matrix elements corresponding to the effective pixels from the first value to the second value, and determine the initial state matrix containing the second value as the target state matrix corresponding to the first video frame;
  • the displacement setting subunit 4043 is configured to set, in the forward displacement matrix, the displacement of the matrix elements corresponding to the remaining pixels to the first value, and determine the forward displacement matrix containing the first value as the target displacement matrix; the remaining pixels are the pixels other than the effective pixels among the pixels.
  • the displacement setting subunit 4043 is specifically configured to, if the forward displacement matrix includes an initial lateral displacement matrix and an initial longitudinal displacement matrix, set, in the initial lateral displacement matrix, the first lateral displacement of the matrix elements corresponding to the remaining pixels to the first value, and determine the initial lateral displacement matrix containing the first value as the lateral displacement matrix corresponding to the first video frame;
  • the displacement setting subunit 4043 is also specifically configured to set, in the initial longitudinal displacement matrix, the first longitudinal displacement of the matrix elements corresponding to the remaining pixels to the first value, and determine the initial longitudinal displacement matrix containing the first value as the longitudinal displacement matrix corresponding to the first video frame;
  • the displacement setting subunit 4043 is further specifically configured to determine the lateral displacement matrix corresponding to the first video frame and the longitudinal displacement matrix corresponding to the first video frame as the target displacement matrix.
  • for the specific implementation of the initial acquisition subunit 4041, the value switching subunit 4042, and the displacement setting subunit 4043, please refer to the description of correcting the initial state matrix and the forward displacement matrix in the embodiment corresponding to FIG. 8, which will not be repeated here.
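The correction performed by subunits 4041 to 4043 can be pictured with the following sketch, in which the first value is 0 and the second value is 1, and the forward displacements of the remaining (non-effective) pixels are reset; the boolean-mask representation is an assumption.

```python
import numpy as np

# Sketch of the correction step: effective pixels are switched from the first
# value (0) to the second value (1) in the state matrix, and the forward
# displacements of the remaining pixels are reset to the first value.

def correct_matrices(forward, effective_mask):
    """forward: (H, W, 2) forward displacement field.
    effective_mask: (H, W) boolean array, True for effective pixels."""
    target_state = np.zeros(effective_mask.shape, dtype=np.float64)  # initial state matrix (all 0)
    target_state[effective_mask] = 1.0                               # switch to the second value

    target_displacement = forward.copy()
    target_displacement[~effective_mask] = 0.0                       # reset mistracked displacements
    return target_state, target_displacement
```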
  • the second determining unit 4005 is configured to determine an average displacement matrix corresponding to the first video frame based on the target state matrix and the target displacement matrix.
  • the second determining unit 4005 includes: a first integration subunit 4051, a second integration subunit 4052, a third integration subunit 4053, and a difference operation subunit 4054;
  • the first integration subunit 4051 is configured to perform a displacement integration operation on the target state matrix in the first video frame to obtain the state integration matrix corresponding to the pixel points in the first video frame;
  • the second integration subunit 4052 is configured to perform a displacement integration operation on the lateral displacement matrix in the target displacement matrix in the first video frame to obtain the lateral displacement integral matrix corresponding to the pixels in the first video frame;
  • the third integration subunit 4053 is configured to perform a displacement integration operation on the longitudinal displacement matrix in the target displacement matrix in the first video frame to obtain the longitudinal displacement integral matrix corresponding to the pixels in the first video frame;
  • the difference operation subunit 4054 is configured to determine, from the first video frame, the difference area corresponding to the displacement difference operation, and determine the average displacement matrix corresponding to the first video frame based on the size information of the difference area, the state integral matrix, the lateral displacement integral matrix, and the longitudinal displacement integral matrix.
  • the difference operation subunit 4054 includes: a first difference subunit 4055, a second difference subunit 4056, a third difference subunit 4057, and an average determination subunit 4058;
  • the first difference subunit 4055 is configured to perform a displacement difference operation on the state integral matrix based on the length information and width information corresponding to the difference area, to obtain the state difference matrix corresponding to the first video frame;
  • the second difference subunit 4056 is configured to perform a displacement difference operation on the lateral displacement integral matrix and the longitudinal displacement integral matrix based on the length information and width information corresponding to the difference area, to obtain the lateral displacement difference matrix and the longitudinal displacement difference matrix corresponding to the first video frame;
  • the third difference subunit 4057 is configured to determine the ratio between the lateral displacement difference matrix and the state difference matrix as the lateral average displacement matrix, and determine the ratio between the longitudinal displacement difference matrix and the state difference matrix as the longitudinal average displacement matrix;
  • the average determination subunit 4058 is configured to determine the lateral average displacement matrix and the longitudinal average displacement matrix as the average displacement matrix corresponding to the first video frame.
  • for the specific implementation of the first integration subunit 4051, the second integration subunit 4052, the third integration subunit 4053, and the difference operation subunit 4054, reference may be made to the description of the cloud displacement integration method and the cloud displacement difference method in the embodiment corresponding to FIG. 8, which will not be repeated here.
  • for the specific implementation of the first determining unit 4001, the matrix determining unit 4002, the pixel point screening unit 4003, the matrix correcting unit 4004, and the second determining unit 4005, reference may be made to the description of step S202 in the embodiment corresponding to FIG. 8, which will not be repeated here.
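For intuition, the sketch below implements an integral-image version of the displacement integration and displacement difference operations of subunits 4051 to 4058: the per-pixel average displacement over an M x N difference area is obtained from four lookups per integral matrix, with a small constant guarding against division by zero. The window size and boundary handling are assumptions.

```python
import numpy as np

def average_displacement(target_state, target_displacement, M=80, N=80, eps=1e-3):
    """target_state: (H, W) matrix of 0/1 state values.
    target_displacement: (H, W, 2) corrected forward displacements."""
    s_int = target_state.cumsum(0).cumsum(1)                  # state integral matrix
    qx_int = target_displacement[..., 0].cumsum(0).cumsum(1)  # lateral displacement integral matrix
    qy_int = target_displacement[..., 1].cumsum(0).cumsum(1)  # longitudinal displacement integral matrix

    def box_sum(integral):
        # Sum of the original matrix over an M x N window centered at each pixel,
        # computed from the integral image with four lookups (clipped at borders).
        padded = np.pad(integral, ((1, 0), (1, 0)))
        h, w = integral.shape
        y0 = np.clip(np.arange(h) - M // 2, 0, h)
        y1 = np.clip(np.arange(h) + M // 2 + 1, 0, h)
        x0 = np.clip(np.arange(w) - N // 2, 0, w)
        x1 = np.clip(np.arange(w) + N // 2 + 1, 0, w)
        return (padded[np.ix_(y1, x1)] - padded[np.ix_(y0, x1)]
                - padded[np.ix_(y1, x0)] + padded[np.ix_(y0, x0)])

    s_dif = box_sum(s_int)                                     # state difference matrix
    qx_avg = box_sum(qx_int) / (s_dif + eps)                   # lateral average displacement matrix
    qy_avg = box_sum(qy_int) / (s_dif + eps)                   # longitudinal average displacement matrix
    return qx_avg, qy_avg
```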
  • the position tracking module 510 is configured to track the position information of the pixels in the first video frame based on the average displacement matrix, and determine the position information of the pixels obtained by tracking in the second video frame;
  • the trajectory generating module 610 is configured to generate trajectory information associated with the target video based on the position information of the pixels in the first video frame and the position information of the tracked pixels in the second video frame;
  • the trajectory information includes target trajectory information used to track and display the multimedia information associated with the target pixel in the target video.
  • for the specific implementation of the first acquisition module 310, the matrix acquisition module 410, the position tracking module 510, and the trajectory generation module 610, reference may be made to the description of steps S201 to S204 in the embodiment corresponding to FIG. 8, which will not be repeated here.
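A compact sketch of how modules 510 and 610 could chain the per-image-pair average displacement matrices into trajectory information follows; it mirrors position updates of the form C_x(x, y) = x + Q_x,F(x, y) and C_y(x, y) = y + Q_y,F(x, y), and the trajectory container is an assumption.

```python
import numpy as np

def track_step(trajectories, qx_avg, qy_avg):
    """trajectories: list of [(x0, y0), (x1, y1), ...], one list per tracked pixel.
    qx_avg, qy_avg: (H, W) average displacement matrices of the current image pair."""
    h, w = qx_avg.shape
    for traj in trajectories:
        x, y = traj[-1]
        xi = int(np.clip(round(x), 0, w - 1))     # sample the field at the current position
        yi = int(np.clip(round(y), 0, h - 1))
        traj.append((x + qx_avg[yi, xi], y + qy_avg[yi, xi]))
    return trajectories

# Example: two pixels tracked through one image pair of a 4x4 frame.
qx = np.full((4, 4), 1.0)   # every pixel moves 1 px to the right
qy = np.zeros((4, 4))
print(track_step([[(0, 0)], [(2, 3)]], qx, qy))
# [[(0, 0), (1.0, 0.0)], [(2, 3), (3.0, 3.0)]]
```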
  • FIG. 16 is a schematic structural diagram of another computer device provided by an embodiment of the present application.
  • the foregoing computer device 3000 may be applied to the service server 2000 in the foregoing embodiment corresponding to FIG. 1.
  • the above-mentioned computer equipment 3000 may include: a processor 3001, a network interface 3004 and a memory 3005.
  • the above-mentioned computer device 3000 may also include: a user interface 3003 and at least one communication bus 3002, where the communication bus 3002 is used to implement connection and communication between these components.
  • the user interface 3003 may include a display screen (Display) and a keyboard (Keyboard), and the optional user interface 3003 may also include a standard wired interface and a wireless interface.
  • the network interface 3004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 3005 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the memory 3005 may also be at least one storage device located far away from the foregoing processor 3001. As shown in FIG. 16, the memory 3005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 3004 can provide network communication functions; the user interface 3003 is mainly used to provide an input interface for the user; and the processor 3001 can be used to call the device control application program stored in the memory 3005 to implement:
  • acquire adjacent first and second video frames from the target video; determine the average displacement matrix corresponding to the first video frame based on the optical flow tracking rule corresponding to the target video, the pixels in the first video frame, and the pixels in the second video frame; track the position information of the pixels in the first video frame based on the average displacement matrix, and determine, in the second video frame, the position information of the tracked pixels; and generate, based on the position information of the pixels in the first video frame and the position information of the tracked pixels in the second video frame, the trajectory information associated with the target video; the trajectory information contains target trajectory information used to track and display the multimedia information associated with the target pixel in the target video.
  • it should be understood that the computer device 3000 described in this embodiment of the present application can perform the description of the foregoing video data processing method in the embodiment corresponding to FIG. 8, and can also perform the description of the foregoing video data processing apparatus 3 in the embodiment corresponding to FIG. 15, which will not be repeated here.
  • in addition, the description of the beneficial effects of using the same method will not be repeated.
  • it should be pointed out here that the embodiments of the present application also provide a computer storage medium, and the computer storage medium stores the aforementioned computer program executed by the video data processing apparatus 3; the computer program includes program instructions, and when the processor executes the program instructions, it can execute the description of the video data processing method in the embodiment corresponding to FIG. 8 above, which therefore will not be repeated here. In addition, the description of the beneficial effects of using the same method will not be repeated.
  • for technical details not disclosed in the embodiments of the computer storage medium involved in this application, please refer to the description of the method embodiments of this application.
  • a person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing relevant hardware; the above-mentioned program can be stored in a computer-readable storage medium, and when the program is executed, it may include the processes of the above-mentioned method embodiments.
  • the aforementioned storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of this application disclose a video data processing method, apparatus, and storage medium. The method includes: in response to a trigger operation on a target video, determining a target object from a key video frame of the target video, and acquiring multimedia information associated with the target object; determining, based on position information of a target pixel in the target object in the key video frame, a trajectory acquisition request corresponding to the target pixel; acquiring, based on the trajectory acquisition request, target trajectory information associated with the position information of the target pixel in the key video frame; and, when the next video frame of the key video frame is played, displaying the multimedia information based on the position information, in the target trajectory information, of the target pixel in the next video frame of the key video frame.

Description

Video data processing method and related apparatus
This application claims priority to Chinese Patent Application No. 201910358569.8, entitled "Video data processing method and related apparatus" and filed on April 30, 2019, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of Internet technologies, and in particular, to a video data processing method and related apparatus.
Background
While watching an online video through a user terminal, a user can see, on the video playback interface, user text or user comments posted by this user or by other users. In existing ways of displaying user text, the user text output to the video playback interface is usually output and displayed through a fixed text display track in that interface.
Summary
The embodiments of this application provide a video data processing method and related apparatus.
本申请实施例一方面提供了一种视频数据处理方法,所述方法应用于计算机设备,包括:
响应于对目标视频的触发操作,从所述目标视频的关键视频帧中确定目标像素点,并获取与所述目标像素点相关联的多媒体信息,其中,所述关键视频帧是所述触发操作所在的视频帧,所述目标像素点是所述关键视频帧中与所述触发操作对应的像素点;
基于所述目标像素点在所述关键视频帧中的位置信息,确定所述目标像素点对应的轨迹获取请求;
基于所述轨迹获取请求,获取与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,其中,所述目标轨迹信息包含所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,所述目标像素点在所述关键视频帧的下一视频帧中的位置信息是通过跟踪所述目标像素点得到的;
当播放所述关键视频帧的下一视频帧时,基于所述目标轨迹信息中的所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,显示所述多媒体信息。
本申请实施例一方面提供了一种视频数据处理方法,所述方法应用于业务服务器,包括:
响应针对关键视频帧中的目标像素点的轨迹获取请求,获取与目标视频相关联的轨迹信息,其中,所述关键视频帧是所述目标视频中的视频帧,所述目标像素点是所述关键视频帧中的像素点,所述轨迹信息是由所述目标视频的每个视频帧中的像素点的位置信息所确定的;
从所述目标视频相关联的轨迹信息中,筛选与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,并返回所述目标轨迹信息,其中,所述目标轨迹信息包含目标位置信息,所述目标位置信息用于触发在所述关键视频帧的下一视频帧中,显示与所述目标像素点相关联的多媒体信息。
本申请实施例一方面提供了一种视频数据处理方法,所述方法包括:
从目标视频中获取相邻的第一视频帧和第二视频帧;
基于所述目标视频对应的光流追踪规则、所述第一视频帧中的像素点、所述第二视频帧中的像素点,确定所述第一视频帧对应的平均位移矩阵;
基于所述平均位移矩阵,对所述第一视频帧中的像素点的位置信息进行跟踪,并在所述第二视频帧中确定所跟踪得到的像素点的位置信息;
基于所述第一视频帧中的像素点的位置信息、所述跟踪得到的像素点在所述第二视频帧中的位置信息,生成与所述目标视频相关联的轨迹信息,其中,所述轨迹信息中包含用于对目标视频中的目标像素点所关联的多媒体信息进行跟踪显示的目标轨迹信息。
本申请实施例一方面提供了一种视频数据处理装置,所述装置应用于计算机设备,包括:
对象确定模块,用于响应于对目标视频的触发操作,从所述目标视频的关键视频帧中确定目标像素点,并获取与所述目标像素点相关联的多媒体信息,其中,所述关键视频帧是所述触发操作所在的视频帧,所述目标像素点是所述关键视频帧中与所述触发操作对应的像素点;
请求确定模块,用于基于所述目标像素点在所述关键视频帧中的位置信息, 确定所述目标像素点对应的轨迹获取请求;
轨迹获取模块,用于基于所述轨迹获取请求,获取与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,其中,所述目标轨迹信息包含所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,所述目标像素点在所述关键视频帧的下一视频帧中的位置信息是通过跟踪所述目标像素点得到的;
文本显示模块,用于当播放所述关键视频帧的下一视频帧时,基于所述目标轨迹信息中的所述目标像素点在所述关键视频帧的下一视频帧中的位置信息显示所述多媒体信息。
本申请实施例一方面提供了一种视频数据处理装置,所述装置应用于业务服务器,包括:
请求响应模块,用于响应于对关键视频帧中的目标像素点的轨迹获取请求,获取与目标视频相关联的轨迹信息,其中,所述关键视频帧是所述目标视频中的视频帧,所述目标像素点是所述关键视频帧中的像素点,所述轨迹信息是由所述目标视频的每个视频帧中的像素点的位置信息所确定的;
轨迹筛选模块,用于从所述目标视频相关联的轨迹信息中,筛选与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,并返回所述目标轨迹信息,其中,所述目标轨迹信息包含目标位置信息,所述目标位置信息用于触发在所述关键视频帧的下一视频帧中,显示与所述目标像素点相关联的多媒体信息。
本申请实施例一方面提供了一种视频数据处理装置,所述装置包括:
第一获取模块,用于从目标视频中获取相邻的第一视频帧和第二视频帧;
矩阵获取模块,用于基于所述目标视频对应的光流追踪规则、所述第一视频帧中的像素点、所述第二视频帧中的像素点,确定所述第一视频帧对应的平均位移矩阵;
位置跟踪模块,用于基于所述平均位移矩阵对所述第一视频帧中的像素点的位置信息进行跟踪,并在所述第二视频帧中确定所跟踪得到的像素点的位置信息;
轨迹生成模块,用于基于所述第一视频帧中的像素点的位置信息、所述跟踪得到的像素点在所述第二视频帧中的位置信息,生成与所述目标视频相关联 的轨迹信息,其中,所述轨迹信息中包含用于对目标视频中的目标像素点所关联的多媒体信息进行跟踪显示的目标轨迹信息。
本申请实施例一方面提供了一种计算机设备,包括:处理器、存储器、网络接口;
所述处理器与存储器、网络接口相连,其中,网络接口用于提供数据通信功能,所述存储器用于存储计算机程序,所述处理器用于调用所述计算机程序,以执行如本申请实施例中一方面中的方法。
本申请实施例一方面提供了一种计算机存储介质,所述计算机存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时,执行如本申请实施例中一方面中的方法。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种网络架构的结构示意图;
图2是本申请实施例提供的一种目标视频中的多个视频帧的示意图;
图3是本申请实施例提供的一种获取目标视频的场景示意图;
图4是本申请实施例提供的一种视频数据处理方法的流程示意图;
图5是本申请实施例提供的一种获取多媒体信息的示意图;
图6是本申请实施例提供的一种全图像素追踪的示意图;
图7是本申请实施例提供的一种在连续多个视频帧中跟踪弹幕数据的示意图;
图8是本申请实施例提供的另一种视频数据处理方法的示意图;
图9是本申请实施例提供的一种确定有效像素点的方法;
图10是本申请实施例提供的一种基于轨迹信息显示弹幕数据的示意图;
图11是本申请实施例提供的一种视频数据处理装置的结构示意图;
图12是本申请实施例提供的一种计算机设备的结构示意图;
图13是本申请实施例提供的另一种视频数据处理装置的结构示意图;
图14是本申请实施例提供的另一种计算机设备的结构示意图;
图15是本申请实施例提供的又一种视频数据处理装置的结构示意图;
图16是本申请实施例提供的又一种计算机设备的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
在现有的网络视频的播放过程中,视频播放界面中显示的用户文字是独立于视频播放界面上所播放的视频内容,以至于所显示的用户文字与视频内容之间缺乏一定的相关性。此外,当用户需要发送用户文字时,用户终端是将获取到的用户文字,通过预定的文字显示轨道进行输出。因此,对于每个用户所发送的用户文字而言,均是通过相同的文字显示轨道进行输出,从而不能有针对性地对视频内容进行评论。
请参见图1,是本申请实施例提供的一种网络架构的结构示意图。如图1所示,所述网络架构可以包括业务服务器2000(或应用服务器2000)和用户终端集群。所述业务服务器2000可以是由大量服务器组成的服务器集群,例如,云端服务器,或简称为云端。所述用户终端集群可以包括多个用户终端,如图1所示,具体可以包括用户终端3000a、用户终端3000b、用户终端3000c、…、用户终端3000n。如图1所示,用户终端3000a、用户终端3000b、用户终端3000c、…、用户终端3000n可以分别与所述业务服务器2000进行网络连接,以便于每个用户终端可以通过该网络连接与业务服务器2000之间进行数据交互。
如图1所示,用户终端集群中的每个用户终端均可以集成安装有目标应用,当该目标应用运行于各用户终端中时,可以分别与上述图1所示的业务服务器2000之间进行数据交互。其中,目标应用可以包含多媒体应用、社交应用、娱乐应用等具有视频播放功能的应用。为便于理解,本申请实施例以所述多个用户终端中的一个用户终端作为目标用户终端为例,以阐述集成有该目标应用的 目标用户终端通过业务数据展示平台与所述业务服务器2000之间实现数据交互的具体过程。其中,本申请实施例中的目标用户终端可以包含个人电脑、平板电脑、笔记本电脑、智能手机等集成有上述目标应用的移动终端。其中,该业务服务器2000可以为该目标应用的后台服务器,该后台服务器对应的业务数据库可以用于存储展示在该业务数据展示平台上的每个业务数据信息,该业务数据信息可以包含视频数据等互联网信息。应当理解,该业务数据展示平台上可以显示多个视频,当目标用户在该目标用户终端中通过该业务数据展示平台触发多个视频中的一个视频时,可以获取该视频对应的视频数据,进而可以在该目标用户终端中播放该视频数据,并可以进一步将当前正在该目标用户终端中播放的视频数据称之为目标视频。该目标视频是由该业务服务器2000基于目标用户终端所发送的数据加载指令所返回的视频数据。
其中,所述目标视频中可以包含多个视频帧,每个视频帧均可以称之为一个图像数据,且每个视频帧均对应于该目标视频的播放时长中的一个播放时间戳(即一个时刻),以便于后续目标用户终端在加载并播放该目标视频时,可以基于该目标视频中的每个视频帧对应的播放时间戳在播放显示界面中显示相应的视频帧。
其中,业务服务器2000可以在视频的预处理阶段,将业务数据库中所存储的视频集中的每个视频进行分帧处理,从而可以将每个视频所包含的多个视频帧拆分成一张张图片。为便于理解,进一步地,请参见图2,是本申请实施例提供的一种目标视频中的多个视频帧的示意图。其中,该目标视频可以为前述业务数据库中的视频A,如图2所示,该视频A可以包含n个视频帧,n为大于0的正整数。业务服务器2000可以预先将该视频A中的n个视频帧拆分为n个图片。可以将这n个图片中的每两个前后相邻的图片称之为一个图像对。比如,如图2所示,本申请实施例可以将图2所示的第一时刻对应的视频帧与第二时刻对应的视频帧称之为一个图像对,并可以将第二时刻对应的视频帧与第三时刻对应的视频帧称之为一个图像对,…,并可以将第n-1时刻的视频帧与第n时刻对应的视频帧称之为第一图像对。换言之,对于一个目标视频而言,可以从该目标视频的多个视频帧中确定出多个图像对,每个图像对中均可以包含前后相邻的两个时刻对应的视频帧,即每个图像对中均可以包含两个相邻的视频帧。
为便于理解,本申请实施例以所述多个图像对中的首个图像对为例。在视频预处理阶段,本申请实施例可以将该首个图像对中的一个视频帧(例如,图2所示的第一时刻对应的视频帧)称之为第一视频帧,并可以将该图像对中的另一个视频帧(即第二时刻对应的视频帧)称之为第二视频帧,进而可以基于光流追踪规则,对该图像对中第一视频帧中的所有像素点的位置信息进行跟踪,以追踪得到该第一视频帧中的每个像素点出现在第二视频帧中的位置信息。由于每个图像对中均包含两个相邻的视频帧,从而可以计算得到每个图像对中第一视频帧中的像素点出现在下一视频帧中的位置信息,最后可以在该业务服务器2000中确定出该视频A中的所有像素点在所有视频帧中的运动轨迹,并可以将这些像素点的运动轨迹统称为像素点的轨迹信息。
由于业务服务器2000可以预先计算好该视频A中的所有像素点的轨迹信息,因此,当目标用户在目标用户终端中播放该视频A时,可以将该当前播放的视频A称之为一个目标视频。在播放该目标视频的过程中,若目标用户需要跟踪某个对象,则可以在目标用户终端中对需要跟踪的这个对象(即目标对象)执行触发操作,与触发操作对应的像素点称为目标像素点,即该目标像素点是由该目标用户针对当前播放的视频帧中的目标对象所执行的触发操作而确定的,该触发操作可以用于在当前所播放的视频帧中选择需要进行跟踪的目标对象。其中,本申请实施例可以将该触发操作对应的视频帧称之为关键视频帧,换言之,本申请实施例可以将当前包含该目标像素点的视频帧称之为该目标视频中的关键视频帧。可以理解的是,该关键视频帧可以为上述图2所对应实施例中的第一时刻对应的视频帧,可选地,该关键视频帧也可以为上述图2所对应实施例中第二时刻对应的视频帧,这里将不一一列举。
应当理解,本申请实施例可以将该关键视频帧、该关键视频帧中的目标像素点以及该目标像素点的位置信息给到上述图1所对应实施例中的业务服务器2000,以使该业务服务器2000可以基于该目标像素点在该关键视频帧中的位置信息,从预先计算得到的该目标视频中的所有像素点的轨迹信息中,筛选与该目标像素点的位置信息相匹配的轨迹信息,作为目标轨迹信息。其中,所述目标轨迹信息可以包括目标像素点在所述关键视频帧之后的视频帧中的位置坐标。进而,业务服务器2000可以将该目标轨迹信息返回给该目标用户终端,以使目标用户终端可以在播放该关键视频帧的下一视频帧时,进一步根据该目标 轨迹信息确定目标像素点在该关键视频帧的下一视频帧中的位置信息,即可以得到该目标像素点的目标位置信息,进而可以基于该目标位置信息显示该目标对象对应的多媒体信息。
其中,进一步地,请参见图3,是本申请实施例提供的一种获取目标视频的场景示意图。如图3所示的目标用户终端可以为上述图1所对应实施例中的用户终端3000a。如图3所示,目标用户可以在进入该目标应用之后,在该目标用户终端(例如,智能手机)中显示该目标应用的业务数据展示平台,该业务数据展示平台上可以显示图3所示的视频10a、视频20a、视频30a和视频40a。当目标用户需要在目标用户终端中播放图3所示的视频30a(该视频30a可以为上述图3所对应实施例中的视频A)时,可以对该视频30a所在区域执行播放操作(例如,目标用户可以对该视频30a执行点击操作),进而可以将该视频30a对应的目标标识信息添加到图3所示的数据加载指令中,以将该数据加载指令进一步给到与该目标用户终端具有网络连接关系的应用服务器,该应用服务器可以为上述图1所对应实施例中的业务服务器2000。可以理解的是,该应用服务器可以在获取到数据加载指令时,从业务数据库中查找该目标标识信息对应的视频数据,并可以将查找到的视频数据统称为目标数据,以便于能够将该目标数据给到图3所示的目标用户终端,以使该目标用户终端可以在图3所示的视频播放界面中播放该视频数据,此时,该目标用户终端可以将该目标用户所选取并播放的视频30a称之为目标视频,即此时,该目标用户终端可以按照上述图3所示的播放时间戳播放视频A中的每个视频帧。
其中,上述目标用户终端获取目标对象以及获取目标轨迹信息的具体过程,可以参见如下图4至图7所对应的实施例所提供的实现方式。另外,上述业务服务器2000获取所述像素点在所述第二视频帧中的位置信息。以及筛选所述目标像素点对应的目标轨迹信息的具体过程,可以参见如下图8至图10所对应的实施例所提供的实现方式。
请参见图4,是本申请实施例提供的一种视频数据处理方法的流程示意图。如图4所示,该方法可以应用于上述图1所对应实施例中的目标用户终端,该方法可以包括:
步骤S101,响应于对目标视频的触发操作,从所述目标视频的关键视频帧中确定目标像素点,并获取与所述目标像素点相关联的多媒体信息,其中,所 述关键视频帧是所述触发操作所在的视频帧,所述目标像素点是所述关键视频帧中与所述触发操作对应的像素点。
具体地,目标用户终端可以在访问目标应用时,在该目标应用的显示界面上显示用于承载多个业务数据信息的业务数据展示平台,例如,该业务数据展示平台上的每个业务数据信息可以为一个视频。在该业务数据展示平台上所展示的业务数据信息可以是由与该目标用户终端具有网络连接关系的应用服务器基于目标用户的用户画像数据(例如,该目标用户的历史行为数据)进行筛选后所确定的。当目标用户针对该业务数据展示平台上的一个业务数据信息(例如,一个视频)执行播放操作时,可以从该应用服务器对应的业务数据库中加载得到该视频对应的视频数据,进而可以在该目标用户终端的视频播放界面中播放该加载到的视频数据。进一步地,目标用户终端可以在播放该视频数据的过程中,获取该目标用户针对该视频播放界面中的目标对象(即需要跟踪的对象)执行的触发操作。所述触发操作例如是用鼠标点击或触摸目标用户终端显示屏上显示的视频帧中的目标对象中的某一点。可以将该触发操作对应的视频帧称之为关键视频帧,将在该关键视频帧中的触发操作对应的像素点称之为目标像素点。像素点是图像(例如,视频帧)中的一个个点。如果图像是640×480分辨率的图片,则上面分布着640×480个像素点。通常,图像中的像素点具有空间位置和颜色(或灰度)属性。与此同时,该目标用户终端还可以在独立于该视频播放界面的子窗口中创建一个文本框,以便于该目标用户可以在该文本框中输入与该目标对象具有关联关系的多媒体信息。当目标用户在该文本框中输入多媒体信息之后,该目标用户终端可以获取与该目标对象相关联的多媒体信息,即与该目标对象相关联的多媒体信息可以统称为该目标用户所输入的用户文字,或用户评论。
其中,所述目标用户终端可以为具有视频数据播放功能的终端设备,所述目标用户终端可以为上述图1所对应实施例中的用户终端3000a,该目标用户终端可以理解为一种移动终端。其中,所述应用服务器可以为上述图1所对应实施例中的业务服务器2000。
为便于理解,进一步地,请参见图5,是本申请实施例提供的一种获取多媒体信息的示意图。如图5所示,当目标用户终端在播放上述图3所对应实施例中视频30a的过程中,该目标用户终端可以将当前播放的视频30a作为目标视频。 可以理解的是,目标用户可以在播放该视频30a的任意一个时刻,对该视频30a所包含的多个视频帧中的某个视频帧执行触发操作。目标用户终端可以将该触发操作对应的视频帧作为关键视频帧。例如,如图5所示,目标用户可以在图5所示的视频播放界面100a中选取对象A作为目标对象,则目标用户终端可以将视频播放界面100a当前所播放的视频帧称之为关键视频帧。换言之,该目标用户终端可以将该选取操作(即,触发操作)对应的视频帧作为关键视频帧,并可以将在该关键视频帧中选取操作对应的像素点作为目标像素点。此时,该目标像素点为该目标用户终端所获取到的目标视频内的关键视频帧中的像素点。
如图5所示,当目标用户在图5所示的视频播放界面100a中针对对象A执行触发操作之后,可以在图5所示的视频播放界面200a中弹出图5所示的文本框,该文本框也可以称之为一个对话框。如图5所示的文本框可以理解为独立于该视频播放界面200a的一个浮窗,且图5所示的文本框可以与图5所示的对象A之间存在关联关系(例如,可以与该对象A中的目标像素点之间存在显示位置上的相对关系,以构建得到该视频30a中的目标对象的目标像素点与该目标对象相关联的多媒体信息之间的关联性)。所述浮窗的实现可以与所述视频播放界面的实现类似或相同。应当理解,在本申请实施例的对话框中所输入的多媒体信息可以包含用户文字、用户图片和用户表情等数据,并可以将该目标用户在该对话框中所输入的用户文字(即文本信息)、用户图片(即图片信息)和用户表情(即表情信息)等统称为弹幕数据。所述弹幕数据的显示可以和字幕类似。
因此,当目标用户在图5所示的视频播放界面200a中的文本框中输入文本信息A之后,可以在图5所示的视频播放界面300a中显示该输入的文本信息A。该输入的文本信息A可以为图5所示的与该对象A中的目标像素点之间存在一定位置间隔距离的文本信息。显示在该视频播放界面300a中的文本信息A可以称之为与该目标对象相关联的弹幕数据。
步骤S102,基于所述目标像素点在所述关键视频帧中的位置信息,确定所述目标像素点对应的轨迹获取请求。
具体地,目标用户终端可以在该关键视频帧中确定该目标像素点的位置信息,并可以基于该关键视频帧在目标视频中的帧号以及该目标像素点在该关键视频帧中的位置信息,生成该目标像素点对应的轨迹获取请求,以便于可以进 一步执行步骤S103。
其中,该轨迹获取请求可以用于指示应用服务器从预先计算得到的该目标视频中的所有像素点对应的轨迹信息中筛选与该目标像素点相匹配的轨迹信息。
步骤S103,基于所述轨迹获取请求,获取与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息。
其中,所述目标轨迹信息包含所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,所述目标像素点在所述关键视频帧的下一视频帧中的位置信息是通过跟踪所述目标像素点得到的。
在本申请实施例中,目标用户终端可以基于应用服务器预先计算得到的目标视频中的所有像素点在所有视频帧中的运动轨迹(每个像素点的运动轨迹可以统称为一个轨迹信息),从这些像素点对应的运动轨迹中筛选与目标像素点相匹配的像素点的运动轨迹,作为与该目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息。换言之,该目标用户终端在获取到该目标轨迹信息时,可以基于该目标轨迹信息中所包含的所述目标像素点在该关键视频帧的下一视频帧的位置信息,快速确定上述图5所对应实施例中目标对象中的目标像素点与该目标对象相关联的多媒体信息之间的位置间隔距离,或者目标像素点与多媒体信息之间的位置间隔,即可以快速得到该多媒体信息出现在该关键视频帧的下一视频帧中的位置信息。
可以理解的是,该位置间隔距离可以理解为所述关键视频帧中的目标像素点与对应的弹幕数据之间的相对位置间隔距离。即该位置间隔距离可以包含水平(即横向)方向上的相对位置间隔距离,也可以包含垂直(即纵向)方向上的相对位置间隔距离,从而可以确保目标用户终端在得到目标像素点在关键视频帧的下一个视频帧中的位置信息时,可以基于所述相对位置间隔距离,快速计算得到文本信息A在关键视频帧的下一视频帧中的位置信息。即此时,显示在上述图5所对应实施例中的视频播放界面300a中的文本信息A的位置信息,取决于目标像素点在该关键视频帧之后的其他视频帧的位置信息。在目标用户终端的计算性能比较一般的情况下,该目标用户终端可以从与该目标用户终端具有网络连接关系的应用服务器中,获取与该关键视频帧中的目标像素点的位置信息相符的轨迹信息,作为目标轨迹信息,以使该目标用户终端能够在获取 到该应用服务器预先计算好的该目标像素点的目标轨迹信息时,进一步基于该目标轨迹信息中的目标像素点出现在该关键视频帧的下一视频帧的位置信息,在有效时长内快速且准确地实现对该弹幕数据的快速跟踪,从而可以有效地降低该目标用户终端的计算量,以确保在该目标用户终端的计算性能比较一般的情况下,也可以快速对该弹幕数据进行跟踪。
其中,该有效时长可以为弹幕数据对应的显示时长,即该目标用户终端可以在该显示时长内对该目标对象所关联的弹幕数据进行跟踪。
应当理解,由于该目标视频内的每个像素点的运动轨迹(即每个像素点的轨迹信息)是由每个像素点在该目标视频的每个视频帧中的位置信息所确定的。其中,对于包含多个视频帧的目标视频而言,本申请实施例可以将该多个视频帧中任意相邻的两个视频帧确定为一个图像对。应当理解,从该多个视频帧中所确定出的每个图像对所包含的两个视频帧中的一个视频帧,可以称之为第一视频帧,另一个视频帧可以称之为第二视频帧。对于上述图2所对应实施例中的第1时刻所对应的视频帧和第2时刻所对应的视频帧所构成的图像对1而言,可以将该图像对1中的第1时刻所对应的视频帧称之为第一视频帧,并可以将该第2时刻所对应的视频帧称之为第二视频帧,进而可以基于预先计算出的该图像对1中的这两个视频帧之间的平均位移矩阵,对该第一视频帧中所有像素点进行追踪,以确定出该第一视频帧中的所有像素点出现在第二视频帧中的位置信息。同理,对于上述图2所对应实施例中的第2时刻所对应的视频帧和第3时刻所对应的视频帧所构成的图像对2而言,也可以将该第2时刻所对应的视频帧称之为第一视频帧,并将该第3时刻所对应的视频帧称之为第二视频帧,从而可以基于预先计算好的该图像对2中的这两个视频帧之间的平均位移矩阵,对该第一视频帧中所有像素点进行追踪,以确定出该第一视频帧中的所有像素点出现在第二视频帧中的位置信息。以此类推,本申请实施例可以得到每个图像对对应的平均位移矩阵,每个图像对对应的平均位移矩阵可以称之为每个图像对中的第一视频帧对应的平均位移矩阵,每个第一视频帧对应的平均位移矩阵可以用于将第一视频帧中的所有像素点映射到第二视频帧中,以在第二视频帧中准确得到这些映射得到的像素点的位置信息。应当理解,本申请实施例中的平均位移矩阵可以包含纵向平均位移矩阵和横向平均位移矩阵。通过纵向平均位移矩阵可以将第一视频帧中的每个像素点的第一纵向坐标值(例如,y值) 进行纵向坐标变换,以得到相应像素点映射到第二视频帧中的第二纵向坐标;同理,通过横向平均位移矩阵可以将第一视频帧中的每个像素点的第一横向坐标值(例如,x值)进行横向坐标变换,以得到相应像素点映射到第二视频帧中的第二横向坐标。应当理解,本申请实施例可以将每个像素点在第一视频帧中的第一横向坐标和第一纵向坐标值称之为该第一视频帧中的每个像素点的第一位置信息,并可以将每个像素点映射在第一视频帧中的第二横向坐标和第二纵向坐标值称之为该第二视频帧中的每个映射得到的像素点的第二位置信息。由于每个图像对均对应一个平均位移矩阵,从而可以基于在第一视频帧中像素点的第一位置信息计算得到相应的第二位置信息,并可以将计算所得到的每个第二视频帧中所映射得到的像素点的第二位置信息进行保留,进而可以将同一像素点在每个视频帧中的位置信息进行整合,以得到该视频帧中的所有像素点的运动轨迹,从而可以实现对该目标视频的所有视频帧内的所有像素点的跟踪。
应当理解,上述图2所对应实施例所显示的该目标视频中的多个视频可以为多个连续的图像帧,因此,通过将上述图2所示的目标视频进行拆分后,可以对每个拆分所得到的图像帧(即视频帧)按照播放顺序设置相应的视频帧号,例如,上述第1时刻所得到的视频帧的视频帧号可以为1,该视频帧号1则可以用于表示该第1时刻所得到的视频帧为该目标视频中的第一帧;同理,上述第2时刻所得到的视频帧的视频帧号可以为2,该视频帧号2则可以用于表示该第2时刻所得到的视频帧为该目标视频中的第二帧。依次类推,上述第n-1时刻所得到的视频帧的视频帧号可以为n-1,该视频帧号n-1则可以用于表示该第n-1时刻所得到的视频帧为该目标视频中的第n-1帧;上述第n时刻所得到的视频帧的视频帧号可以为n,该视频帧号n则可以用于表示该第n时刻所得到的视频帧为该目标视频中的第n帧,即该目标视频中的最后一帧。
为便于理解,本申请实施例可以将上述图2所示的多个视频帧的第一帧和第二帧所构成的图像对称之为首个图像对,以阐述通过平均位移矩阵将第一帧中的像素点平移变换到第二帧中,实现像素跟踪的具体过程。其中,该首个图像对中的第一帧即为上述图2所对应实施例中的第1时刻对应的视频帧,该首个图像对中的第二帧即为上述图2所对应实施例中的第2时刻对应的视频帧。进一步地,请参见图6,是本申请实施例提供的一种全图像素追踪的示意图。如图6所示的图像对(1,2)可以为前述首个图像对。该首个图像对中的第一视 频帧可以为前述第1时刻对应的视频帧(即第一帧),该首个图像对中的第二视频帧可以为前述第2时刻对应的视频帧(即第二帧)。其中,应当理解,该图像对(1,2)中的数值1即为第一帧的视频帧号,数值2即为第二帧的视频帧号。因此,可以用该目标视频中的每一个视频帧的视频帧号来表征该目标视频中的任意两个前后相邻的视频帧。如图6所示的像素点显示区域600a可以包括从该图像对的第一视频帧中所提取到的所有像素点,比如,该像素点显示区域600a中的每个像素点均可以对应一个区域标识。图6的像素点显示区域600a仅是用于示例,像素点显示区域600a也可称为像素点区域等。应当理解,本申请实施例仅以从该第一视频帧中所获取到的像素点为20个像素点为例,在实际情况下,从该第一视频帧中所获取到的像素点的数量会远远多于本申请实施例所列举的20个。应当理解,由于同一视频内的多个视频帧是由同一终端进行图像采集后所得到的,因此,对于同一视频内所包含的每个视频帧中的像素点的数量是相同的。
如图6所示,在获取到该第一视频帧中的每个像素点之后,可以将这些获取到的所有像素点统称为像素点,进而可以通过图6所示的平均位移矩阵对该像素点显示区域600a中的所有像素点进行跟踪,进而可以在第二视频帧对应的像素点显示区域700a中确定所映射得到的像素点的位置信息。比如,以图6所示的像素点A为例,该像素点A在图6所示的像素点显示区域600a中的位置信息可以为区域标识5的坐标位置信息,通过该平均位移矩阵可以将该像素点A映射到图6所示的像素点显示区域700a中,该像素点A在图6所示的像素点显示区域700a中的位置信息可以为区域标识10的坐标位置信息。本申请实施例在计算得到该像素点A在该第二视频帧中的位置信息之后,可以对其进行存储。由于该目标视频中的每个图像对均可以对应一个平均位移矩阵,因此,可以计算出每个第一视频帧中的像素点映射到第二视频帧中的位置信息。通过将每一个图像对中同一像素点出现在连续视频帧内的位置信息进行整合,可以得到该像素点A出现在该目标视频中的每个视频帧中的位置信息,进而可以基于该像素点A在该目标视频中的每个视频帧中的位置信息,得到该像素点A的运动轨迹。
同理,对于该视频帧中的所有像素点中的其他像素点而言,也可以通过每个图像对对应的平均位移矩阵(即每个图像对中的第一视频帧对应的平均位移 矩阵),确定出其他像素点中的每个像素点在该目标视频的每个视频帧中的位置信息,进而可以得到其他像素点的运动轨迹,以实现对每个图像对中的第一视频帧中的所有像素点的全图像素追踪,进而可以得到该目标视频的所有像素点在每个视频帧中的位置信息。应当理解,本申请实施例可以将该目标视频中所有像素点对应的轨迹信息统称为像素点对应的轨迹信息。
其中,若该目标用户终端的计算性能难以满足跟踪大量像素点的计算要求时,为了减小该目标用户终端的计算量,可以通过与该目标用户终端具有网络连接关系的应用服务器预先计算好该目标视频内的所有像素点的运动轨迹。从而可以在该目标用户终端实际播放该目标视频时,应用服务器接收目标用户终端所发送的目标像素点在关键视频帧中的位置信息,从预先计算好的像素点对应的轨迹信息中筛选与该目标像素点匹配的轨迹信息,作为目标轨迹信息,进而将该目标轨迹信息返回给目标用户终端,以使目标用户终端可以基于该获取到的目标轨迹信息进一步执行步骤S104。其中,该目标像素点为目标用户所选取的关键视频帧内的像素点。
可选地,若该目标用户终端具有良好的计算性能,则可以在该目标用户终端中预先计算好该目标视频内的所有像素点的运动轨迹,从而可以在该目标用户终端实际播放该目标视频时,进一步基于目标用户所选取的目标对象内的目标像素点在这些像素点对应的轨迹信息中筛选与该目标像素点匹配的轨迹信息作为目标轨迹信息,以便于可以进一步执行步骤S104。
步骤S104,当播放所述关键视频帧的下一视频帧时,基于所述目标轨迹信息中的所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,显示所述多媒体信息。
为便于理解,进一步地,请参见图7,是本申请实施例提供的一种在连续多个视频帧中跟踪弹幕数据的示意图。应当理解,本申请实施例中用于进行弹幕跟踪的连续多个视频帧可以包含当前正在播放的关键视频帧,以及尚未播放的位于该目标视频内的关键视频帧之后的视频帧。比如,当如图7所示的视频帧10作为关键视频帧时,可以在该关键视频帧后的每个视频帧(例如,视频帧20、视频帧30等视频帧)中,对出现在视频帧10中的弹幕数据进行弹幕跟踪。其中,图7所示的视频帧10可以为上述图5所对应实施例中的显示在视频播放界面300a中的视频帧,即图7所示的视频帧10可以为当前正在目标用户终端中 播放的目标视频中与该目标用户相关联的关键视频帧。换言之,本申请实施例中的关键视频帧可以理解为目标用户选取目标对象时所执行的触发操作对应的视频帧。应当理解,本申请实施例中的目标对象可以包含目标用户在正在播放的视频帧通过点击操作所选取的人物、动物、植物等对象。换言之,该目标用户终端可以将该目标用户所选取的对象称之为目标对象,并可以将该关键视频帧中目标对象内与触发操作对应的像素点作为目标像素点,进而可以从应用服务器预先计算好的该目标视频内的所有像素点的轨迹信息中,获取与该目标对象中目标像素点相关联的轨迹信息,从而可以将获取到的轨迹信息作为该目标像素点对应的目标轨迹信息。该目标轨迹信息中可以包含该目标像素点在该关键视频帧中的位置信息,还可以包含该目标像素点在该关键视频帧之后的每个视频帧(例如,该关键视频帧的下一视频帧)中的位置信息。应当理解,基于该目标像素点在该关键视频帧后的每个视频帧中的位置信息,可以快速计算得到与目标对象关联的(也是与目标像素点关联的)多媒体信息(即上述图5所对应实施例中的弹幕数据)在该关键视频帧后的每个视频帧中的位置信息,以实现对该目标对象相关联的弹幕数据的快速跟踪,从而可以在该目标用户终端播放该关键视频帧的下一视频帧时,基于计算得到的所述弹幕数据在所述关键视频帧中的位置信息,在所述下一视频帧中实时显示该弹幕数据。所述弹幕数据的显示可以类似于字幕的显示。
应当理解,本申请实施例通过将弹幕数据与该关键视频帧中的目标对象进行关联,可以实现弹幕数据与目标对象之间的如影随行,即用户所输入的弹幕可以在有效跟踪时长内,一直跟着这个要跟踪的目标对象进行相对运动。比如,在该目标视频内,若该关键视频帧后的连续多个视频帧中均存在目标对象,则可以基于该目标对象中的目标像素点在这几个连续视频帧中的位置信息,显示与该目标对象相关联的弹幕数据(即前述文本信息A)。
目标用户终端也可以将用户输入的弹幕数据(多媒体信息)以及计算得到的弹幕数据在目标视频各视频帧中的位置信息传输给服务器。或者,服务器可以接收目标用户终端发送的用户点击的目标视频中关键视频帧的帧号、目标像素点坐标、输入的弹幕数据(多媒体信息),计算目标像素点在目标视频各视频帧中的目标轨迹信息,根据该目标轨迹信息,计算弹幕数据在目标视频各视频帧中的位置信息,并将所述弹幕数据的位置信息进行保存。服务器在接收目标 用户终端发送的信息时,还可以接收目标用户终端的标识和/或目标用户终端上用户登录目标应用的用户标识等信息。然后在其他用户终端播放所述目标视频时,服务器可以将所述弹幕数据及其在目标视频各视频帧中的位置信息、用户标识发送给其他用户终端,由其他用户终端根据弹幕数据的位置信息,在目标视频的各视频帧中显示弹幕数据。
应当理解,对于在目标用户终端中播放的任意一个视频而言,目标用户可以在当前时刻为T1时刻时,从当前播放的视频帧中选择出该目标用户认为需要进行跟踪的对象。可以将该选择的对象称为目标对象。进而,目标用户终端可以基于从预先计算好的该视频内的所有像素点对应的轨迹信息,筛选出与该目标对象中的目标像素点相关联的轨迹信息,快速得到该目标对象中的目标像素点对应的目标轨迹信息。其中,应当理解,预先计算好的该视频内的每个像素点的任意一个像素点对应的轨迹信息,均可以用于描述该像素点在该视频内的每个视频帧中的位置信息。因此,当目标用户终端将在T1时刻播放的视频帧作为关键视频帧时,可以在该关键视频帧内得到该目标对象中的目标像素点,进而可以从该目标像素点对应的目标轨迹信息中,快速得到该目标像素点像素点在该关键视频帧后的每个视频帧中的位置信息,从而可以基于该目标轨迹信息显示该目标对象所关联的多媒体信息。其中,可以理解的是,若目标像素点在每个视频帧中所构成的轨迹信息为一个圆,则与该目标对象相关联的多媒体信息则可以同步跟着该轨迹信息转圈。由此,通过预先对该目标视频内的所有像素点进行跟踪,可以预先得到每个像素点对应的轨迹信息,从而可以在目标用户终端中播放该目标视频时,可以基于目标用户所执行的触发操作,将触发操作所对应的目标对象中的像素点作为目标像素点,以获取与该目标像素点相关联的轨迹信息作为目标轨迹信息,进而可以快速基于该获取到的目标轨迹信息,快速实现对该目标对象所关联的多媒体信息的准确跟踪。
那么,对于在该关键视频帧中的不同对象而言,可以得到不同对象中的目标像素点分别对应的运动轨迹,从而使得与不同目标对象相关联的弹幕数据可以绕着不同的轨迹进行运动,使得弹幕数据和其所针对的对象之间的关联更强,进而可以丰富弹幕数据的视觉展示效果,还可以提升弹幕数据的显示方式的灵活度。
本申请实施例可以在获取到目标用户针对目标视频的触发操作时,将该目 标视频中该触发操作对应的视频帧作为关键视频帧,从而可以从该关键视频帧中确定目标像素点,并获取与该目标像素点以及该目标像素点所在目标对象关联的多媒体信息(例如,该多媒体信息可以为该目标视频中的用户文字、图片、表情等弹幕数据)。进一步地,基于该目标像素点在该关键视频帧中的位置信息,确定所述目标像素点对应的轨迹获取请求,进而可以基于该轨迹获取请求,获取所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,以便于在播放该关键视频帧的下一视频帧时,能够基于该目标轨迹信息,显示与该目标像素点以及该目标像素点所在目标对象相关联的弹幕数据。由此可见,本申请实施例可以在确定出关键视频帧时,进一步从该关键视频帧中的所有像素点的轨迹信息中筛选出目标像素点的轨迹信息,并将筛选出的目标像素点的轨迹信息作为目标轨迹信息,以便于能够基于得到的目标轨迹信息丰富弹幕数据的展示效果。比如,对于不同目标对象中的目标像素点,所得到的目标轨迹信息可能会不同,进而使得弹幕数据的展示效果会不同。此外,基于目标对象与弹幕数据之间的关联关系,可以快速确定出该弹幕数据在该关键视频帧之后的每个视频帧中的位置信息,换言之,该弹幕数据会在该目标视频中一直跟着该目标对象进行变动,进而可以丰富视频中的用户文字的视觉展示效果,并可以使得弹幕数据与目标对象或评论的视频中的对象之间的关联性更强。
为便于理解,进一步地,请参见图8,是本申请实施例另一种视频数据处理方法的示意图。该方法主要用于阐述目标用户终端与应用服务器之间的数据交互过程,该方法可以包含以下步骤:
步骤S201,从目标视频中获取相邻的第一视频帧和第二视频帧。
具体地,应用服务器可以从目标视频所包含的多个视频帧中确定多个图像对,该多个图像对中的每个图像对均是由该目标视频中的两个相邻的视频帧所构成的。
换言之,应用服务器可以在对该目标视频进行视频预处理的阶段中,先把该目标视频进行分帧处理,以将该目标视频中的多个视频帧按照播放时间序列分成一张张的图片,即可以得到上述图2所示的基于播放时间序列进行排布的多个视频帧。通过对该目标视频中的每个视频帧进行拆分,可以得到每个视频帧对应的图片,即一个图像可以视为一个图像帧。进一步地,该应用服务器可以通过前向后向光流法,对每个图像对中的两个视频帧内的像素点进行像素追 踪。比如,以包含n个视频帧的目标视频而言,该应用服务器可以根据该目标视频内的每个视频帧的视频帧号,将具有相邻帧号的两个视频帧确定为一个图像对。换言之,该应用服务器可以将视频帧号为1的视频帧和视频帧号为2的视频帧确定为一个图像对。同理,该应用服务器可以将视频帧号为2的视频帧和视频帧号为3的视频帧确定为一个图像对。以此类推,该应用服务器可以将视频帧号为n-1的视频帧和视频帧号为n的视频帧确定为一个图像对。
对于该目标视频内的每个图像对而言,可以用每个视频帧的视频帧帧号来描述相应图像对中的两个视频帧。因此,对于前述包含n个视频帧的目标视频而言,可以得到n-1个图像对,这n-1个图像对可以表示为:(1,2),(2,3),(3,4),…,(n-1,n)。其中,图像对中的视频帧号为1的视频帧可以称之为该目标视频的第一帧,视频帧号为2的视频帧可以称之为该目标视频的第二帧,以此类推,可以将图像对中的视频帧号为n-1的视频帧可以称之为该目标视频的第n-1帧,视频帧号为n的视频帧可以称之为该目标视频的第n帧。进一步地,该应用服务器可以通过云端前向后向光流法,对该目标视频的每个图像对中的像素点进行跟踪。其中,该云端前向后向光流法可以统称为光流法,该光流法可以用于对每个图像对中的两个视频帧之间的像素点位移进行计算。
可以理解的是,由于每个图像对均是由前后相邻的两个视频帧所构成的,因此,可以将每个图像对中的一个视频帧称之为第一视频帧,并将每个图像对中的另一个视频帧称之为第二视频帧,以便于可以进一步执行步骤S202。
其中,为便于理解,本申请实施例可以将从目标视频中所获取到的每个图像对中的两个视频帧,统称为第一视频帧和第二视频帧,即该应用服务器可以从目标视频中获取相邻的第一视频帧和第二视频帧。
步骤S202,基于所述目标视频对应的光流追踪规则、所述第一视频帧中的像素点、所述第二视频帧中的像素点,确定所述第一视频帧对应的平均位移矩阵。
具体地,应用服务器可以对第一视频帧的所有像素点进行提取。可以将提取到的所有像素点统称为像素点。其中,该目标视频对应的光流追踪规则可以包含前述云端前向后向光流法,还可以包含云端位移积分法和云端位移差分法。应当理解,通过该光流追踪规则可以对每个图像对中的第一视频帧中的像素点、第二视频帧中的像素点进行光流运算,以得到每个图像对对应的光流追踪结果, 进而可以基于该光流追踪结果确定每个图像对对应的目标状态矩阵和目标位移矩阵。如果直接使用目标状态矩阵和目标位移矩阵对视频帧中的像素点进行跟踪也是可以的,但可能跟踪结果不是很准确。为了提高跟踪的准确性,可以在计算像素点在第一视频帧与第二视频帧中的位移时,考虑像素点的周围像素点位移的信息。例如,应用服务器可以针对所述第一视频帧中的每个像素点,以其为中心,选取该像素点周围的图块(包括该像素点和该像素点周围的像素点),计算所述图块中所有像素点的平均位移,作为该像素点的位移。这种处理方式的运算量可能会比较大。根据本申请实施例,通过该光流追踪规则还可以进一步对每个图像对对应的目标状态矩阵和目标位移矩阵进行位移积分运算,以得到每个图像对对应的状态积分矩阵和位移积分矩阵。进一步地,通过该光流追踪规则还可以对每个图像对对应的状态积分矩阵和位移积分矩阵进行位移差分运算,以得到每个图像对对应的平均位移矩阵。换言之,通过该光流追踪规则可以准确得到能够用于对每个图像对中的第一视频帧中的像素点的位置信息进行准确跟踪的平均位移矩阵。通过上述的位移积分运算和位移差分运算,应用服务器能够批量计算第一视频帧中的像素点和第二视频帧中像素点的平均位移,从而可以提高运算的速度,提高像素点和视频帧的处理的效率。
其中,云端前向后向光流法可以用于同步对每个图像对中的第一视频帧和第二视频帧进行正向反向光流法计算,以得到每个图像对对应的光流追踪结果。换言之,应用服务器所得到的光流追踪结果可以包含每个图像对中的第一视频帧对应的正向位移矩阵,还可以包含每个图像对中的第二视频帧对应的反向位移矩阵。在本申请实施例中,正向位移矩阵和反向位移矩阵的中的每个矩阵元素均可以包含两个维度上的位移(比如,(Δx,Δy))。其中,这两个维度上的位移可以理解为同一像素点分别在水平方向上的位移(即Δx)和在垂直方向上的位移(即Δy)。应当理解,对于该目标视频中的每个图像对而言,通过该光流法进行计算后,均可以得到一个正向水平方向位移矩阵、一个正向垂直方向位移矩阵、一个反向水平方向位移矩阵、一个反向垂直方向位移矩阵,并可以将得到的四个矩阵称之为光流结果。进一步地,应用服务器可以为每个图像对中的第一视频帧设置一个初始状态矩阵,进而可以根据前面所得到的正向位移矩阵和反向位移矩阵,判断每个图像对中的第一视频帧中的像素点是否满足目标筛选条件。若是第一视频帧中存在满足目标筛选条件的像素点,则应用服务器可 以将满足目标筛选条件的像素点确定为有效像素点,进而可以根据确定出的有效像素点,对所述第一视频帧对应的初始状态矩阵和所述正向位移矩阵进行修正,以得到每个图像对中的第一视频帧对应的目标状态矩阵和目标位移矩阵。进一步地,应用服务器可以通过上述云端位移积分法和云端位移差分法,以及所得到的目标状态矩阵和目标位移矩阵,确定得到每个图像对中的第一视频帧对应的平均位移矩阵。
其中,本申请实施例可以将该正向水平方向位移矩阵和正向垂直方向位移矩阵统称为正向位移矩阵,并可以将反向水平方向位移矩阵、反向垂直方向位移矩阵统称为反向位移矩阵。为便于理解,本申请实施例以多个图像对中的一个图像对为例,以阐述通过该图像对中的第一视频帧和第二视频帧,得到该图像对对应的平均位移矩阵的过程。其中,该图像对中的第一视频帧可以为上述视频帧号为1的视频帧,第二视频帧可以为上述视频帧号为2的视频帧。因此,由该视频帧号为1的视频帧和视频帧号为2的视频帧所构成的图像对称之为图像对1,且该图像对1可以表示为(1,2)。
其中,通过光流法计算后所得到的该图像对1对应的正向位移矩阵可以包含正向水平位移矩阵(例如,该正向水平位移矩阵可以为矩阵Q 1,2,x)和正向垂直位移矩阵(例如,该正向垂直位移矩阵可以为矩阵Q 1,2,y)。其中,应当理解,矩阵Q 1,2,x中的每个矩阵元素可以理解为第一视频帧中的像素点在第二视频帧中的水平方向上的位移。即该正向水平位移矩阵中的每个矩阵元素可以称之为第一视频帧中的像素点对应的第一横向位移。同理,矩阵Q 1,2,y中的每个矩阵元素可以理解为第一视频帧中的像素点在第二视频帧中的垂直方向上的位移。即该正向水平位移矩阵中的每个矩阵元素可以称之为第一视频帧中的像素点对应的第一纵向位移。换言之,通过光流计算法所得到的这两个矩阵(即矩阵Q 1,2,x和矩阵Q 1,2,x)的矩阵大小与第一视频帧的大小一样,即一个矩阵元素可以对应第一视频帧中的一个像素点。
同理,通过光流法计算后所得到的该图像对1对应的反向位移矩阵,可以包含反向水平位移矩阵(即该反向水平位移矩阵可以为矩阵Q 2,1,x)和反向垂直位移矩阵(即该反向垂直位移矩阵可以为矩阵Q 2,1,y)。其中,应当理解,矩阵Q 2,1,x中的每个矩阵元素可以理解为第二视频帧中的像素点在第一视频帧中的水平方向上的位移。即该反向水平位移矩阵中的每个矩阵元素可以称之为第二视 频帧中的像素点对应的第二横向位移。同理,矩阵Q 2,1,y中的每个矩阵元素可以理解为第二视频帧中的像素点在第一视频帧中的垂直方向上的位移。即该反向垂直位移矩阵中的每个矩阵元素可以称之为第二视频帧中的像素点对应的第二纵向位移。换言之,通过光流计算法所得到的这两个矩阵(即矩阵Q 2,1,x和矩阵Q 2,1,y)的矩阵大小与第二视频帧的大小一样,即一个矩阵元素可以对应第二视频帧中的一个像素点。
应当理解,对于目标视频内的每个视频帧而言,每个视频帧中的像素点的数量是相同的,因此,通过光流计算法所得到的与该图像对1对应的这四个矩阵(即矩阵Q 1,2,x、矩阵Q 1,2,y、矩阵Q 2,1,x、矩阵Q 2,1,y)的矩阵大小是相同的。比如,若每个视频帧中的像素点数为m×n个,则所得到的这四个矩阵的矩阵大小均可以为m×n。由此可见,正向水平位移矩阵和正向垂直位移矩阵中的每个矩阵元素,均能够与第一视频帧中的相应像素点进行对应。因此,该图像对1对应的正向位移矩阵中的每个矩阵元素可以表示第一视频帧中的像素点在第二视频帧中的两个维度上的位移。可以将该图像对1对应的正向位移矩阵统称为第一视频帧对应的正向位移矩阵。同理,图像对1对应的反向位移矩阵中的每个矩阵元素可以表示第二视频帧中的像素点在第一视频帧中的两个维度上的位移。可以将该图像对1对应的反向位移矩阵统称为第二视频帧对应的反向位移矩阵。
由此可见,应用服务器可以基于所述第一视频帧中的像素点的第一位置信息和所述光流追踪规则,将所述第一视频帧中的像素点正向映射到所述第二视频帧,并在所述第二视频帧中确定所映射得到的第一映射点的第二位置信息,并可以进一步基于所述像素点的第一位置信息、所述第一映射点的第二位置信息,确定所述第一视频帧对应的正向位移矩阵。进一步地,应用服务器可以基于所述第二视频帧中的像素点的第二位置信息和所述光流追踪规则,将所述第二视频帧中的像素点反向映射到所述第一视频帧,并在所述第一视频帧中确定所映射得到第二映射点的第三位置信息,并进一步基于所述第一映射点的第二位置信息、所述第二映射点的第三位置信息,确定所述第二视频帧对应的反向位移矩阵。其中,所述第一映射点和第二映射点均为通过光流法将图像对中的一个视频帧中的像素点映射到另一个视频帧中所得到的像素点。
进一步地,应用服务器可以基于所述第一视频帧中的像素点的第一位置信 息、所述正向位移矩阵、所述反向位移矩阵,将所述像素点中满足目标筛选条件的像素点确定为有效像素点。其中,该应用服务器确定有效像素点的具体过程可以描述为:
应用服务器可以从所述第一视频帧中的像素点中获取第一像素点,并在所述第一视频帧中确定所述第一像素点的第一位置信息,并从所述正向位移矩阵中确定所述第一像素点对应的第一横向位移和第一纵向位移;进一步地,应用服务器可以基于所述第一像素点的第一位置信息、所述第一像素点对应的第一横向位移和第一纵向位移,将所述第一像素点正向映射到所述第二视频帧,并在所述第二视频帧中确定所映射得到的第二像素点的第二位置信息;进一步地,应用服务器可以从所述反向位移矩阵中确定所述第二像素点对应的第二横向位移和第二纵向位移,并基于所述第二像素点的第二位置信息、所述第二像素点对应的第二横向位移和第二纵向位移,将所述第二像素点反向映射到所述第一视频帧,并在所述第一视频帧中确定所映射得到的第三像素点的第三位置信息;进一步地,应用服务器可以基于所述第一像素点的第一位置信息、所述第三像素点的第三位置信息,确定所述第一像素点与所述第三像素点之间的误差距离,并根据所述第一像素点的第一位置信息、所述第二像素点的第二位置信息,确定包含第一像素点的图像块与包含所述第二像素点的图像块之间的相关系数;进一步地,应用服务器可以将所述像素点中误差距离小于误差距离阈值、且所述相关系数大于相关系数阈值的像素点确定为有效像素点。
为了验证通过上述得到的光流跟踪结果中的正向位移矩阵和反向位移矩阵中的矩阵元素的正确性,本申请实施例可以通过矩阵变换的方式对上述四个位移矩阵中的矩阵元素进行筛选,即可以通过所构建的初始状态矩阵中的相应像素点所在位置上的矩阵元素的变化情况,从这四个矩阵中去除相应像素点对应位置上的、存在较大位移误差的矩阵元素,即可以从第一视频帧的像素点中确定出有效像素点。
为便于理解,进一步地,请参见图9,是本申请实施例提供的一种确定有效像素点的方法。如图9所示,该应用服务器在对这四个矩阵中的矩阵元素进行筛选之前,可以首先初始化一个与该第一视频具有相同大小的状态矩阵S 1。此时,该应用服务器可以将该状态矩阵S 1称为初始状态矩阵。其中,该初始状态矩阵中,与各像素点对应的矩阵元素的值可以称为第一数值。此时,该初始状 态矩阵中的第一数值均为零。该初始状态矩阵中的矩阵元素的值的变化情况可以用来表示第一视频帧中的像素点是否满足目标筛选条件,从而可以将满足目标筛选条件的像素点作为有效跟踪像素点(即有效像素点)。
其中,如图9所示的第一图像帧可以为上述图像对1中的视频帧号为1的视频帧。该第一视频帧中的像素点中可以包含如图9所示的第一像素点p1,即该第一像素点p1可以为该第一视频帧的所有像素点中的一个像素点,并可以将该第一像素点p1在第一视频帧中的位置信息称为第一位置信息。进一步地,应用服务器可以从上述正向位移矩阵中的正向水平位移矩阵中找到该第一像素点p1对应的第一横向位移,并从上述正向位移矩阵中的正向垂直位移矩阵中找到该第一像素点p1对应的第一纵向位移,进而可以基于该第一像素点p1的第一位置信息、该第一像素点p1对应的第一横向位移和第一纵向位移,将该第一像素点p1正向映射到图9所示的第二视频帧,并在第二视频帧中确定出所映射得到的第二像素点p2的第二位置信息。可以理解的是,此时,该第二像素点p2是通过将第一像素点p1进行矩阵变换后所得到的像素点。进一步地,该应用服务器可以从上述反向位移矩阵中确定该第二像素点p2对应的第二横向位移和第二纵向位移,并可以根据该第二像素点p2的第二位置信息、该第二像素点p2对应的第二横向位移和第二纵向位移,将该第二像素点p2反向映射回图9所示的第一视频帧,并可以在该第一视频帧中确定所映射得到的第三像素点p1’的第三位置信息。可以理解的是,此时,该第三像素点p1’是通过将由第一像素点p1映射所得到的第二像素点p2进行矩阵变换后所得到的像素点。
进一步地,该应用服务器可以在第一视频帧中,确定出第一像素点p1的第一位置信息和矩阵变换后所得到的第三像素点p1’的第三位置信息这两个位置之间的位置误差t 11’。进一步地,应用服务器可以在图9所示的第一视频帧中以第一像素点p1的第一位置信息为中心,选取一个k*k像素(例如,8*8像素)大小的图像块10。另外,如图9所示,应用服务器可以在图9所示的第二视频帧中以第二像素点p2的第二位置信息为中心,同样选取一个k*k像素大小的图像块20,进而可以计算这两个图像块之间的相关系数(该相关系数可以为N 1,2)。
其中,相关系数N 1,2的计算表达式如下:
N 1,2=∑ a,b[(patch 1(a,b)−E(patch 1))·(patch 2(a,b)−E(patch 2))]/√(∑ a,b(patch 1(a,b)−E(patch 1))²·∑ a,b(patch 2(a,b)−E(patch 2))²)   公式(1)
在公式(1)中的patch 1(a,b)可以表示图9所示的图像块10的第a行b列位置上的像素点的像素值。所述像素值可以是像素的灰度值,在0~255之间。E(patch 1)表示图9所示的图块10的平均像素值。patch 2(a,b)表示图9所示的图像块20的第a行b列位置上的像素点的像素值。E(patch 2)表示图9所示的图像块20的平均像素值。
其中,可以理解的是,本申请实施例在计算得到图9所示的第一像素点p1和第三像素点p1’之间的误差距离之后,可以将该计算得到的误差距离与预设的误差距离进行比较,如果t 11'<T B且N 1,2>=T A(其中,T B为设定的误差距离阈值,T A为设定的相关系数阈值),则表示该第一视频帧中的第一像素点p1满足前述目标筛选条件,从而可以确定该第一像素点p1是有效像素点。
进一步地,该应用服务器可以将与该第一像素点p1对应的初始状态矩阵S 1中对应位置上的矩阵元素的值设置为第二数值。例如,可以将该初始状态矩阵S 1中与第一像素点p1对应的元素的值由0切换为1,以表示第一视频帧中的第一像素点p1为有效像素点。反之,如果t 11'>=T B和/或N 1,2<T A,则表示该第一视频帧中的第一像素点p1不满足前述目标筛选条件。此时,该应用服务器可以判断上述图9所示的第一像素点p1为无效跟踪像素点。即在该初始状态矩阵S 1中与第一像素点p1对应的元素的值仍然为0。与此同时,该应用服务器可以进一步对上述正向位移矩阵(即上述矩阵Q 1,2,x和矩阵Q 1,2,y)中与该第一像素点p1对应位置上的矩阵元素的值设置为0,从而可以将该包含所述第一数值的正向位移矩阵确定为目标位移矩阵(例如,正向水平位移矩阵Q x1和正向水平位移矩阵Q y1)。即该目标位移矩阵中的这些位置上的矩阵元素可以用于表示从上述正向位移矩阵中所筛选出、并过滤掉存在较大误差的误跟踪位移之后所确定的矩阵。
可以理解的是,对于上述图9所示的其他像素点而言,可以依次从图9所示的第一视频帧中选取像素点作为第一像素点,以重复上面确定有效像素点的步骤,直到该第一视频帧中的所有像素点都作为第一像素点之后,可以确定出该第一视频帧中的所有有效像素点。从而可以基于该有效像素点在该初始状态矩阵中的位置信息对该初始状态矩阵中的矩阵元素进行更新,进而可以将包含 第二数值的初始状态矩阵确定为所述第一视频帧对应的目标状态矩阵S 1。并可以得到该第一视频帧对应的目标位移矩阵(即目标水平位移矩阵Q x,1和目标水平位移矩阵Q y,1)。同理,对于多个图像对中的其他图像对而言,通过重复上面从图像对1中确定有效像素点的步骤,也可以得到剩余图像对中的每一个图像对中的第一视频帧对应的目标状态矩阵、目标位移矩阵。比如,以目标视频内的视频帧号分别为1、2、3、4、···、n的连续多个视频帧为例,则组成的多个图像对可以分别表示为(1,2)、(2,3)、(3,4)、···、(n-1,n)。那么,根据每个图像对对应的光流追踪结果而言,可以通过上述有效像素点的判断方式最后得到图像对(1,2)对应的目标状态矩阵S 1和图像对(1,2)对应的目标位移矩阵Q 1(即前述目标水平位移矩阵Q x,1和目标垂直位移矩阵Q y,1)。依次类推,可以得到图像对(n-1,n)对应的目标状态矩阵S n-1和图像对(1,2)对应的目标位移矩阵Q n-1(即前述目标水平位移矩阵Q x,n-1和目标水平位移矩阵Q y,n-1)。
进一步地,应用服务器可以通过云端位移积分法对上述图像对1对应的目标状态矩阵S 1和所述目标位移矩阵Q 1进行积分运算,以得到该第一视频帧中的像素点对应的状态积分矩阵S in(x,y)和位移积分矩阵Q in(x,y)。其中,位移积分矩阵Q in(x,y)可以包含横向位移积分矩阵Q x,in(x,y)和纵向位移积分矩阵Q y,in(x,y)。且该状态积分矩阵S in(x,y)、横向位移积分矩阵Q x,in(x,y)和纵向位移积分矩阵Q y,in(x,y)可以通过如下矩阵积分公式得到:
S in(x,y)=∑ x'≤x,y'≤y S(x',y')   公式(2)
Q x,in(x,y)=∑ x'≤x,y'≤y Q x(x',y')   公式(3)
Q y,in(x,y)=∑ x'≤x,y'≤y Q y(x',y')   公式(4)
在公式(2)、公式(3)、公式(4)中的x和y可以用于表示与第一视频帧对应的状态积分矩阵、位移积分矩阵中的所有矩阵元素的坐标,比如S in(x,y)可以表示状态积分矩阵的第x行y列的矩阵元素的值。另外,在公式(2)、公式(3)、公式(4)中的x’和y’可以表示目标状态矩阵和目标位移矩阵中的矩阵元素的坐标,比如S(x’,y’)可以表示目标状态矩阵的第x’行y’列的矩阵元素的值。
进一步地,应用服务器可以通过云端位移差分方法,在第一视频帧中选取一个高度为M和宽度为N的目标框,进而可以在该目标框内进一步对由公式 (2)、公式(3)、公式(4)所得到的这三个积分矩阵进行位移差分运算,以分别得到状态差分矩阵S dif(x,y)和位移差分矩阵Q dif(x,y)。所述目标框,是为了选取像素点周围一定区域的所有像素点,来计算平均位移。例如为80×80像素的大小。
其中,位移差分矩阵Q dif(x,y)可以包含横向位移差分矩阵Q x,dif(x,y)和纵向位移积分矩阵Q y,dif(x,y)。该状态积分矩阵S dif(x,y)、横向位移差分矩阵Q x,dif(x,y)和纵向位移积分矩阵Q y,dif(x,y)可以通过如下矩阵差分公式(5)得到。
S dif(x,y)=S in(x+N/2,y+M/2)−S in(x−N/2,y+M/2)−S in(x+N/2,y−M/2)+S in(x−N/2,y−M/2)   公式(5)
Q x,dif(x,y)=Q x,in(x+N/2,y+M/2)−Q x,in(x−N/2,y+M/2)−Q x,in(x+N/2,y−M/2)+Q x,in(x−N/2,y−M/2)
Q y,dif(x,y)=Q y,in(x+N/2,y+M/2)−Q y,in(x−N/2,y+M/2)−Q y,in(x+N/2,y−M/2)+Q y,in(x−N/2,y−M/2)
其中,可以理解的是,本申请实施例可以将该第一视频帧中该目标框所在的区域称之为差分区域,从而可以基于该差分区域的尺寸信息,状态积分矩阵、横向位移积分矩阵和纵向位移积分矩阵,确定所述第一视频帧对应的平均位移矩阵。其中,即该位移差分运算公式中的M和N即为该差分区域的长度值和宽度值。其中,该位移差分运算公式中的x和y分别为第一视频帧中的每个像素点的位置信息。通过该位移差分运算公式,可以快速得到该该差分区域内的所有像素点的平均值。比如,对于状态积分矩阵而言,可以得到该状态积分矩阵S in(x,y)对应的状态差分矩阵S dif(x,y)。而对于横向位移积分矩阵Q x,in(x,y)和纵向位移积分矩阵Q y,in(x,y)而言,可以得到横向位移差分矩阵Q x,dif(x,y)和纵向位移差分矩阵Q y,dif(x,y)。
进一步地,该应用服务器可以将横向位移差分矩阵Q x,dif(x,y)与所述状态差分矩阵S dif(x,y)之间的比值确定为横向平均位移矩阵Q x,F(x,y),并将所述纵向位移差分矩阵Q y,in(x,y)与所述状态差分矩阵S dif(x,y)之间的比值确定为纵向平均位移矩阵Q y,F(x,y)。
Q x,F(x,y)=Q x,dif(x,y)/(S dif(x,y)+e)   公式(6)
Q y,F(x,y)=Q y,dif(x,y)/(S dif(x,y)+e)   公式(7)
在公式(6)和公式(7)中的e用于表示一个人为设定的比较小的数字, 比如0.001。即在公式(6)和公式(7)中的e是为了避免当状态差分矩阵S dif(x,y)中的所有矩阵元素的值为0时,从而可以避免直接除以0,进而可以进一步执行步骤S203,以在该目标用户终端中预先计算出该第一视频帧中的像素点出现在第二视频帧中的位置信息。
步骤S203,基于所述平均位移矩阵,对所述第一视频帧中的像素点的位置信息进行跟踪,并在所述第二视频帧中确定所跟踪得到的像素点的位置信息。
具体地,应用服务器可以基于前述步骤S203所得到的平均位移矩阵(该平均位移矩阵可以包含横向平均位移矩阵Q x,F(x,y)和纵向平均位移矩阵Q y,F(x,y)),进一步对该第一视频帧中的像素点出现在下一视频帧(即上述图像对1中的第二视频帧)中的位置信息进行快速、且准确地跟踪,即通过进行位移变换,可以在所述第二视频帧中确定出对该第一视频帧中的像素点进行跟踪所得到的像素点的位置信息。
C x(x,y)=x+Q x,F(x,y)   公式(8)
C y(x,y)=y+Q y,F(x,y)   公式(9)
其中,公式(8)中的x为该第一视频帧中的像素点的横向位置坐标,Q x,F(x,y)为该第一视频帧对应的横向平均位移矩阵,通过该公式(8)可以将第一视频帧中的像素点的横向位置坐标进行坐标变换,以得到该第一视频帧中的像素点的下一视频帧中的横向位置坐标。同理,公式(9)中的y为该第一视频帧中的像素点的纵向位置坐标,Q x,y(x,y)为该第一视频帧对应的纵向平均位移矩阵,通过该公式(9)可以将第一视频帧中的像素点的纵向位置坐标进行坐标变换,以得到该第一视频帧中的像素点的下一视频帧中的纵向位置坐标。
可以理解的是,对于每个图像对而言,通过每个图像对中的第一视频帧对应的平均位移矩阵,都可以将相应图像对中的第一视频帧中的像素点进行快速追踪,从而可以在相应图像对中的第二视频帧中确定所追踪得到的像素点的位置坐标,即可以在每个图像对的第二视频帧中确定所跟踪得到的像素点的位置信息。该应用服务器可以进一步对将每个图像对中所跟踪得到的像素点的位置信息进行存储,以便于可以进一步执行步骤S204。
步骤S204,基于所述第一视频帧中的像素点的位置信息、所述跟踪得到的像素点在所述第二视频帧中的位置信息,生成所述目标视频相关联的轨迹信息。
其中,所述轨迹信息中包含用于对目标视频中的目标对象所关联的多媒体 信息进行跟踪显示的目标轨迹信息。
步骤S205,响应于对目标视频的触发操作,从所述目标视频的关键视频帧中确定目标像素点,并获取与所述目标像素点相关联的多媒体信息。
步骤S206,基于所述目标像素点在所述关键视频帧中的位置信息,确定所述目标像素点对应的轨迹获取请求。
其中,步骤S205和步骤S206的具体实现方式可以参见上述图4所对应实施例中对目标用户终端的描述,这里将不再继续进行赘述。
步骤S207,响应于对关键视频帧中的目标像素点的轨迹获取请求,获取与目标视频相关联的轨迹信息。
具体地,应用服务器可以接收目标用户终端基于关键视频帧中的目标像素点发送的轨迹获取请求,并可以进一步获取上述应用服务器所预先计算得到的与该目标视频中的所有像素点相关联的轨迹信息,以便于进一步执行步骤S208。
步骤S208,从所述目标视频相关联的轨迹信息中,筛选与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,并返回所述目标轨迹信息。
具体地,应用服务器可以从该轨迹获取请求中得到该关键视频帧在目标视频中的视频帧号和目标像素点在该关键视频帧中的位置信息,从而可以进一步从该应用服务器预先得到的与该目标视频相关联的轨迹信息中筛选出与该目标像素点相关联的轨迹信息,并可以将筛选得到的轨迹信息称之为目标轨迹信息,从而可以进一步将该目标轨迹信息返回给目标用户终端,以使该目标用户终端可以基于该关键视频帧的帧号,快速从该接收到的目标轨迹信息中找出目标像素点出现在该关键视频帧的下一视频中的位置信息,直到得到该目标像素点出现在该关键视频帧后的每个视频帧中的位置信息,此时,该目标用户终端可以将该目标像素点出现在该关键视频帧后的每个视频帧中的位置信息所构成的新的轨迹信息。可选地,应当理解,该应用服务器在得到该关键视频帧的帧号时,可以快速从该筛选出的轨迹信息中找出目标像素点出现在该关键视频帧的下一视频中的位置信息,直到得到该目标像素点出现在该关键视频帧后的每个视频帧中的位置信息,此时,该应用服务器可以将该目标像素点出现在该关键视频帧后的每个视频帧中的位置信息所构成的新的轨迹信息称之为目标轨迹信息。
其中,可以理解的是,若目标用户终端的计算性能无法满足对象跟踪需求, 则该目标用户终端可以在生成该目标像素点对应的轨迹获取请求时,可以将该轨迹获取请求发送至应用服务器,以使应用服务器可以基于所述轨迹获取请求获取与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,并可以将获取到的目标轨迹信息返回给该目标用户终端;
可选地,若目标用户终端的计算性能能够满足对象跟踪需求,则该目标用户终端可以在该目标用户终端中执行上述步骤S201-步骤S204,以在该目标用户终端中预先对该目标视频内的所有像素点的进行全图像素跟踪,以预先得到该目标视频内的所有像素点在每个视频帧中的位置信息,进而可以对该目标视频内的每个像素点在每个视频帧中的位置信息进行位置整合,以得到该目标视频内的每个像素点对应的轨迹信息。此时,该目标用户终端可以直接基于所述目标对象中的目标像素点在所述关键视频帧中的位置信息,在该目标用户终端中获取与该目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,以便于可以进一步执行步骤S209。
其中,所述目标轨迹信息包含所述目标像素点在所述关键视频帧的下一视频帧中的位置信息;所述关键视频帧的下一视频帧中的位置信息是通过跟踪所述目标对象得到的。
比如,以当前正在播放的视频(即目标视频)所包含的多个连续视频帧为以下6个视频帧为例,这6个视频帧可以为视频帧a、视频帧b、视频帧c、视频帧d、视频帧e、视频帧f。因此,应用服务器在视频预处理阶段,可以预先对该目标视频内的每个视频帧进行预处理,即可以基于上述光流追踪规则确定该目标视频内的每两个相邻视频帧所构成的图像对对应的平均位移矩阵,进而可以根据每个图像对对应的平均位移矩阵(也可以称之为根据每个图像对中第一视频帧对应的平均位移矩阵)对该第一视频帧中的所有像素点进行跟踪,以得到该第一视频帧中的所有像素点出现在第二视频帧中的位置信息,进而可以得到该目标视频的所有像素点在每个视频帧(即可以得到该目标视频的所有像素点在上述视频帧a、视频帧b、视频帧c、视频帧d、视频帧e、视频帧f)中的位置信息,即可以基于该目标视频的所有像素点在每个视频帧中的位置信息,得到该目标视频的所有像素点对应的轨迹信息。该目标视频的所有像素点对应的轨迹信息称之为与该目标视频相关联的轨迹信息。
又比如,应用服务器在对该目标视频进行预处理时,该应用服务器可以预 先计算得到的该目标视频中的像素点A(例如,该像素点A可以为该该目标视频中的所有像素点中的一个像素点)对应的轨迹信息,若该像素点A对应的轨迹信息中包含该像素点A在该目标视频的每个视频帧(即在上述视频帧a、视频帧b、视频帧c、视频帧d、视频帧e、视频帧f)中的位置信息,则在目标用户终端中的目标像素点对应的关键视频帧为该目标视频的视频帧c时,可以进一步将该视频帧c中的目标对象内的像素点A作为目标像素点,进而可以在该目标像素点为像素点A时,可以从应用服务器中筛选出该像素点A的轨迹信息,进而可以基于该筛选出的像素点A的轨迹信息得到该像素点A在该关键视频帧后的每个视频帧(即视频帧d、视频帧e、视频帧f)中的位置信息。
在本申请实施例中,若关键视频帧为该目标视频内的首个视频帧,则目标用户终端所获取到的目标轨迹信息可以为预先计算得到的轨迹信息。例如,以前述关键视频帧为目标视频内的视频帧a(首个视频帧)为例,则该目标用户终端所获取到的目标轨迹信息可以包含该目标像素点在上述视频帧a、视频帧b、视频帧c、视频帧d、视频帧e、视频帧f的位置信息。可选地,若关键视频帧为该目标视频内的非首个视频帧,则目标用户终端所获取到的目标轨迹信息可以为从预先计算得到的轨迹信息中所确定出的部分位置信息所构成的轨迹信息,例如,以前述关键视频帧为目标视频内的视频帧c(即非首个视频帧)为例,则该目标用户终端所获取到的目标轨迹信息可以包含该目标像素点在视频帧d、视频帧e、视频帧f中的位置信息,并可以将目标像素点在视频帧d、视频帧e、视频帧f中的位置信息称之为部分位置信息。
可以理解的是,可选地,该目标用户终端还可以将该应用服务器所查找到的包含该目标像素点的在关键视频帧中的位置信息的轨迹信息(即上述像素点A对应的轨迹信息)统称为目标轨迹信息,此时,该目标轨迹信息可以视为从该目标视频的所有像素点中所查找到与目标像素点相匹配的像素点A对应的轨迹信息。由于该轨迹信息中可以包含该像素点A在该目标视频的每个视频帧中的位置信息,自然,也可以快速从该轨迹信息中得到该目标像素点在该关键视频后的每个视频帧中的位置信息。
步骤S209,当播放所述关键视频帧的下一视频帧时,基于所述目标轨迹信息中的所述目标像素点在所述关键视频帧的下一视频帧中的位置信息显示所述多媒体信息。
其中,可以理解的是,本申请实施例可以在获取到目标用户所选取的目标对象中的目标像素点时,可以从这些预先计算好的所有像素点对应的轨迹信息中筛选出该关键视频帧中的目标像素点的位置信息相关联的轨迹信息,进而可以将筛选出的轨迹信息称之为目标轨迹信息。由于本申请实施例可以预先对该视频内的每个视频帧中的像素点进行像素跟踪,因此,在获取到每个图像对中的第一视频帧对应的平均位移矩阵时,可以快速得到该视频内的每个像素点在相应视频帧中的位置信息。应当理解,预先计算得到的每个像素点在相应视频帧中的位置信息,可以用于表征在当前视频播放界面中所播放的该视频内的每个像素点在相应视频帧中的位置信息。因此,当在目标用户终端获取到目标对象中的目标像素点以及目标对象相关联的多媒体信息时,可以快速将从所有像素点对应的轨迹信息中所筛选出的该目标像素点对应的轨迹信息称之为目标轨迹信息,进而可以将该目标轨迹信息返回给目标用户终端,以使该目标用户终端可以基于该目标轨迹信息中所携带的该目标像素点在关键视频帧之后的每个视频帧中的位置信息,对该目标对象所关联的多媒体信息(例如,弹幕数据)进行跟踪显示。
可以理解的是,若该目标像素点在该关键视频帧后的连续多个视频帧中的位置信息所构成的轨迹为一个圈,则该弹幕数据在该目标用户终端中可以跟踪这个轨迹进行转圈显示。
为便于理解,请参见图10,是本申请实施例提供的一种基于轨迹信息显示弹幕数据的示意图。如图10所示的视频帧100中可以包含多个对象,例如,可以包含图10所示的对象1、对象2和对象3。若目标用户在目标用户终端中将图10所示的对象1作为目标对象时,可以将该视频帧100称之为关键视频帧,并可以将该目标对象中与该目标用户所执行的触发操作对应的像素点称为目标像素点。若该目标用户终端具有较强的计算性能时,则可以在该目标用户终端中预先计算好该目标视频中的每个像素点在每个视频帧中的位置信息,从而可以在该目标用户终端中得到与该目标视频相关联的轨迹信息。例如,可以得到预先计算得到图10所示的轨迹信息1,即该轨迹信息1中的位置信息均是由该目标视频中的像素点在该目标视频中的每个视频帧中的位置信息所确定的。因此,该目标用户终端可以快速从基于该对象1中的目标像素点的位置信息,将图10所示的轨迹信息1视为目标轨迹信息,从而可以基于该轨迹信息1中的对 象1在该关键视频帧后的每个视频帧(即图10所示的视频帧200和视频帧300)中的位置信息,快速对该目标对象(即对象1)所关联的多媒体信息(即图10所示的弹幕数据1为BBBBB)进行跟踪和显示。即显示在图10所示的视频帧200和视频帧300中的弹幕数据均是由图10所示的轨迹信息系1中的位置信息所决定的。
可以理解的是,图10所示的目标视频相关联的轨迹信息还可以为上述应用服务器所预先计算好的,从而可以在应用服务器接收到上述针对对象1中的目标像素点轨迹获取请求时,也可以快速从图10所示的目标视频相关联的轨迹信息中获取该关键视频帧中的目标像素点的位置信息相关联的轨迹信息,即通过将对目标视频中的所有像素点的全图像素跟踪放在应用服务器中执行,可以有效地减小目标用户终端的计算量,从而可以确保目标用户终端在得到图10所示的轨迹信息1时,可以基于该轨迹信息1中的位置信息对图10所示的弹幕数据1进行快速跟踪和显示,从而可以提升弹幕数据的显示的灵活度。应当理解,视频播放界面中可以存在与该目标对象相关联的多个弹幕,但是当目标用户终端检测到该视频播放界面中所存在的多条弹幕存在重合时,可以对这些存在重合的弹幕进行合并,以在这些存在重合的弹幕中保留该目标用户终端最新得到的弹幕。
本申请实施例可以在获取到目标用户针对目标视频的触发操作时,将该目标视频中该触发操作对应的视频帧称之为关键视频帧,从而可以从该关键视频帧中确定目标像素点,并获取与该目标像素点以及该目标像素点所在目标对象关联的多媒体信息(例如,该多媒体信息可以为该目标视频中的用户文字、图片、表情等弹幕数据);进一步地,基于目标像素点在该关键视频帧中的位置信息确定所述目标像素点对应的轨迹获取请求,进而可以基于该轨迹获取请求获取所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,以便于在播放该关键视频帧的下一视频帧时,能够基于该目标轨迹信息显示与该目标像素点以及该目标像素点所在目标对象相关联的弹幕数据。由此可见,本申请实施例可以在确定出关键视频帧时,进一步从该关键视频帧中的所有像素点的轨迹信息中筛选出目标像素点的轨迹信息,并将筛选出的目标像素点的轨迹信息称之为目标轨迹信息,以便于能够基于得到的目标轨迹信息丰富弹幕数据的展示效果,比如,对于不同目标对象中的目标像素点,所得到的目标轨迹 信息可能会不同,进而使得弹幕数据的展示效果会不同。此外,基于目标对象与弹幕数据之间的关联关系,可以快速确定出该弹幕数据在该关键视频帧之后的每个视频帧中的位置信息,换言之,该弹幕数据会在该目标视频中一直跟着该目标对象进行变动,进而可以丰富视频中的用户文字的视觉展示效果,并可以使得弹幕数据与目标对象或评论的视频中的对象之间的关联性更强。
进一步地,请参见图11,是本申请实施例提供的一种视频数据处理装置的结构示意图。如图11所示,该视频数据处理装置1可以应用于上述图1所对应实施例中的目标用户终端。该视频数据处理装置1可以包含:对象确定模块1101,请求确定模块1102,轨迹获取模块1103,文本显示模块1104;
对象确定模块1101,用于响应于对目标视频的触发操作,从所述目标视频的关键视频帧中确定目标像素点,并获取与所述目标像素点相关联的多媒体信息,其中,所述关键视频帧是所述触发操作所在的视频帧,所述目标像素点是所述关键视频帧中与所述触发操作对应的像素点;
请求确定模块1102,用于基于所述目标像素点在所述关键视频帧中的位置信息,确定所述目标像素点对应的轨迹获取请求;
轨迹获取模块1103,用于基于所述轨迹获取请求,获取与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,其中,所述目标轨迹信息包含所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,所述目标像素点在所述关键视频帧的下一视频帧中的位置信息是通过跟踪所述目标像素点得到的;
文本显示模块1104,用于当播放所述关键视频帧的下一视频帧时,基于所述目标轨迹信息中的所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,显示所述多媒体信息。
其中,对象确定模块1101,请求确定模块1102,轨迹获取模块1103,文本显示模块1104的具体执行方式可参见上述图4所对应实施例中对步骤S101-步骤S104的描述,这里将不再继续进行赘述。
本申请实施例可以在获取到目标用户针对目标视频的触发操作时，将该目标视频中该触发操作对应的视频帧作为关键视频帧，从而可以从该关键视频帧中确定目标像素点，并获取与该目标像素点以及该目标像素点所在目标对象关联的多媒体信息（例如，该多媒体信息可以为该目标视频中的用户文字、图片、表情等弹幕数据）。进一步地，基于目标像素点在该关键视频帧中的位置信息确定所述目标像素点对应的轨迹获取请求，进而可以基于该轨迹获取请求获取与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息，以便于在播放该关键视频帧的下一视频帧时，能够基于该目标轨迹信息显示与该目标像素点以及该目标像素点所在目标对象相关联的弹幕数据。由此可见，本申请实施例可以在确定出关键视频帧时，进一步从该关键视频帧中的所有像素点的轨迹信息中筛选出目标像素点的轨迹信息，并将筛选出的目标像素点的轨迹信息作为目标轨迹信息，以便于能够基于得到的目标轨迹信息丰富弹幕数据的展示效果。比如，对于不同目标对象中的目标像素点，所得到的目标轨迹信息可能会不同，进而使得弹幕数据的展示效果会不同。此外，基于目标对象与弹幕数据之间的关联关系，可以快速确定出该弹幕数据在该关键视频帧之后的每个视频帧中的位置信息，换言之，该弹幕数据会在该目标视频中一直跟着该目标对象进行变动，进而可以丰富视频中的用户文字的视觉展示效果，并可以使得弹幕数据与目标对象或评论的视频中的对象之间的关联性更强。
进一步地，请参见图12，是本申请实施例提供的一种计算机设备的结构示意图。如图12所示，计算机设备1000可以为上述图1对应实施例中的目标用户终端。上述计算机设备1000可以包括：处理器1001，网络接口1004和存储器1005。此外，上述计算机设备1000还可以包括：用户接口1003，和至少一个通信总线1002。其中，通信总线1002用于实现这些组件之间的连接通信。其中，用户接口1003可以包括显示屏(Display)、键盘(Keyboard)。可选地，用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器1005可以是高速RAM存储器，也可以是非易失性存储器(non-volatile memory)，例如至少一个磁盘存储器。存储器1005可选的还可以是至少一个位于远离前述处理器1001的存储装置。如图12所示，作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。
在图12所示的计算机设备1000中,网络接口1004可提供网络通讯功能;而用户接口1003主要用于为用户提供输入的接口;而处理器1001可以用于调用存储器1005中存储的设备控制应用程序,以实现:
响应于对目标视频的触发操作，从所述目标视频的关键视频帧中确定目标像素点，并获取与所述目标像素点相关联的多媒体信息，其中，所述关键视频帧是所述触发操作所在的视频帧，所述目标像素点是所述关键视频帧中与所述触发操作对应的像素点；
基于所述目标像素点在所述关键视频帧中的位置信息,确定所述目标像素点对应的轨迹获取请求;
基于所述轨迹获取请求,获取与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息;所述目标轨迹信息包含所述目标像素点在所述关键视频帧的下一视频帧中的位置信息;所述目标像素点在所述关键视频帧的下一视频帧中的位置信息是通过跟踪所述目标像素点得到的;
当播放所述关键视频帧的下一视频帧时,基于所述目标轨迹信息中的所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,显示所述多媒体信息。
应当理解,本申请实施例中所描述的计算机设备1000可执行前文图4所对应实施例中对上述视频数据处理方法的描述,也可执行前文图11所对应实施例中对上述视频数据处理装置1的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
此外,这里需要指出的是:本申请实施例还提供了一种计算机存储介质,且上述计算机存储介质中存储有前文提及的视频数据处理装置1所执行的计算机程序,且上述计算机程序包括程序指令,当上述处理器执行上述程序指令时,能够执行前文图4所对应实施例中对上述视频数据处理方法的描述,因此,这里将不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的计算机存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。
进一步地,请参见图13,是本申请实施例提供的另一种视频数据处理装置的结构示意图。如图13所示,该视频数据处理装置2可以应用于上述图8所对应实施例中的应用服务器,该应用服务器可以为上述图1所对应实施例中的业务服务器2000。该视频数据处理装置2可以包含:请求响应模块1301和轨迹筛选模块1302;
请求响应模块1301，用于响应于对关键视频帧中的目标像素点的轨迹获取请求，获取与目标视频相关联的轨迹信息，其中，所述关键视频帧是所述目标视频中的视频帧，所述目标像素点是所述关键视频帧中的像素点，所述轨迹信息是由所述目标视频中的每个视频帧中的像素点位置信息所确定的；
轨迹筛选模块1302,用于从所述目标视频相关联的轨迹信息中,筛选与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,并返回所述目标轨迹信息;所述目标轨迹信息包含目标位置信息;所述目标位置信息用于触发在所述关键视频帧的下一视频帧中显示与所述目标像素点相关联的多媒体信息。
其中,请求响应模块1301和轨迹筛选模块1302的具体实现方式可以参见上述图8所对应实施例中对步骤S207和步骤S208的描述,这里将不再继续进行赘述。
进一步地，请参见图14，是本申请实施例提供的另一种计算机设备的结构示意图。如图14所示，计算机设备2000可以为上述图1对应实施例中的业务服务器2000。上述计算机设备2000可以包括：处理器2001，网络接口2004和存储器2005。此外，上述计算机设备2000还可以包括：用户接口2003，和至少一个通信总线2002。其中，通信总线2002用于实现这些组件之间的连接通信。其中，用户接口2003可以包括显示屏(Display)、键盘(Keyboard)，可选地，用户接口2003还可以包括标准的有线接口、无线接口。网络接口2004可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器2005可以是高速RAM存储器，也可以是非易失性存储器(non-volatile memory)，例如至少一个磁盘存储器。存储器2005可选的还可以是至少一个位于远离前述处理器2001的存储装置。如图14所示，作为一种计算机存储介质的存储器2005中可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。
在图14所示的计算机设备2000中,网络接口2004可提供网络通讯功能;而用户接口2003主要用于为用户提供输入的接口;而处理器2001可以用于调用存储器2005中存储的设备控制应用程序,以实现:
响应于对关键视频帧中的目标像素点的轨迹获取请求,获取与目标视频相关联的轨迹信息,其中,所述关键视频帧是所述目标视频中的视频帧,所述目标像素点是所述关键视频帧中的像素点,所述轨迹信息是由所述目标视频的每个视频帧中的像素点位置信息所确定的;
从所述目标视频相关联的轨迹信息中，筛选与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息，并返回所述目标轨迹信息；所述目标轨迹信息包含目标位置信息；所述目标位置信息用于触发在所述关键视频帧的下一视频帧中显示与所述目标像素点相关联的多媒体信息。
应当理解,本申请实施例中所描述的计算机设备2000可执行前文图8所对应实施例中对上述视频数据处理方法的描述,也可执行前文图13所对应实施例中对上述视频数据处理装置2的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
此外,这里需要指出的是:本申请实施例还提供了一种计算机存储介质,且上述计算机存储介质中存储有前文提及的视频数据处理装置2所执行的计算机程序,且上述计算机程序包括程序指令,当上述处理器执行上述程序指令时,能够执行前文图8所对应实施例中对上述视频数据处理方法的描述,因此,这里将不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的计算机存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。
进一步地,请参见图15,是本申请实施例提供的又一种视频数据处理装置的结构示意图。如图15所示,该视频数据处理装置3可以应用于上述图1所对应实施例中的业务服务器2000,也可以应用于上述图1所对应实施例中的目标用户终端。该视频数据处理装置3可以包含:第一获取模块310、矩阵获取模块410、位置跟踪模块510、轨迹生成模块610;
第一获取模块310,用于从目标视频中获取相邻的第一视频帧和第二视频帧;
矩阵获取模块410,用于基于所述目标视频对应的光流追踪规则、所述第一视频帧中的像素点、所述第二视频帧中的像素点,确定所述第一视频帧对应的平均位移矩阵;
其中,所述矩阵获取模块410包括:第一确定单元4001、矩阵确定单元4002、像素点筛选单元4003、矩阵修正单元4004、第二确定单元4005;
第一确定单元4001,用于获取所述目标视频对应的光流追踪规则,并将所述第一视频帧中的像素点的位置信息确定为第一位置信息,并将所述第二视频帧中的像素点的位置信息确定为第二位置信息;
矩阵确定单元4002，用于基于所述光流追踪规则、所述第一视频帧中的像素点的第一位置信息、所述第二视频帧中的像素点的第二位置信息，获取所述第一视频帧对应的正向位移矩阵，并获取所述第二视频帧对应的反向位移矩阵；
其中，矩阵确定单元4002包括：第一追踪子单元4021和第二追踪子单元4022；
第一追踪子单元4021,用于基于所述第一视频帧中的像素点的第一位置信息和所述光流追踪规则,将所述第一视频帧中的像素点正向映射到所述第二视频帧,并在所述第二视频帧中确定所映射得到的第一映射点的第二位置信息,并基于所述像素点的第一位置信息、所述第一映射点的第二位置信息确定所述第一视频帧对应的正向位移矩阵;
第二追踪子单元4022，用于基于所述第二视频帧中的像素点的第二位置信息和所述光流追踪规则，将所述第二视频帧中的第一映射点反向映射到所述第一视频帧，并在所述第一视频帧中确定所映射得到的第二映射点的第三位置信息，并基于所述第一映射点的第二位置信息、所述第二映射点的第三位置信息确定所述第二视频帧对应的反向位移矩阵。
其中,第一追踪子单元4021和第二追踪子单元4022的具体实现方式可以参见上述图8所对应实施例中对云端前向后向光流法的描述,这里将不再继续进行赘述。
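为便于理解上述前向后向光流映射的过程，下面给出一个示意性的代码草图（仅为示意，此处以OpenCV的Farneback稠密光流作为光流计算方式的一个示例性假设，并非本申请所限定的光流追踪规则）：

```python
import cv2

def forward_backward_flow(frame1, frame2):
    """获取第一视频帧对应的正向位移矩阵与第二视频帧对应的反向位移矩阵(示意实现)。"""
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    # 正向: 将第一视频帧中的像素点正向映射到第二视频帧
    forward = cv2.calcOpticalFlowFarneback(g1, g2, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    # 反向: 将第二视频帧中的像素点(即第一映射点)反向映射回第一视频帧
    backward = cv2.calcOpticalFlowFarneback(g2, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    return forward, backward   # 形状均为 (高, 宽, 2), 分别为横向位移与纵向位移
```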
像素点筛选单元4003,用于基于所述第一视频帧中的像素点的第一位置信息、所述正向位移矩阵、所述反向位移矩阵,将所述像素点中满足目标筛选条件的像素点确定为有效像素点;
其中,所述像素点筛选单元4003包括:第一位置确定子单元4031、第二位置确定子单元4032、第三位置确定子单元4033、误差确定子单元4034和有效筛选子单元4035;
第一位置确定子单元4031,用于从所述第一视频帧中的像素点中获取第一像素点,并在所述第一视频帧中确定所述第一像素点的第一位置信息,并从所述正向位移矩阵中确定所述第一像素点对应的第一横向位移和第一纵向位移;
第二位置确定子单元4032,用于基于所述第一像素点的第一位置信息、所述第一像素点对应的第一横向位移和第一纵向位移,将所述第一像素点正向映射到所述第二视频帧,并在所述第二视频帧中确定所映射得到的第二像素点的第二位置信息;
第三位置确定子单元4033,用于从所述反向位移矩阵中确定所述第二像素点对应的第二横向位移和第二纵向位移,并基于所述第二像素点的第二位置信息、所述第二像素点对应的第二横向位移和第二纵向位移,将所述第二像素点反向映射到所述第一视频帧,并在所述第一视频帧中确定所映射得到的第三像素点的第三位置信息;
误差确定子单元4034,用于基于所述第一像素点的第一位置信息、所述第三像素点的第三位置信息,确定所述第一像素点与所述第三像素点之间的误差距离,并根据所述第一像素点的第一位置信息、所述第二像素点的第二位置信息,确定包含第一像素点的图像块与包含所述第二像素点的图像块之间的相关系数;
有效筛选子单元4035,用于将所述像素点中误差距离小于误差距离阈值、且所述相关系数大于等于相关系数阈值的像素点确定为有效像素点。
其中,第一位置确定子单元4031、第二位置确定子单元4032、第三位置确定子单元4033、误差确定子单元4034和有效筛选子单元4035的具体实现方式可以参见上述图8所对应实施例中对确定有效像素点的具体过程的描述,这里将不再继续进行赘述。
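为便于理解上述基于误差距离与相关系数确定有效像素点的筛选过程，下面给出一个示意性的代码草图（仅为示意，其中的误差距离阈值、相关系数阈值与图像块尺寸均为示例性假设）：

```python
import numpy as np

def select_valid_pixels(forward, backward, frame1_gray, frame2_gray,
                        err_thresh=1.0, corr_thresh=0.8, patch=3):
    """将满足目标筛选条件的像素点确定为有效像素点(示意实现):
    前后向映射的误差距离小于误差距离阈值, 且两图像块之间的相关系数大于等于相关系数阈值。"""
    h, w = forward.shape[:2]
    valid = np.zeros((h, w), dtype=bool)
    r = patch // 2
    for y in range(r, h - r):
        for x in range(r, w - r):
            # 第一像素点正向映射得到第二像素点
            x2 = int(round(x + forward[y, x, 0])); y2 = int(round(y + forward[y, x, 1]))
            if not (r <= x2 < w - r and r <= y2 < h - r):
                continue
            # 第二像素点反向映射得到第三像素点, 并计算第一像素点与第三像素点之间的误差距离
            x3 = x2 + backward[y2, x2, 0]; y3 = y2 + backward[y2, x2, 1]
            err = np.hypot(x3 - x, y3 - y)
            # 计算包含第一像素点的图像块与包含第二像素点的图像块之间的相关系数
            p1 = frame1_gray[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32).ravel()
            p2 = frame2_gray[y2 - r:y2 + r + 1, x2 - r:x2 + r + 1].astype(np.float32).ravel()
            denom = p1.std() * p2.std()
            corr = ((p1 - p1.mean()) * (p2 - p2.mean())).mean() / denom if denom > 0 else 0.0
            valid[y, x] = (err < err_thresh) and (corr >= corr_thresh)
    return valid
```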
矩阵修正单元4004,用于基于所述有效像素点对所述第一视频帧对应的初始状态矩阵和所述正向位移矩阵进行修正,得到所述第一视频帧对应的目标状态矩阵和目标位移矩阵;
其中,所述矩阵修正单元4004包括:初始获取子单元4041、数值切换子单元4042、位移设置子单元4043;
初始获取子单元4041，用于获取第一视频帧对应的初始状态矩阵；所述初始状态矩阵中的每个矩阵元素的状态值均为第一数值，一个矩阵元素对应所述像素点中的一个像素点；
数值切换子单元4042,用于在所述初始状态矩阵中将与所述有效像素点对应的矩阵元素的状态值由第一数值切换为第二数值,并将包含第二数值的初始状态矩阵确定为所述第一视频帧对应的目标状态矩阵;
位移设置子单元4043，用于在所述正向位移矩阵中将所述剩余像素点对应的矩阵元素的位移设置为所述第一数值，并将包含所述第一数值的正向位移矩阵确定为目标位移矩阵；所述剩余像素点为所述像素点中除所述有效像素点之外的像素点。
其中，所述位移设置子单元4043，具体用于若所述正向位移矩阵包含初始横向位移矩阵和初始纵向位移矩阵，则在所述初始横向位移矩阵中将所述剩余像素点对应的矩阵元素的第一横向位移设置为所述第一数值，并将包含所述第一数值的初始横向位移矩阵确定为所述第一视频帧对应的横向位移矩阵；
所述位移设置子单元4043，还具体用于在所述初始纵向位移矩阵中将所述剩余像素点对应的矩阵元素的第一纵向位移设置为所述第一数值，并将包含所述第一数值的初始纵向位移矩阵确定为所述第一视频帧对应的纵向位移矩阵；
所述位移设置子单元4043,还具体用于将所述第一视频帧对应的横向位移矩阵和所述第一视频帧对应的纵向位移矩阵确定为目标位移矩阵。
初始获取子单元4041、数值切换子单元4042、位移设置子单元4043的具体实现方式可以参见上述图8所对应实施例中对修正初始状态矩阵和正向位移矩阵的描述,这里将不再继续进行赘述。
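为便于理解上述对初始状态矩阵和正向位移矩阵进行修正的过程，下面给出一个示意性的代码草图（仅为示意，其中将第一数值取为0、第二数值取为1，仅为示例性假设）：

```python
import numpy as np

def build_target_matrices(forward, valid):
    """基于有效像素点修正初始状态矩阵与正向位移矩阵, 得到目标状态矩阵与目标位移矩阵(示意实现)。"""
    target_state = np.zeros(valid.shape, dtype=np.float32)   # 初始状态矩阵, 各矩阵元素的状态值均为第一数值0
    target_state[valid] = 1.0                                 # 有效像素点对应的状态值切换为第二数值1
    target_disp = forward.copy()
    target_disp[~valid] = 0.0                                 # 剩余像素点对应的位移设置为第一数值0
    # target_disp[..., 0] 为横向位移矩阵, target_disp[..., 1] 为纵向位移矩阵
    return target_state, target_disp
```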
第二确定单元4005,用于基于所述目标状态矩阵和所述目标位移矩阵,确定所述第一视频帧对应的平均位移矩阵。
其中,所述第二确定单元4005包括:第一积分子单元4051、第二积分子单元4052、第三积分子单元4053、差分运算子单元4054;
第一积分子单元4051,用于在所述第一视频帧中对所述目标状态矩阵进行位移积分运算,得到所述第一视频帧中的像素点对应的状态积分矩阵;
第二积分子单元4052，用于在所述第一视频帧中对所述目标位移矩阵中的横向位移矩阵进行位移积分运算，得到所述第一视频帧中的像素点对应的横向位移积分矩阵；
第三积分子单元4053，用于在所述第一视频帧中对所述目标位移矩阵中的纵向位移矩阵进行位移积分运算，得到所述第一视频帧中的像素点对应的纵向位移积分矩阵；
差分运算子单元4054,用于从所述第一视频帧中确定位移差分运算对应的差分区域,基于所述差分区域的尺寸信息、状态积分矩阵、横向位移积分矩阵和纵向位移积分矩阵,确定所述第一视频帧对应的平均位移矩阵。
其中,所述差分运算子单元4054包括:第一差分子单元4055、第二差分子单元4056、第三差分子单元4057、平均确定子单元4058;
第一差分子单元4055，用于基于所述差分区域对应的长度信息和宽度信息，对所述状态积分矩阵进行位移差分运算，得到所述第一视频帧对应的状态差分矩阵；
第二差分子单元4056，用于基于所述差分区域对应的长度信息和宽度信息，分别对所述横向位移积分矩阵和纵向位移积分矩阵进行位移差分运算，得到所述第一视频帧对应的横向位移差分矩阵和纵向位移差分矩阵；
第三差分子单元4057,用于将所述横向位移差分矩阵与所述状态差分矩阵之间的比值确定为横向平均位移矩阵,并将所述纵向位移差分矩阵与所述状态差分矩阵之间的比值确定为纵向平均位移矩阵;
平均确定子单元4058，用于将所述横向平均位移矩阵和所述纵向平均位移矩阵确定为所述第一视频帧对应的平均位移矩阵。
其中,第一积分子单元4051、第二积分子单元4052、第三积分子单元4053、差分运算子单元4054的具体实现方式可以参见上述图8所对应实施例中对云端位移积分方法和云端位移差分方法的描述,这里将不再继续进行赘述。
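为便于理解上述位移积分与位移差分的计算过程，下面给出一个示意性的代码草图（仅为示意，差分区域取 k×k 的邻域仅为示例性假设，并非本申请所限定的差分区域确定方式）：

```python
import numpy as np

def average_displacement(target_state, target_disp, k=5):
    """基于目标状态矩阵与目标位移矩阵, 先做位移积分运算(积分图), 再在差分区域上做位移差分运算,
    从而得到第一视频帧对应的平均位移矩阵(示意实现)。"""
    def integral(img):
        # 位移积分运算: 生成带零行零列的积分图, ii[i, j] 为 img[:i, :j] 之和
        ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
        ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
        return ii

    def box_sum(ii, k):
        # 位移差分运算: 利用积分图求每个像素点所在差分区域(边界处裁剪)内元素之和
        h, w = ii.shape[0] - 1, ii.shape[1] - 1
        r = k // 2
        out = np.zeros((h, w), dtype=np.float64)
        for y in range(h):
            for x in range(w):
                y0, y1 = max(0, y - r), min(h - 1, y + r)
                x0, x1 = max(0, x - r), min(w - 1, x + r)
                out[y, x] = ii[y1 + 1, x1 + 1] - ii[y0, x1 + 1] - ii[y1 + 1, x0] + ii[y0, x0]
        return out

    state_diff = box_sum(integral(target_state), k)            # 状态差分矩阵
    dx_diff = box_sum(integral(target_disp[..., 0]), k)        # 横向位移差分矩阵
    dy_diff = box_sum(integral(target_disp[..., 1]), k)        # 纵向位移差分矩阵
    state_diff = np.where(state_diff > 0, state_diff, 1.0)     # 避免除零
    # 横向/纵向位移差分矩阵与状态差分矩阵之间的比值, 即横向/纵向平均位移矩阵
    return np.stack([dx_diff / state_diff, dy_diff / state_diff], axis=-1)
```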
其中,第一确定单元4001、矩阵确定单元4002、像素点筛选单元4003、矩阵修正单元4004、第二确定单元4005的具体实现方式可以参见上述图8所对应实施例中对步骤S202的描述,这里将不再继续进行赘述。
位置跟踪模块510,用于基于所述平均位移矩阵对所述第一视频帧中的像素点的位置信息进行跟踪,并在所述第二视频帧中确定所跟踪得到的像素点的位置信息;
轨迹生成模块610,用于基于所述第一视频帧中的像素点的位置信息、所述跟踪得到的像素点在所述第二视频帧中的位置信息,生成所述目标视频相关联的轨迹信息;所述轨迹信息中包含用于对目标视频中的目标像素点所关联的多媒体信息进行跟踪显示的目标轨迹信息。
其中,第一获取模块310、矩阵获取模块410、位置跟踪模块510、轨迹生成模块610的具体实现方式可以参见上述图8所对应实施例中对步骤S201-步骤S204的描述,这里将不再继续进行赘述。
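为便于从整体上理解上述各模块的配合关系，下面给出一个将前述各示意函数串联起来的代码草图（仅为示意，假设前文给出的 forward_backward_flow、select_valid_pixels、build_target_matrices、average_displacement、build_trajectories 等示例性函数均已定义，视频读取方式亦为示例性假设）：

```python
import cv2

def precompute_video_trajectories(video_path):
    """对目标视频进行预处理: 逐图像对计算平均位移矩阵, 再跟踪全图像素点以生成轨迹信息(示意实现)。"""
    cap = cv2.VideoCapture(video_path)
    frames = []
    ok, frame = cap.read()
    while ok:
        frames.append(frame)
        ok, frame = cap.read()
    cap.release()

    avg_mats = []
    for f1, f2 in zip(frames[:-1], frames[1:]):      # 每两个相邻视频帧构成一个图像对
        fwd, bwd = forward_backward_flow(f1, f2)
        g1 = cv2.cvtColor(f1, cv2.COLOR_BGR2GRAY)
        g2 = cv2.cvtColor(f2, cv2.COLOR_BGR2GRAY)
        valid = select_valid_pixels(fwd, bwd, g1, g2)
        state, disp = build_target_matrices(fwd, valid)
        avg_mats.append(average_displacement(state, disp))
    h, w = frames[0].shape[:2]
    return build_trajectories(avg_mats, h, w)        # 即与目标视频相关联的轨迹信息
```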
进一步地，请参见图16，是本申请实施例提供的又一种计算机设备的结构示意图。如图16所示，上述计算机设备3000可以应用于上述图1对应实施例中的业务服务器2000。上述计算机设备3000可以包括：处理器3001，网络接口3004和存储器3005，此外，上述计算机设备3000还可以包括：用户接口3003，和至少一个通信总线3002。其中，通信总线3002用于实现这些组件之间的连接通信。其中，用户接口3003可以包括显示屏(Display)、键盘(Keyboard)，可选地，用户接口3003还可以包括标准的有线接口、无线接口。网络接口3004可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器3005可以是高速RAM存储器，也可以是非易失性存储器(non-volatile memory)，例如至少一个磁盘存储器。存储器3005可选的还可以是至少一个位于远离前述处理器3001的存储装置。如图16所示，作为一种计算机存储介质的存储器3005中可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。
在图16所示的计算机设备3000中,网络接口3004可提供网络通讯功能;而用户接口3003主要用于为用户提供输入的接口;而处理器3001可以用于调用存储器3005中存储的设备控制应用程序,以实现:
从目标视频中获取相邻的第一视频帧和第二视频帧;
基于所述目标视频对应的光流追踪规则、所述第一视频帧中的像素点、所述第二视频帧中的像素点,确定所述第一视频帧对应的平均位移矩阵;
基于所述平均位移矩阵对所述第一视频帧中的像素点的位置信息进行跟踪,并在所述第二视频帧中确定所跟踪得到的像素点的位置信息;
基于所述第一视频帧中的像素点的位置信息、所述跟踪得到的像素点在所述第二视频帧中的位置信息,生成所述目标视频相关联的轨迹信息;所述轨迹信息中包含用于对目标视频中的目标像素点所关联的多媒体信息进行跟踪显示的目标轨迹信息。
应当理解,本申请实施例中所描述的计算机设备3000可执行前文图8所对应实施例中对上述视频数据处理方法的描述,也可执行前文图15所对应实施例中对上述视频数据处理装置3的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
此外,这里需要指出的是:本申请实施例还提供了一种计算机存储介质,且上述计算机存储介质中存储有前文提及的视频数据处理装置3所执行的计算机程序,且上述计算机程序包括程序指令,当上述处理器执行上述程序指令时,能够执行前文图8所对应实施例中对上述视频数据处理方法的描述,因此,这 里将不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的计算机存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,上述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,上述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。

Claims (15)

  1. 一种视频数据处理方法,所述方法应用于计算机设备,其特征在于,包括:
    响应于对目标视频的触发操作,从所述目标视频的关键视频帧中确定目标像素点,并获取与所述目标像素点相关联的多媒体信息,其中,所述关键视频帧是所述触发操作所在的视频帧,所述目标像素点是所述关键视频帧中与所述触发操作对应的像素点;
    基于所述目标像素点在所述关键视频帧中的位置信息,确定所述目标像素点对应的轨迹获取请求;
    基于所述轨迹获取请求,获取与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,其中,所述目标轨迹信息包含所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,所述目标像素点在所述关键视频帧的下一视频帧中的位置信息是通过跟踪所述目标像素点得到的;
    当播放所述关键视频帧的下一视频帧时,基于所述目标轨迹信息中的所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,显示所述多媒体信息。
  2. 一种视频数据处理方法,所述方法应用于业务服务器,其特征在于,包括:
    响应于对关键视频帧中的目标像素点的轨迹获取请求,获取与目标视频相关联的轨迹信息,其中,所述关键视频帧是所述目标视频中的视频帧,所述目标像素点是所述关键视频帧中的像素点,所述轨迹信息是由所述目标视频的每个视频帧中的像素点的位置信息所确定的;
    从所述目标视频相关联的轨迹信息中,筛选与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,并返回所述目标轨迹信息,其中,所述目标轨迹信息包含目标位置信息,所述目标位置信息用于触发在所述关键视频帧的下一视频帧中,显示与所述目标像素点相关联的多媒体信息。
  3. 一种视频数据处理方法,其特征在于,包括:
    从目标视频中获取相邻的第一视频帧和第二视频帧;
    基于所述目标视频对应的光流追踪规则、所述第一视频帧中的像素点、所述第二视频帧中的像素点,确定所述第一视频帧对应的平均位移矩阵;
    基于所述平均位移矩阵,对所述第一视频帧中的像素点的位置信息进行跟踪,并在所述第二视频帧中确定所跟踪得到的像素点的位置信息;
    基于所述第一视频帧中的像素点的位置信息、所述跟踪得到的像素点在所述第二视频帧中的位置信息,生成与所述目标视频相关联的轨迹信息,其中,所述轨迹信息中包含用于对目标视频中的目标像素点所关联的多媒体信息进行跟踪显示的目标轨迹信息。
  4. 根据权利要求3所述的方法,其特征在于,所述基于所述目标视频对应的光流追踪规则、所述第一视频帧中的像素点、所述第二视频帧中的像素点,确定所述第一视频帧对应的平均位移矩阵,包括:
    获取所述目标视频对应的光流追踪规则,并将所述第一视频帧中的像素点的位置信息确定为第一位置信息,并将所述第二视频帧中的像素点的位置信息确定为第二位置信息;
    基于所述光流追踪规则、所述第一视频帧中的像素点的第一位置信息、所述第二视频帧中的像素点的第二位置信息,获取所述第一视频帧对应的正向位移矩阵,并获取所述第二视频帧对应的反向位移矩阵;
    基于所述第一视频帧中的像素点的第一位置信息、所述正向位移矩阵、所述反向位移矩阵,将所述像素点中满足目标筛选条件的像素点确定为有效像素点;
    基于所述有效像素点对所述第一视频帧对应的初始状态矩阵和所述正向位移矩阵进行修正,得到所述第一视频帧对应的目标状态矩阵和目标位移矩阵;
    基于所述目标状态矩阵和所述目标位移矩阵,确定所述第一视频帧对应的平均位移矩阵。
  5. 根据权利要求4所述的方法,其特征在于,所述基于所述光流追踪规则、所述第一视频帧中的像素点的第一位置信息、所述第二视频帧中的像素点的第二位置信息,获取所述第一视频帧对应的正向位移矩阵,并获取所述第二视频帧对应的反向位移矩阵,包括:
    基于所述第一视频帧中的像素点的第一位置信息和所述光流追踪规则,将所述第一视频帧中的像素点正向映射到所述第二视频帧,并在所述第二视频帧中确定所映射得到的第一映射点的第二位置信息,并基于所述像素点的第一位置信息、所述第一映射点的第二位置信息确定所述第一视频帧对应的正向位移矩阵;
    基于所述第二视频帧中的像素点的第二位置信息和所述光流追踪规则，将所述第二视频帧中的第一映射点反向映射到所述第一视频帧，并在所述第一视频帧中确定所映射得到的第二映射点的第三位置信息，并基于所述第一映射点的第二位置信息、所述第二映射点的第三位置信息确定所述第二视频帧对应的反向位移矩阵。
  6. 根据权利要求4所述的方法,其特征在于,所述基于所述第一视频帧中的像素点的第一位置信息、所述正向位移矩阵、所述反向位移矩阵,将所述像素点中满足目标筛选条件的像素点确定为有效像素点,包括:
    从所述第一视频帧中的像素点中获取第一像素点,并在所述第一视频帧中确定所述第一像素点的第一位置信息,并从所述正向位移矩阵中确定所述第一像素点对应的第一横向位移和第一纵向位移;
    基于所述第一像素点的第一位置信息、所述第一像素点对应的第一横向位移和第一纵向位移,将所述第一像素点正向映射到所述第二视频帧,并在所述第二视频帧中确定所映射得到的第二像素点的第二位置信息;
    从所述反向位移矩阵中确定所述第二像素点对应的第二横向位移和第二纵向位移,并基于所述第二像素点的第二位置信息、所述第二像素点对应的第二横向位移和第二纵向位移,将所述第二像素点反向映射到所述第一视频帧,并在所述第一视频帧中确定所映射得到的第三像素点的第三位置信息;
    基于所述第一像素点的第一位置信息、所述第三像素点的第三位置信息,确定所述第一像素点与所述第三像素点之间的误差距离,并根据所述第一像素点的第一位置信息、所述第二像素点的第二位置信息,确定包含第一像素点的图像块与包含所述第二像素点的图像块之间的相关系数;
    将所述像素点中误差距离小于误差距离阈值、且所述相关系数大于等于相关系数阈值的像素点确定为有效像素点。
  7. 根据权利要求4所述的方法,其特征在于,所述基于所述有效像素点对所述第一视频帧对应的初始状态矩阵和所述正向位移矩阵进行修正,得到所述第一视频帧对应的目标状态矩阵和目标位移矩阵,包括:
    获取第一视频帧对应的初始状态矩阵;所述初始状态矩阵中的每个矩阵元素的状态值均为第一数值,一个矩阵元素对应所述像素点中的一个像素点;
    在所述初始状态矩阵中将与所述有效像素点对应的矩阵元素的状态值由第一数值切换为第二数值,并将包含第二数值的初始状态矩阵确定为所述第一视频帧对应的目标状态矩阵;
    在所述正向位移矩阵中将所述剩余像素点对应的矩阵元素的位移设置为所述第一数值,并将包含所述第一数值的正向位移矩阵确定为目标位移矩阵;所述剩余像素点为所述像素点中除所述有效像素点之外的像素点。
  8. 根据权利要求7所述的方法,其特征在于,所述在所述正向位移矩阵中将所述剩余像素点对应的矩阵元素的位移设置为所述第一数值,并将包含所述第一数值的正向位移矩阵确定为目标位移矩阵,包括:
    若所述正向位移矩阵包含初始横向位移矩阵和初始纵向位移矩阵，则在所述初始横向位移矩阵中将所述剩余像素点对应的矩阵元素的第一横向位移设置为所述第一数值，并将包含所述第一数值的初始横向位移矩阵确定为所述第一视频帧对应的横向位移矩阵；
    在所述初始纵向位移矩阵中将所述剩余像素点对应的矩阵元素的第一纵向位移设置为所述第一数值，并将包含所述第一数值的初始纵向位移矩阵确定为所述第一视频帧对应的纵向位移矩阵；
    将所述第一视频帧对应的横向位移矩阵和所述第一视频帧对应的纵向位移矩阵确定为目标位移矩阵。
  9. 根据权利要求4所述的方法,其特征在于,所述基于所述目标状态矩阵和所述目标位移矩阵,确定所述第一视频帧对应的平均位移矩阵,包括:
    在所述第一视频帧中对所述目标状态矩阵进行位移积分运算,得到所述第一视频帧中的像素点对应的状态积分矩阵;
    在所述第一视频帧中对所述目标位移矩阵中的横向位移矩阵进行位移积分运算，得到所述第一视频帧中的像素点对应的横向位移积分矩阵；
    在所述第一视频帧中对所述目标位移矩阵中的纵向位移矩阵进行位移积分运算，得到所述第一视频帧中的像素点对应的纵向位移积分矩阵；
    从所述第一视频帧中确定位移差分运算对应的差分区域,基于所述差分区域的尺寸信息、状态积分矩阵、横向位移积分矩阵和纵向位移积分矩阵,确定所述第一视频帧对应的平均位移矩阵。
  10. 根据权利要求9所述的方法,其特征在于,所述基于所述差分区域的尺寸信息、状态积分矩阵、横向位移积分矩阵和纵向位移积分矩阵,确定所述第一视频帧对应的平均位移矩阵,包括:
    基于所述差分区域对应的长度信息和宽度信息，对所述状态积分矩阵进行位移差分运算，得到所述第一视频帧对应的状态差分矩阵；
    基于所述差分区域对应的长度信息和宽度信息，分别对所述横向位移积分矩阵和纵向位移积分矩阵进行位移差分运算，得到所述第一视频帧对应的横向位移差分矩阵和纵向位移差分矩阵；
    将所述横向位移差分矩阵与所述状态差分矩阵之间的比值确定为横向平均位移矩阵,并将所述纵向位移差分矩阵与所述状态差分矩阵之间的比值确定为纵向平均位移矩阵;
    将所述横向平均位移矩阵和所述纵向平均位移矩阵确定为所述第一视频帧对应的平均位移矩阵。
  11. 一种视频数据处理装置,所述装置应用于计算机设备,其特征在于,包括:
    对象确定模块,用于响应于对目标视频的触发操作,从所述目标视频的关键视频帧中确定目标像素点,并获取与所述目标像素点相关联的多媒体信息,其中,所述关键视频帧是所述触发操作所在的视频帧,所述目标像素点是所述关键视频帧中与所述触发操作对应的像素点;
    请求确定模块,用于基于所述目标像素点在所述关键视频帧中的位置信息,确定所述目标像素点对应的轨迹获取请求;
    轨迹获取模块,用于基于所述轨迹获取请求,获取与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,其中,所述目标轨迹信息包含所述目标像素点在所述关键视频帧的下一视频帧中的位置信息,所述目标像素点在所述关键视频帧的下一视频帧中的位置信息是通过跟踪所述目标像素点得到的;
    文本显示模块,用于当播放所述关键视频帧的下一视频帧时,基于所述目标轨迹信息中的所述目标像素点在所述关键视频帧的下一视频帧中的位置信息显示所述多媒体信息。
  12. 一种视频数据处理装置，所述装置应用于业务服务器，其特征在于，包括：
    请求响应模块,用于响应于对关键视频帧中的目标像素点的轨迹获取请求,获取与目标视频相关联的轨迹信息,其中,所述关键视频帧是所述目标视频中的视频帧,所述目标像素点是所述关键视频帧中的像素点,所述轨迹信息是由所述目标视频的每个视频帧中的像素点的位置信息所确定的;
    轨迹筛选模块,用于从所述目标视频相关联的轨迹信息中,筛选与所述目标像素点在所述关键视频帧中的位置信息相关联的目标轨迹信息,并返回所述目标轨迹信息,其中,所述目标轨迹信息包含目标位置信息,所述目标位置信息用于触发在所述关键视频帧的下一视频帧中,显示与所述目标像素点相关联的多媒体信息。
  13. 一种视频数据处理装置,其特征在于,包括:
    第一获取模块,用于从目标视频中获取相邻的第一视频帧和第二视频帧;
    矩阵获取模块,用于基于所述目标视频对应的光流追踪规则、所述第一视频帧中的像素点、所述第二视频帧中的像素点,确定所述第一视频帧对应的平均位移矩阵;
    位置跟踪模块,用于基于所述平均位移矩阵对所述第一视频帧中的像素点的位置信息进行跟踪,并在所述第二视频帧中确定所跟踪得到的像素点的位置信息;
    轨迹生成模块，用于基于所述第一视频帧中的像素点的位置信息、所述跟踪得到的像素点在所述第二视频帧中的位置信息，生成与所述目标视频相关联的轨迹信息，其中，所述轨迹信息中包含用于对目标视频中的目标像素点所关联的多媒体信息进行跟踪显示的目标轨迹信息。
  14. 一种计算机设备,其特征在于,包括:处理器、存储器、网络接口;
    所述处理器与存储器、网络接口相连,其中,网络接口用于提供数据通信功能,所述存储器用于存储计算机程序,所述处理器用于调用所述计算机程序,以执行如权利要求1、2、3-10任一项所述的方法。
  15. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时,执行如权利要求1、2、3-10任一项所述的方法。
PCT/CN2020/084112 2019-04-30 2020-04-10 一种视频数据处理方法和相关装置 WO2020220968A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
KR1020217022717A KR102562208B1 (ko) 2019-04-30 2020-04-10 비디오 데이터 프로세싱 방법 및 관련 디바이스
SG11202105410RA SG11202105410RA (en) 2019-04-30 2020-04-10 Video data processing method and related device
EP20799151.4A EP3965431A4 (en) 2019-04-30 2020-04-10 VIDEO DATA PROCESSING METHOD AND RELATED DEVICE
JP2021531593A JP7258400B6 (ja) 2019-04-30 2020-04-10 ビデオデータ処理方法、ビデオデータ処理装置、コンピュータ機器、及びコンピュータプログラム
US17/334,678 US11900614B2 (en) 2019-04-30 2021-05-28 Video data processing method and related apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910358569.8 2019-04-30
CN201910358569.8A CN110062272B (zh) 2019-04-30 2019-04-30 一种视频数据处理方法和相关装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/334,678 Continuation US11900614B2 (en) 2019-04-30 2021-05-28 Video data processing method and related apparatus

Publications (1)

Publication Number Publication Date
WO2020220968A1 true WO2020220968A1 (zh) 2020-11-05

Family

ID=67321748

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/084112 WO2020220968A1 (zh) 2019-04-30 2020-04-10 一种视频数据处理方法和相关装置

Country Status (7)

Country Link
US (1) US11900614B2 (zh)
EP (1) EP3965431A4 (zh)
JP (1) JP7258400B6 (zh)
KR (1) KR102562208B1 (zh)
CN (1) CN110062272B (zh)
SG (1) SG11202105410RA (zh)
WO (1) WO2020220968A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117812392A (zh) * 2024-01-09 2024-04-02 广州巨隆科技有限公司 可视化屏幕的分辨率自适应调节方法、***、介质及设备

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110062272B (zh) 2019-04-30 2021-09-28 腾讯科技(深圳)有限公司 一种视频数据处理方法和相关装置
CN111161309B (zh) * 2019-11-19 2023-09-12 北航航空航天产业研究院丹阳有限公司 一种车载视频动态目标的搜索与定位方法
CN111193938B (zh) * 2020-01-14 2021-07-13 腾讯科技(深圳)有限公司 视频数据处理方法、装置和计算机可读存储介质
CN112258551B (zh) * 2020-03-18 2023-09-05 北京京东振世信息技术有限公司 一种物品掉落检测方法、装置、设备及存储介质
CN111753679B (zh) * 2020-06-10 2023-11-24 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) 微运动监测方法、装置、设备及计算机可读存储介质
CN111901662A (zh) * 2020-08-05 2020-11-06 腾讯科技(深圳)有限公司 视频的扩展信息处理方法、设备和存储介质
CN114449326A (zh) * 2020-11-06 2022-05-06 上海哔哩哔哩科技有限公司 视频标注方法、客户端、服务器及***
CN114584824A (zh) * 2020-12-01 2022-06-03 阿里巴巴集团控股有限公司 数据处理方法、***、电子设备、服务端及客户端设备
CN112884830B (zh) * 2021-01-21 2024-03-29 浙江大华技术股份有限公司 一种目标边框确定方法及装置
CN113034458B (zh) * 2021-03-18 2023-06-23 广州市索图智能电子有限公司 室内人员轨迹分析方法、装置及存储介质
US12020279B2 (en) * 2021-05-03 2024-06-25 Refercloud Llc System and methods to predict winning TV ads, online videos, and other audiovisual content before production
CN114281447B (zh) * 2021-12-02 2024-03-19 武汉华工激光工程有限责任公司 一种载板激光加工软件界面处理方法、***及存储介质
CN114827754B (zh) * 2022-02-23 2023-09-12 阿里巴巴(中国)有限公司 视频首帧时间检测方法及装置
CN117270982A (zh) * 2022-06-13 2023-12-22 中兴通讯股份有限公司 数据处理方法、控制装置、电子设备、计算机可读介质
CN115297355B (zh) * 2022-08-02 2024-01-23 北京奇艺世纪科技有限公司 弹幕显示方法、生成方法、装置、电子设备及存储介质
CN116152301B (zh) * 2023-04-24 2023-07-14 知行汽车科技(苏州)股份有限公司 一种目标的速度估计方法、装置、设备及介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101930779A (zh) * 2010-07-29 2010-12-29 华为终端有限公司 一种视频批注方法及视频播放器
US20140241573A1 (en) * 2013-02-27 2014-08-28 Blendagram, Inc. System for and method of tracking target area in a video clip
CN104881640A (zh) * 2015-05-15 2015-09-02 华为技术有限公司 一种获取向量的方法及装置
CN105872442A (zh) * 2016-03-30 2016-08-17 宁波三博电子科技有限公司 一种基于人脸识别的即时弹幕礼物赠送方法及系统
CN108242062A (zh) * 2017-12-27 2018-07-03 北京纵目安驰智能科技有限公司 基于深度特征流的目标跟踪方法、***、终端及介质
CN109087335A (zh) * 2018-07-16 2018-12-25 腾讯科技(深圳)有限公司 一种人脸跟踪方法、装置和存储介质
CN109558505A (zh) * 2018-11-21 2019-04-02 百度在线网络技术(北京)有限公司 视觉搜索方法、装置、计算机设备及存储介质
CN110062272A (zh) * 2019-04-30 2019-07-26 腾讯科技(深圳)有限公司 一种视频数据处理方法和相关装置

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8363109B2 (en) * 2009-12-10 2013-01-29 Harris Corporation Video processing system providing enhanced tracking features for moving objects outside of a viewable window and related methods
JP5659307B2 (ja) * 2012-07-17 2015-01-28 パナソニックIpマネジメント株式会社 コメント情報生成装置およびコメント情報生成方法
US20190096439A1 (en) * 2016-05-23 2019-03-28 Robert Brouwer Video tagging and annotation
US20190253747A1 (en) 2016-07-22 2019-08-15 Vid Scale, Inc. Systems and methods for integrating and delivering objects of interest in video
US20180082428A1 (en) * 2016-09-16 2018-03-22 Qualcomm Incorporated Use of motion information in video data to track fast moving objects
WO2018105290A1 (ja) * 2016-12-07 2018-06-14 ソニーセミコンダクタソリューションズ株式会社 画像センサ
US10592786B2 (en) * 2017-08-14 2020-03-17 Huawei Technologies Co., Ltd. Generating labeled data for deep object tracking
CN109559330B (zh) * 2017-09-25 2021-09-10 北京金山云网络技术有限公司 运动目标的视觉跟踪方法、装置、电子设备及存储介质
CN108389217A (zh) * 2018-01-31 2018-08-10 华东理工大学 一种基于梯度域混合的视频合成方法
US20190392591A1 (en) * 2018-06-25 2019-12-26 Electronics And Telecommunications Research Institute Apparatus and method for detecting moving object using optical flow prediction
US10956747B2 (en) * 2018-12-31 2021-03-23 International Business Machines Corporation Creating sparsely labeled video annotations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3965431A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117812392A (zh) * 2024-01-09 2024-04-02 广州巨隆科技有限公司 可视化屏幕的分辨率自适应调节方法、***、介质及设备
CN117812392B (zh) * 2024-01-09 2024-05-31 广州巨隆科技有限公司 可视化屏幕的分辨率自适应调节方法、***、介质及设备

Also Published As

Publication number Publication date
JP7258400B6 (ja) 2024-02-19
CN110062272A (zh) 2019-07-26
KR102562208B1 (ko) 2023-07-31
US11900614B2 (en) 2024-02-13
JP7258400B2 (ja) 2023-04-17
CN110062272B (zh) 2021-09-28
JP2022511828A (ja) 2022-02-01
SG11202105410RA (en) 2021-06-29
KR20210095953A (ko) 2021-08-03
EP3965431A1 (en) 2022-03-09
US20210287379A1 (en) 2021-09-16
EP3965431A4 (en) 2022-10-12

Similar Documents

Publication Publication Date Title
WO2020220968A1 (zh) 一种视频数据处理方法和相关装置
US10586350B2 (en) Optimizations for dynamic object instance detection, segmentation, and structure mapping
JP6179889B2 (ja) コメント情報生成装置およびコメント表示装置
CN108604379A (zh) 用于确定图像中的区域的***及方法
EP3493105A1 (en) Optimizations for dynamic object instance detection, segmentation, and structure mapping
US20190287306A1 (en) Multi-endpoint mixed-reality meetings
CN111464834B (zh) 一种视频帧处理方法、装置、计算设备及存储介质
JP2023509572A (ja) 車両を検出するための方法、装置、電子機器、記憶媒体およびコンピュータプログラム
BRPI1011189B1 (pt) Sistema baseado em computador para selecionar pontos de visualização ótimos e meio de armazenamento de sinal legível por máquina não transitória
JP7273129B2 (ja) 車線検出方法、装置、電子機器、記憶媒体及び車両
US11561675B2 (en) Method and apparatus for visualization of public welfare activities
EP3493104A1 (en) Optimizations for dynamic object instance detection, segmentation, and structure mapping
US11921983B2 (en) Method and apparatus for visualization of public welfare activities
CN112752158A (zh) 一种视频展示的方法、装置、电子设备及存储介质
CN112702643B (zh) 弹幕信息显示方法、装置、移动终端
CN117152660A (zh) 图像显示方法及其装置
JP2021089711A (ja) 動画ブレの検出方法及び装置
DE102023105068A1 (de) Bewegungsvektoroptimierung für mehrfach refraktive und reflektierende Schnittstellen
CN114565777A (zh) 数据处理方法和装置
CN114140488A (zh) 视频目标分割方法及装置、视频目标分割模型的训练方法
JP6892557B2 (ja) 学習装置、画像生成装置、学習方法、画像生成方法及びプログラム
CN116506680B (zh) 一种虚拟空间的评论数据处理方法、装置及电子设备
CN113949926B (zh) 一种视频插帧方法、存储介质及终端设备
CN115993892A (zh) 信息输入方法、装置及电子设备
TW202405754A (zh) 深度識別模型訓練方法、圖像深度識別方法及相關設備

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20799151

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021531593

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217022717

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020799151

Country of ref document: EP

Effective date: 20211130