WO2018202089A1 - Key point detection method, apparatus, storage medium and electronic device - Google Patents

Key point detection method, apparatus, storage medium and electronic device Download PDF

Info

Publication number
WO2018202089A1
WO2018202089A1 (PCT/CN2018/085491)
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
video
key point
data
pixels
Prior art date
Application number
PCT/CN2018/085491
Other languages
English (en)
French (fr)
Inventor
张展鹏
孙书洋
张伟
Original Assignee
商汤集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 商汤集团有限公司
Publication of WO2018202089A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Definitions

  • The embodiments of the present application relate to computer vision technologies, and in particular to a key point detection method, apparatus, storage medium, and electronic device.
  • Key point detection of objects (such as pedestrians, animals, and vehicles) is an important technology in applications involving video content parsing and retrieval, and is widely used in fields such as robotics, game entertainment, and content analysis and recommendation for video websites.
  • Currently, research on object key point detection is divided into image-based key point detection and video-based key point detection.
  • The embodiments of the present application provide a technical solution for key point detection.
  • According to one aspect of the embodiments, a key point detection method is provided, including: acquiring video optical flow data according to a first video frame and a second video frame included in a video sequence, where the video optical flow data is used to indicate displacement data of at least one pixel between the second video frame and the first video frame, and the second video frame includes at least one video frame in the video sequence that is temporally consecutive and located before the first video frame; and acquiring first key point data of a target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
  • Optionally, acquiring the video optical flow data according to the first video frame and the second video frame included in the video sequence includes: acquiring the video optical flow data according to the first video frame and the second video frame by using a deep neural network for generating video optical flow data.
  • Optionally, acquiring the first key point data of the target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data includes: selecting at least one second pixel centered on the second key point according to the obtained second key point data of the target object in the second video frame; acquiring data of the first pixel corresponding to each of the at least one second pixel in the first video frame according to the displacement data of the second pixels between the second video frame and the first video frame indicated by the video optical flow data; and acquiring the first key point data of the target object in the first video frame according to the data of the at least one first pixel.
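  • For illustration, the following is a minimal Python/NumPy sketch of the selection-and-propagation procedure just described. All names (propagate_keypoints, flow, keypoints_prev, radius) are illustrative assumptions rather than identifiers from the patent, and the flow convention (pixel (x, y) in the second frame moves by flow[y, x]) is one common choice. Averaging the displaced neighborhood rather than reading a single pixel follows the patent's use of several second pixels per key point and makes the estimate more robust to local noise in the flow map.

```python
# Sketch of key point propagation via a dense optical flow map.
import numpy as np

def propagate_keypoints(flow: np.ndarray, keypoints_prev: np.ndarray,
                        radius: int = 2) -> np.ndarray:
    """Map key points of the second (earlier) frame into the first (current) frame.

    flow: (H, W, 2) array; pixel (x, y) in the earlier frame moves to
          (x + flow[y, x, 0], y + flow[y, x, 1]) in the current frame.
    keypoints_prev: (K, 2) array of (x, y) key points in the earlier frame.
    """
    h, w = flow.shape[:2]
    keypoints_new = np.empty_like(keypoints_prev, dtype=np.float32)
    for k, (x, y) in enumerate(keypoints_prev):
        # Select the "second pixels": a small window centered on the key point.
        xs = np.clip(np.arange(int(x) - radius, int(x) + radius + 1), 0, w - 1)
        ys = np.clip(np.arange(int(y) - radius, int(y) + radius + 1), 0, h - 1)
        gx, gy = np.meshgrid(xs, ys)
        # Displace each selected pixel into the current frame (the "first pixels").
        nx = gx + flow[gy, gx, 0]
        ny = gy + flow[gy, gx, 1]
        # Take the center (mean) of the displaced pixels as the new key point.
        keypoints_new[k] = (nx.mean(), ny.mean())
    return keypoints_new
```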
  • Optionally, before acquiring the video optical flow data according to the first video frame and the second video frame by using the deep neural network for generating video optical flow data, the method further includes: training the deep neural network using multiple sample video frame sequences, where each sample video frame in a sample video frame sequence contains annotation information of video optical flow data.
  • Optionally, the video optical flow data is an optical flow map; some pixels in the optical flow map correspond to pixels in the first video frame and pixels in the second video frame, and indicate displacement information of the pixels in the first video frame relative to the corresponding pixels in the second video frame.
  • Optionally, the deep neural network includes a fully convolutional neural network.
  • Optionally, the first key point and the second key point include: key points of a human body, and/or key points of a human face.
  • According to another aspect of the embodiments, a key point detection apparatus is provided, including: a first acquisition module, configured to acquire video optical flow data according to a first video frame and a second video frame included in a video sequence, where the video optical flow data is used to indicate displacement data of at least one pixel between the second video frame and the first video frame, and the second video frame includes at least one video frame in the video sequence that is temporally consecutive and located before the first video frame; and a second acquisition module, configured to acquire first key point data of the target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
  • Optionally, the first acquisition module includes: a first acquisition unit, configured to acquire the video optical flow data according to the first video frame and the second video frame by using a deep neural network for generating video optical flow data.
  • Optionally, the second acquisition module includes: a selection unit, configured to select at least one second pixel centered on the second key point according to the obtained second key point data of the target object in the second video frame; a second acquisition unit, configured to acquire data of the first pixel corresponding to each of the at least one second pixel in the first video frame according to the displacement data of the second pixels between the second video frame and the first video frame indicated by the video optical flow data; and a third acquisition unit, configured to acquire the first key point data of the target object in the first video frame according to the data of the at least one first pixel.
  • Optionally, the first acquisition module further includes: a training unit, configured to train the deep neural network using multiple sample video frame sequences, where each sample video frame in a sample video frame sequence contains annotation information of video optical flow data.
  • Optionally, the video optical flow data is an optical flow map; some pixels in the optical flow map correspond to pixels in the first video frame and pixels in the second video frame, and indicate displacement information of the pixels in the first video frame relative to the corresponding pixels in the second video frame.
  • Optionally, the deep neural network includes a fully convolutional neural network.
  • Optionally, the first key point and the second key point include: key points of a human body, and/or key points of a human face.
  • According to yet another aspect of the embodiments, an electronic device is provided, including: a processor and a memory; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the key point detection method provided by any of the above embodiments of the present application.
  • According to still another aspect of the embodiments, a computer-readable storage medium is provided, storing executable instructions for performing the operations corresponding to the key point detection method provided by any of the above embodiments of the present application.
  • According to a further aspect of the embodiments, a computer program is provided, including computer instructions that, when run in a processor of a device, implement the operations corresponding to the key point detection method provided by any of the above embodiments of the present application.
  • According to the embodiments of the present application, video optical flow data of consecutive video frames of a video sequence is acquired to determine displacement data of at least one pixel between consecutive video frames. Therefore, after the key point data of the target object in the previous video frame is obtained, the key point data of the current video frame can be acquired according to the displacement information of the key points of the target object indicated by the video optical flow data, effectively utilizing the timing information of consecutive video frames to accurately locate key points in consecutive video frames.
  • In addition, the key point detection scheme of the embodiments does not need to perform separate feature learning on the preceding and succeeding video frames; instead, it locates the key points of the latter video frame according to the correspondence between pixels of consecutive video frames and the key point data of the previous video frame, which reduces the amount of computation and the computation time while ensuring positioning accuracy.
  • FIG. 1 is a schematic diagram showing a key point detection scheme according to an embodiment of the present application.
  • FIG. 2 is a flow chart showing a method of detecting a key point according to another embodiment of the present application.
  • FIG. 3 is a flow chart showing a method of detecting a key point according to still another embodiment of the present application.
  • FIG. 4 is a logic block diagram showing a key point detecting device according to an embodiment of the present application.
  • FIG. 5 is a logic block diagram showing a key point detecting apparatus according to another embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an electronic device according to an application embodiment of the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above systems, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • Generally, program modules may include routines, programs, object programs, components, logic, data structures, and the like, which perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • FIG. 1 is a schematic diagram showing a key point detection scheme in accordance with an embodiment of the present application.
  • The basic idea of the key point detection scheme of the embodiments of the present application is to obtain video optical flow data of consecutive video frames by performing deep optical flow learning on the previous video frame and the current video frame, and to fuse the acquired video optical flow data with the key point data of the previous video frame, so that the key point data of the current video frame is determined according to the correspondence and displacement data of at least one pixel (for example, each pixel) between consecutive video frames indicated by the video optical flow data, thereby achieving precise positioning of key points in consecutive video frames.
  • FIG. 2 is a flow chart showing a method of detecting a key point according to an embodiment of the present application.
  • Referring to FIG. 2, in step S210, video optical flow data is acquired according to a first video frame and a second video frame included in a video sequence; the video optical flow data is used to indicate displacement data of at least one pixel between the second video frame and the first video frame, for example, displacement data of each pixel between the second video frame and the first video frame.
  • The second video frame includes at least one video frame in the video sequence that is temporally consecutive and located before the first video frame.
  • The video sequence may include, but is not limited to, live-streamed video, recorded video, human-computer interaction video, game video, surveillance video, and the like.
  • The first video frame and the second video frame are consecutive video frame images in the same video content; each includes a plurality of pixels and contains key points of one or more target objects.
  • In the practical application of performing target object key point detection on video content, the current video frame being detected may be used as the first video frame, and the video frame before the current frame may be used as the second video frame.
  • Moreover, when the pixel displacement between consecutive video frames is small, the second video frame may also be a plurality of preceding video frames.
  • In consecutive video frames of the video content, the temporally consecutive first video frame and second video frame both include the target object, and the pixels corresponding to the key points of the target object in the first video frame have a mutually corresponding displacement relationship with the pixels corresponding to the key points of the target object in the second video frame. That is, in the process of switching from the second video frame to the first video frame, the pixels corresponding to the key points of the target object in the second video frame are displaced to the pixels corresponding to the key points of the target object in the first video frame, so as to form consecutive video frames.
  • The video optical flow data can be used to indicate displacement data of at least one (for example, each) pixel between the second video frame and the first video frame, and includes at least the displacement data of the pixels corresponding to the key points of the target object in the second video frame and the first video frame.
  • In the operation of acquiring the video optical flow data, a traditional optical flow computation method, or a method such as a deep neural network, may be used to acquire the video optical flow data based on the first video frame and the second video frame.
  • Optionally, using a neural network to obtain the video optical flow data can avoid the long computation time incurred by traditional optical flow computation methods.
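  • As an illustration of the traditional alternative mentioned above, the following minimal Python sketch computes a dense flow map with OpenCV's Farneback algorithm; the patent does not name a specific traditional method, so this particular choice is an assumption.

```python
# A traditional (non-learned) way to obtain the optical flow data; the
# Farneback algorithm here is an illustrative choice, not named by the patent.
import cv2

def traditional_flow(frame_prev_bgr, frame_cur_bgr):
    prev_gray = cv2.cvtColor(frame_prev_bgr, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(frame_cur_bgr, cv2.COLOR_BGR2GRAY)
    # Returns an (H, W, 2) flow map: the per-pixel (dx, dy) displacement
    # from the earlier frame to the current frame.
    return cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
```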
  • In an optional example, step S210 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first acquisition module 402 run by the processor.
  • In step S220, the first key point data of the target object in the first video frame is acquired according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
  • In this embodiment, in the process of performing target object key point detection on consecutive video frames of the video sequence, the second key point of the target object may also be detected from the second video frame, for example, by processing the second video frame with a key point detection method to obtain the second key point data of the target object in the second video frame.
  • After the second key point data is acquired from the second video frame, the key point corresponding, in the first video frame, to the second key point of the target object in the second video frame is determined according to the displacement data of at least one pixel between the second video frame and the first video frame indicated by the video optical flow data, that is, by using the corresponding displacement relationship between at least one pixel of the target object in the second video frame and at least one pixel of the target object in the first video frame, so as to acquire the first key point data of the target object in the first video frame.
  • In an optional example, step S220 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second acquisition module 404 run by the processor.
  • In practical applications, the above steps S210 and S220 may be performed cyclically to perform key point positioning on consecutive video frames.
  • Moreover, in the cyclic detection process, the first key point data of the target object in the first video frame acquired in the previous detection round may be directly used as the second key point data of the target object in the second video frame in the current detection round, so that target object key point detection does not need to be performed on the second video frame in every detection round, thereby reducing the amount of computation.
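  • A minimal sketch of this cyclic detection, assuming hypothetical helpers detect_keypoints() (a single-image detector run only on the first frame), compute_flow(), and propagate_keypoints() (for example, the sketches elsewhere in this document):

```python
# Cyclic key point positioning over consecutive video frames: detect once,
# then reuse each round's result as the next round's "second key point data".
def track_keypoints(frames):
    keypoints = detect_keypoints(frames[0])        # hypothetical detector, run once
    results = [keypoints]
    for prev_frame, cur_frame in zip(frames, frames[1:]):
        flow = compute_flow(prev_frame, cur_frame)        # step S210
        keypoints = propagate_keypoints(flow, keypoints)  # step S220
        results.append(keypoints)              # no per-frame re-detection needed
    return results
```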
  • For example, the key point detection method of this embodiment can be used to detect human body key points in consecutive video frames.
  • The key point detection method of this embodiment effectively utilizes the timing information of consecutive video frames and performs key point positioning through the displacement data of the pixels between consecutive video frames, ensuring the accuracy of video human body key point positioning. Moreover, it does not need to perform separate feature learning on the preceding and succeeding video frames, and locates the key points of the latter video frame according to the correspondence between pixels of consecutive video frames and the key points of the previous video frame, which reduces the amount of computation and the computation time while ensuring accurate positioning of human body key points.
  • According to the key point detection method of the embodiments of the present application, video optical flow data of consecutive video frames of a video sequence is acquired to determine displacement data of at least one pixel between consecutive video frames. Thus, according to the displacement information of the key points of the target object between consecutive video frames, after the key point data of the target object in the previous video frame is obtained, the key point data of the target object in the current video frame can be acquired, achieving accurate positioning of the key points of the target object in consecutive video frames.
  • The key point detection method of each embodiment of the present application may be performed by a video playing program, a live video streaming program, or the like; however, those skilled in the art should understand that, in practical applications, any device having corresponding data acquisition and processing functions may perform the key point detection method of the present application with reference to this embodiment.
  • FIG. 3 is a flow chart showing a method of detecting a key point according to another embodiment of the present application.
  • Referring to FIG. 3, in step S310, video optical flow data is acquired according to a first video frame and a second video frame by a deep neural network for generating video optical flow data.
  • In this embodiment, when performing target object key point detection on consecutive video frames of a video sequence, the consecutive second video frame and first video frame are processed by the deep neural network for generating video optical flow data, so as to acquire video optical flow data indicating the displacement data of each pixel between the second video frame and the first video frame.
  • When this step is performed, the second video frame and the first video frame are input into the trained deep neural network, and the video optical flow data between the second video frame and the first video frame output by the deep neural network is obtained.
  • The second video frame and the first video frame input into the deep neural network may be video images, and the video optical flow data may be an optical flow map, so as to reduce the computational overhead of the video optical flow data.
  • The optical flow map may have the same size as the video images in the consecutive video frames.
  • Some pixels in the optical flow map correspond to pixels in the first video frame and pixels in the second video frame, and indicate the displacement of pixels from the second video frame to the first video frame, for example, the displacement by which the pixel (x, y) in the second video frame moves to the pixel (x', y') in the first video frame.
  • Here, these pixels correspond to pixels located in the non-edge region of the first video frame, and at least include the pixels corresponding to the key points of the target object in the first video frame.
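  • Under this convention, reading a single displacement out of the optical flow map might look like the short Python sketch below; the (H, W, 2) array layout is an assumption.

```python
# Where does pixel (x, y) of the second (earlier) frame land in the first frame?
def displaced_position(flow, x, y):
    dx, dy = flow[y, x]       # flow map assumed to have shape (H, W, 2)
    return x + dx, y + dy     # the corresponding (x', y') in the first frame
```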
  • In practical applications, the deep neural network for generating video optical flow data can be pre-trained.
  • For example, the deep neural network is trained using multiple sample video frame sequences, where each sample video frame in a sample video frame sequence contains annotation information of video optical flow data.
  • In an optional training method, video optical flow detection data of the multiple sample video frame sequences is obtained from the deep neural network to be trained, and the prediction difference of the optical flow detection is calculated according to the video optical flow detection data and the annotation information of the video optical flow data, for example, by calculating a loss value through a loss function, or by calculating the cosine distance or the Euclidean distance between the video optical flow detection data and the annotation information. Thereafter, the calculated prediction difference is back-propagated to the deep neural network, and the network parameters of the deep neural network are updated, thereby training the deep neural network.
  • Optionally, the deep neural network obtained by training may be, for example, a fully convolutional neural network, or may be another deep neural network.
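  • The patent does not disclose a concrete architecture or loss, but a toy fully convolutional flow network and a single training step with an average endpoint-error loss could look like the following PyTorch sketch (all of it illustrative):

```python
# Minimal fully convolutional flow network plus one supervised training step.
import torch
import torch.nn as nn

class TinyFlowNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),  # two RGB frames, stacked
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),             # per-pixel (dx, dy)
        )

    def forward(self, frame_pair):                      # (N, 6, H, W)
        return self.net(frame_pair)                     # (N, 2, H, W) flow map

def train_step(model, optimizer, frame_pair, flow_gt):
    optimizer.zero_grad()
    flow_pred = model(frame_pair)
    # Prediction difference: average endpoint error against the annotation.
    loss = torch.norm(flow_pred - flow_gt, dim=1).mean()
    loss.backward()      # back-propagate the prediction difference
    optimizer.step()     # update the network parameters
    return loss.item()
```

  • Being fully convolutional, such a network outputs a flow map of the same spatial size as its input, which matches the optical flow map described above; a real implementation would be deeper and typically multi-scale.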
  • In addition, when annotating the video optical flow data of the sample video frames, the displacement data of at least one pixel (for example, each pixel) may be manually annotated, or the displacement data of pixels between temporally adjacent sample video frames may be computed with a traditional optical flow method, and the computed pixel displacement data is used as the annotation information of the respective sample video frames.
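  • A sketch of the second annotation option, assuming the traditional_flow() helper sketched earlier; each computed flow map serves as the annotation information for the corresponding pair of temporally adjacent sample frames.

```python
# Build (frame pair, flow annotation) training samples with a traditional method.
def build_training_pairs(sample_frames):
    pairs = []
    for prev_frame, cur_frame in zip(sample_frames, sample_frames[1:]):
        flow_label = traditional_flow(prev_frame, cur_frame)  # annotation info
        pairs.append(((prev_frame, cur_frame), flow_label))
    return pairs
```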
  • In an optional example, step S310 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the first acquisition unit 4022 run by the processor.
  • In step S320, several (that is, at least one) second pixels are selected centered on the second key point according to the obtained second key point data of the target object in the second video frame.
  • In this step, after the second key point data of the second video frame is acquired, pixels near the second key point are selected from the second video frame as the second pixels, with the second key point as the center.
  • There may be one or more second key points in the second video frame; when there are multiple second key points, several second pixels near each of the second key points are selected respectively.
  • In an optional example, step S320 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the selection unit 4042 run by the processor.
  • In step S330, data of the first pixel corresponding to each of the second pixels in the first video frame is acquired according to the displacement data of the second pixels between the second video frame and the first video frame indicated by the video optical flow data.
  • In an optional example, step S330 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the second acquisition unit 4044 run by the processor.
  • In step S340, the first key point data of the target object in the first video frame is acquired according to the acquired data of the first pixels.
  • After the first pixels in the first video frame are determined, the center of the first pixels is selected as the first key point of the target object in the first video frame; this first key point corresponds to the second key point detected from the second video frame. The first key point in the first video frame is located at the center of the first pixels, and the second key point in the second video frame is located at the center of the second pixels; that is, the correspondence between the first key point and the second key point is consistent with the displacement relationship represented by the displacement data of the pixels between the second video frame and the first video frame.
  • In an optional example, step S340 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the third acquisition unit 4046 run by the processor.
  • According to the key point detection method of this embodiment, consecutive video frames can be processed by a pre-trained fully convolutional neural network to generate the video optical flow data of the consecutive video frames, which accurately indicates, in the form of an optical flow map, the displacement information of each pixel between consecutive video frames. Thus, according to the displacement information indicated by the video optical flow data, by acquiring the pixels around the key points of the target object in the previous video frame, the pixels in the current video frame that match the displacement information are determined, and then the key points of the target object in the current video frame are determined, achieving accurate detection of key points in consecutive video frames.
  • In this embodiment, the first key point and the second key point may include key points of a human body and/or key points of a human face; that is, the key point detection method of this embodiment may be used for video human body key point detection, video face key point detection, or simultaneous detection of video human body key points and face key points.
  • A live webcast scenario is taken as an example for description.
  • In the process of a video anchor live-streaming, the live video images containing the anchor are captured by a camera, and the human body key points of the anchor in the first video frame (for example, key points such as the head, wrists, and shoulders) can be detected by using a human body key point algorithm.
  • Then, the first and second frames of the video are input into the trained deep neural network for detecting video optical flow data, so as to obtain the optical flow data between the two video frames, indicating the displacement information of the pixels between the two frames.
  • Then, according to the obtained optical flow data and the human body key points in the first frame, the human body key points in the second frame can be determined accordingly.
  • Here, the first and second frames of the video correspond to the second video frame and the first video frame described above, respectively.
  • In addition, in the above webcast scenario, the second and third frames of the video may further be input into the above deep neural network, and the human body key points in the third frame can be determined accordingly by combining the output optical flow data with the previously obtained human body key points in the second frame.
  • By analogy, the human body key points in each live video frame can be accurately detected, which can be used to accurately track the human body key points in the live video.
  • Performing video human body key point detection with the key point detection method of this embodiment uses the optical flow information between consecutive video frames to determine the displacement information between pixels in consecutive video frames, and then locates the human body key points of the current video frame according to the human body key points of the previous video frame, which can effectively ensure the accuracy of video human body key point positioning. Moreover, in the actual detection process, the deep neural network can process the optical flow information of the preceding and succeeding (or consecutive) video frames, and the human body key points in the current video frame are located according to the human body key points determined in the previous video frame, which reduces the amount of computation while ensuring positioning accuracy and effectively reduces the computation time of video human body key point detection.
  • FIG. 4 is a logic block diagram showing a key point detection apparatus according to an embodiment of the present application.
  • Referring to FIG. 4, the key point detection apparatus of this embodiment includes a first acquisition module 402 and a second acquisition module 404.
  • The first acquisition module 402 is configured to acquire video optical flow data according to the first video frame and the second video frame included in a video sequence, where the video optical flow data is used to indicate displacement data of each pixel between the second video frame and the first video frame, and the second video frame includes at least one video frame in the video sequence that is temporally consecutive and located before the first video frame.
  • The second acquisition module 404 is configured to acquire the first key point data of the target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
  • According to the key point detection apparatus provided by the embodiments of the present application, the video optical flow data of consecutive video frames of a video sequence is acquired to determine the displacement data of each pixel between consecutive video frames, so that, after the key point data of the target object in the previous video frame is obtained, the key point data of the current video frame can be acquired according to the displacement information of the key points of the target object indicated by the video optical flow data, effectively utilizing the timing information of consecutive video frames to achieve accurate positioning of key points in consecutive video frames.
  • FIG. 5 is a logic block diagram showing a key point detecting device according to another embodiment of the present application.
  • Referring to FIG. 5, compared with the embodiment shown in FIG. 4, in this embodiment the first acquisition module 402 includes a first acquisition unit 4022, configured to acquire video optical flow data according to the first video frame and the second video frame by using a deep neural network for generating video optical flow data.
  • Optionally, the second acquisition module 404 includes a selection unit 4042, a second acquisition unit 4044, and a third acquisition unit 4046.
  • The selection unit 4042 is configured to select several (that is, at least one) second pixels centered on the second key point according to the obtained second key point data of the target object in the second video frame.
  • The second acquisition unit 4044 is configured to acquire data of the first pixel corresponding to each of the second pixels in the first video frame according to the displacement data of the second pixels between the second video frame and the first video frame indicated by the video optical flow data.
  • The third acquisition unit 4046 is configured to acquire the first key point data of the target object in the first video frame according to the data of the several first pixels.
  • Optionally, the first acquisition module 402 further includes a training unit 4024, configured to train the deep neural network using multiple sample video frame sequences, where each sample video frame in a sample video frame sequence contains annotation information of video optical flow data.
  • Optionally, the video optical flow data is an optical flow map; some pixels in the optical flow map correspond to pixels in the first video frame and pixels in the second video frame, and indicate displacement information of the pixels in the first video frame relative to the corresponding pixels in the second video frame.
  • Optionally, the deep neural network may include, but is not limited to, a fully convolutional neural network.
  • Optionally, the first key point and the second key point may include: key points of a human body, and/or key points of a human face.
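  • For illustration only, the module structure described above might be composed as in the following sketch; the class and method names are assumptions, and propagate_keypoints() refers to the earlier sketch.

```python
# Sketch of the two-module apparatus: flow acquisition, then key point acquisition.
class FirstAcquisitionModule:
    def __init__(self, flow_network):
        self.flow_network = flow_network            # e.g. a trained flow DNN

    def acquire_flow(self, frame_prev, frame_cur):
        return self.flow_network(frame_prev, frame_cur)

class SecondAcquisitionModule:
    def acquire_keypoints(self, keypoints_prev, flow):
        return propagate_keypoints(flow, keypoints_prev)   # earlier sketch

class KeypointDetector:
    def __init__(self, flow_network):
        self.first = FirstAcquisitionModule(flow_network)
        self.second = SecondAcquisitionModule()

    def detect(self, frame_prev, frame_cur, keypoints_prev):
        flow = self.first.acquire_flow(frame_prev, frame_cur)
        return self.second.acquire_keypoints(keypoints_prev, flow)
```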
  • The key point detection apparatus of the embodiments of the present application can be used to implement the corresponding key point detection methods in the foregoing method embodiments and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
  • The embodiments of the present application further provide an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. The electronic device includes: a processor and a memory, where the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the key point detection method of any of the above embodiments of the present application.
  • Referring to FIG. 6, a schematic structural diagram of an electronic device 600 suitable for implementing the terminal device or the server of the embodiments of the present application is shown.
  • The electronic device 600 includes one or more processors, communication elements, and the like, for example: one or more central processing units (CPUs) 601, and/or one or more graphics processing units (GPUs) 613. The processor may execute various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 602 or executable instructions loaded from a storage portion 608 into a random access memory (RAM) 603.
  • The communication elements include a communication component 612 and a communication interface 609.
  • The communication component 612 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card; the communication interface 609 includes a communication interface of a network interface card such as a LAN card or a modem, and performs communication processing via a network such as the Internet.
  • The processor can communicate with the read-only memory 602 and/or the random access memory 603 to execute executable instructions, is connected to the communication component 612 via a bus 604, and communicates with other target devices via the communication component 612, thereby completing the operations corresponding to any method provided by the embodiments of the present application, for example: acquiring video optical flow data according to a first video frame and a second video frame included in a video sequence, the video optical flow data being used to indicate displacement data of at least one pixel between the second video frame and the first video frame, the second video frame including at least one video frame in the video sequence that is temporally consecutive and located before the first video frame; and acquiring the first key point data of the target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
  • In addition, various programs and data required for the operation of the device can be stored in the RAM 603.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • When the RAM 603 is present, the ROM 602 is an optional module.
  • The RAM 603 stores executable instructions, or writes executable instructions into the ROM 602 at runtime, and the executable instructions cause the CPU 601 to perform the operations corresponding to the above communication method.
  • An input/output (I/O) interface 605 is also coupled to bus 604.
  • The communication component 612 can be integrated, or can be configured to have multiple sub-modules (for example, multiple IB network cards) on a bus link.
  • The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication interface 609 including a network interface card such as a LAN card or a modem.
  • A driver 610 is also connected to the I/O interface 605 as needed.
  • A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 610 as needed, so that a computer program read therefrom is installed into the storage portion 608 as needed.
  • It should be noted that the architecture shown in FIG. 6 is only an optional implementation; in practice, the number and types of the components in FIG. 6 may be selected, reduced, increased, or replaced according to actual needs. In the arrangement of different functional components, separate or integrated implementations may also be adopted; for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU, and the communication component 612 may be arranged separately or integrated on the CPU or the GPU, and so on. All these alternative implementations fall within the protection scope of the present application.
  • In addition, the embodiments of the present application further provide a computer-readable storage medium storing executable instructions for performing the operations corresponding to the key point detection method according to any of the above embodiments of the present application.
  • In addition, the embodiments of the present application further provide a computer program, including computer instructions that, when run in a processor of a device, implement the operations corresponding to the key point detection method according to any of the above embodiments of the present application.
  • The embodiments of the present application further provide a computer program product, including a computer program tangibly embodied on a machine-readable medium; the computer program contains program code for executing the method shown in the flowcharts, and the program code may include instructions corresponding to the steps of the method provided by any of the above embodiments of the present application, for example: acquiring video optical flow data according to a first video frame and a second video frame included in a video sequence, the video optical flow data being used to indicate displacement data of at least one pixel between the second video frame and the first video frame, the second video frame including at least one video frame in the video sequence that is temporally consecutive and located before the first video frame; and acquiring the first key point data of the target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
  • In such an embodiment, the computer program can be downloaded and installed from a network via the communication elements, and/or installed from the removable medium 611.
  • When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the methods of the embodiments of the present application are executed.
  • Persons of ordinary skill in the art may understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium, and when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • The methods, apparatuses, and devices of the present application may be implemented in many ways.
  • For example, the methods, apparatuses, and devices of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • The above sequence of the steps of the method is for illustration only, and the steps of the method of the present application are not limited to the sequence described above unless otherwise specified.
  • In addition, in some embodiments, the present application may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application.
  • Thus, the present application also covers a recording medium storing a program for executing the methods according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a key point detection method, apparatus, storage medium, and electronic device. The key point detection method includes: acquiring video optical flow data according to a first video frame and a second video frame included in a video sequence, where the video optical flow data is used to indicate displacement data of at least one pixel between the second video frame and the first video frame, and the second video frame includes at least one video frame in the video sequence that is temporally consecutive and located before the first video frame; and acquiring first key point data of a target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data. The embodiments of the present application can effectively utilize the timing information of consecutive video frames to achieve accurate positioning of key points in consecutive video frames.

Description

Key point detection method, apparatus, storage medium and electronic device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 5, 2017, with application number CN201710311329.3 and entitled "Key point detection method, apparatus, storage medium and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to computer vision technologies, and in particular to a key point detection method, apparatus, storage medium, and electronic device.
Background
Key point detection of objects (such as pedestrians, animals, and vehicles) is an important technology in applications involving video content parsing and retrieval, and is widely used in fields such as robotics, game entertainment, and content analysis and recommendation for video websites. Currently, research on object key point detection is divided into image-based key point detection and video-based key point detection.
Summary
The embodiments of the present application provide a technical solution for key point detection.
According to one aspect of the embodiments of the present application, a key point detection method is provided, including: acquiring video optical flow data according to a first video frame and a second video frame included in a video sequence, where the video optical flow data is used to indicate displacement data of at least one pixel between the second video frame and the first video frame, and the second video frame includes at least one video frame in the video sequence that is temporally consecutive and located before the first video frame; and acquiring first key point data of a target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
Optionally, acquiring the video optical flow data according to the first video frame and the second video frame included in the video sequence includes: acquiring the video optical flow data according to the first video frame and the second video frame by using a deep neural network for generating video optical flow data.
Optionally, acquiring the first key point data of the target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data includes: selecting at least one second pixel centered on the second key point according to the obtained second key point data of the target object in the second video frame; acquiring data of the first pixel corresponding to each of the at least one second pixel in the first video frame according to the displacement data of the second pixels between the second video frame and the first video frame indicated by the video optical flow data; and acquiring the first key point data of the target object in the first video frame according to the data of the at least one first pixel.
Optionally, before acquiring the video optical flow data according to the first video frame and the second video frame by using the deep neural network for generating video optical flow data, the method further includes: training the deep neural network using multiple sample video frame sequences, where each sample video frame in a sample video frame sequence contains annotation information of video optical flow data.
Optionally, the video optical flow data is an optical flow map; some pixels in the optical flow map correspond to pixels in the first video frame and pixels in the second video frame, and indicate displacement information of the pixels in the first video frame relative to the corresponding pixels in the second video frame.
Optionally, the deep neural network includes a fully convolutional neural network.
Optionally, the first key point and the second key point include: key points of a human body, and/or key points of a human face.
According to another aspect of the embodiments of the present application, a key point detection apparatus is further provided, including: a first acquisition module, configured to acquire video optical flow data according to a first video frame and a second video frame included in a video sequence, where the video optical flow data is used to indicate displacement data of at least one pixel between the second video frame and the first video frame, and the second video frame includes at least one video frame in the video sequence that is temporally consecutive and located before the first video frame; and a second acquisition module, configured to acquire first key point data of a target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
Optionally, the first acquisition module includes: a first acquisition unit, configured to acquire the video optical flow data according to the first video frame and the second video frame by using a deep neural network for generating video optical flow data.
Optionally, the second acquisition module includes: a selection unit, configured to select at least one second pixel centered on the second key point according to the obtained second key point data of the target object in the second video frame; a second acquisition unit, configured to acquire data of the first pixel corresponding to each of the at least one second pixel in the first video frame according to the displacement data of the second pixels between the second video frame and the first video frame indicated by the video optical flow data; and a third acquisition unit, configured to acquire the first key point data of the target object in the first video frame according to the data of the at least one first pixel.
Optionally, the first acquisition module further includes: a training unit, configured to train the deep neural network using multiple sample video frame sequences, where each sample video frame in a sample video frame sequence contains annotation information of video optical flow data.
Optionally, the video optical flow data is an optical flow map; some pixels in the optical flow map correspond to pixels in the first video frame and pixels in the second video frame, and indicate displacement information of the pixels in the first video frame relative to the corresponding pixels in the second video frame.
Optionally, the deep neural network includes a fully convolutional neural network.
Optionally, the first key point and the second key point include: key points of a human body, and/or key points of a human face.
According to yet another aspect of the embodiments of the present application, an electronic device is further provided, including: a processor and a memory; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the key point detection method provided by any of the above embodiments of the present application.
According to still another aspect of the embodiments of the present application, a computer-readable storage medium is further provided, storing executable instructions for performing the operations corresponding to the key point detection method provided by any of the above embodiments of the present application.
According to a further aspect of the embodiments of the present application, a computer program is further provided, including computer instructions that, when run in a processor of a device, implement the operations corresponding to the key point detection method provided by any of the above embodiments of the present application.
According to the key point detection method, apparatus, storage medium, program, and electronic device provided by the embodiments of the present application, video optical flow data of consecutive video frames of a video sequence is acquired to determine displacement data of at least one pixel between consecutive video frames, so that, after the key point data of the target object in the previous video frame is obtained, the key point data of the current video frame can be acquired according to the displacement information of the key points of the target object indicated by the video optical flow data, effectively utilizing the timing information of consecutive video frames to achieve accurate positioning of key points in consecutive video frames. In addition, the key point detection scheme of this embodiment does not need to perform separate feature learning on the preceding and succeeding video frames of the consecutive video frames; instead, it locates the key points of the latter video frame according to the correspondence between pixels of consecutive video frames and the key point data of the previous video frame, which reduces the amount of computation and the computation time while ensuring positioning accuracy.
The technical solutions of the present application are further described in detail below through the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, describe the embodiments of the present application and, together with the description, serve to explain the principles of the present application.
With reference to the accompanying drawings, the present application can be understood more clearly from the following detailed description, in which:
FIG. 1 is a schematic diagram showing a key point detection scheme according to an embodiment of the present application;
FIG. 2 is a flowchart showing a key point detection method according to another embodiment of the present application;
FIG. 3 is a flowchart showing a key point detection method according to yet another embodiment of the present application;
FIG. 4 is a logic block diagram showing a key point detection apparatus according to an embodiment of the present application;
FIG. 5 is a logic block diagram showing a key point detection apparatus according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an application embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application.
The following description of at least one exemplary embodiment is in fact merely illustrative and is in no way intended to limit the present application or its application or use.
Technologies, methods, and devices known to persons of ordinary skill in the related art may not be discussed in detail, but where appropriate, such technologies, methods, and devices should be considered part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following accompanying drawings; therefore, once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.
The embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and the like, which perform particular tasks or implement particular abstract data types. The computer system/server can be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Exemplary embodiments of the present application are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram showing a key point detection scheme according to an embodiment of the present application. Referring to FIG. 1, the basic idea of the key point detection scheme of the embodiments of the present application is to obtain video optical flow data of consecutive video frames by performing deep optical flow learning on the previous video frame and the current video frame, and to fuse the acquired video optical flow data with the key point data of the previous video frame, so that the key point data of the current video frame is determined according to the correspondence and displacement data of at least one pixel (for example, each pixel) between consecutive video frames indicated by the video optical flow data, thereby achieving precise positioning of key points in consecutive video frames.
FIG. 2 is a flowchart showing a key point detection method according to an embodiment of the present application.
Referring to FIG. 2, in step S210, video optical flow data is acquired according to a first video frame and a second video frame included in a video sequence; the video optical flow data is used to indicate displacement data of at least one pixel between the second video frame and the first video frame, for example, displacement data of each pixel between the second video frame and the first video frame. The second video frame includes at least one video frame in the video sequence that is temporally consecutive and located before the first video frame.
The video sequence may include, but is not limited to, live-streamed video, recorded video, human-computer interaction video, game video, surveillance video, and the like. The first video frame and the second video frame are consecutive video frame images in the same video content; each includes a plurality of pixels and contains key points of one or more target objects. In the practical application of performing target object key point detection on video content, the current video frame being detected may be used as the first video frame, and the video frame before the current frame may be used as the second video frame. Moreover, when the pixel displacement between consecutive video frames is small, the second video frame may also be a plurality of preceding video frames.
In consecutive video frames of the video content, the temporally consecutive first video frame and second video frame both include the target object, and the pixels corresponding to the key points of the target object in the first video frame have a mutually corresponding displacement relationship with the pixels corresponding to the key points of the target object in the second video frame. That is, in the process of switching from the second video frame to the first video frame, the pixels corresponding to the key points of the target object in the second video frame are displaced to the pixels corresponding to the key points of the target object in the first video frame, so as to form consecutive video frames. The video optical flow data can be used to indicate displacement data of at least one (for example, each) pixel between the second video frame and the first video frame, and includes at least the displacement data of the pixels corresponding to the key points of the target object in the second video frame and the first video frame.
In the operation of acquiring the video optical flow data, a traditional optical flow computation method, or a method such as a deep neural network, may be used to acquire the video optical flow data based on the first video frame and the second video frame. Optionally, using a neural network to obtain the video optical flow data can avoid the long computation time incurred by traditional optical flow computation methods.
In an optional example, step S210 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the first acquisition module 402 run by the processor.
In step S220, the first key point data of the target object in the first video frame is acquired according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
In this embodiment, in the process of performing target object key point detection on consecutive video frames of the video sequence, the second key point of the target object may also be detected from the second video frame, for example, by processing the second video frame with a key point detection method to obtain the second key point data of the target object in the second video frame.
After the second key point data is acquired from the second video frame, the key point corresponding, in the first video frame, to the second key point of the target object in the second video frame is determined according to the displacement data of at least one pixel between the second video frame and the first video frame indicated by the video optical flow data, that is, by using the corresponding displacement relationship between at least one pixel of the target object in the second video frame and at least one pixel of the target object in the first video frame, so as to acquire the first key point data of the target object in the first video frame.
In an optional example, step S220 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the second acquisition module 404 run by the processor.
In the practical application of the key point detection method of the embodiments of the present application, the above steps S210 and S220 may be performed cyclically to perform key point positioning on consecutive video frames. Moreover, in the cyclic detection process, when step S220 is performed, the first key point data of the target object in the first video frame acquired in the previous detection round may be directly used as the second key point data of the target object in the second video frame in the current detection round, so that target object key point detection does not need to be performed on the second video frame in every detection round, thereby reducing the amount of computation.
For example, the key point detection method of this embodiment may be used to detect human body key points in consecutive video frames. The method effectively utilizes the timing information of consecutive video frames and performs key point positioning through the displacement data of the pixels between consecutive video frames, ensuring the accuracy of video human body key point positioning; moreover, it does not need to perform separate feature learning on the preceding and succeeding video frames, and locates the key points of the latter video frame according to the correspondence between pixels of consecutive video frames and the key points of the previous video frame, which reduces the amount of computation and the computation time while ensuring accurate positioning of human body key points.
According to the key point detection method of the embodiments of the present application, video optical flow data of consecutive video frames of a video sequence is acquired to determine displacement data of at least one pixel between consecutive video frames, so that, according to the displacement information of the key points of the target object between consecutive video frames, after the key point data of the target object in the previous video frame is obtained, the key point data of the target object in the current video frame can be acquired, achieving accurate positioning of the key points of the target object in consecutive video frames.
The key point detection method of each embodiment of the present application may be performed by a video playing program, a live video streaming program, or the like; however, those skilled in the art should understand that, in practical applications, any device having corresponding data acquisition and processing functions may perform the key point detection method of the present application with reference to this embodiment.
FIG. 3 is a flowchart showing a key point detection method according to another embodiment of the present application.
Referring to FIG. 3, in step S310, video optical flow data is acquired according to a first video frame and a second video frame by a deep neural network for generating video optical flow data.
In this embodiment, when performing target object key point detection on consecutive video frames of a video sequence, the consecutive second video frame and first video frame are processed by the deep neural network for generating video optical flow data, so as to acquire video optical flow data indicating the displacement data of each pixel between the second video frame and the first video frame.
When this step is performed, the second video frame and the first video frame are input into the trained deep neural network, and the video optical flow data between the second video frame and the first video frame output by the deep neural network is obtained. The second video frame and the first video frame input into the deep neural network may be video images, and the video optical flow data may be an optical flow map, so as to reduce the computational overhead of the video optical flow data. The optical flow map may have the same size as the video images in the consecutive video frames. Some pixels in the optical flow map correspond to pixels in the first video frame and pixels in the second video frame, and indicate the displacement of pixels from the second video frame to the first video frame, for example, the displacement by which the pixel (x, y) in the second video frame moves to the pixel (x', y') in the first video frame. Here, these pixels correspond to pixels located in the non-edge region of the first video frame, and at least include the pixels corresponding to the key points of the target object in the first video frame.
In practical applications, the deep neural network for generating video optical flow data can be pre-trained, for example, by using multiple sample video frame sequences, where each sample video frame contains annotation information of video optical flow data.
In an optional training method, video optical flow detection data of the multiple sample video frame sequences is obtained from the deep neural network to be trained, and the prediction difference of the optical flow detection is calculated according to the video optical flow detection data and the annotation information of the video optical flow data, for example, by calculating a loss value through a loss function, or by calculating the cosine distance or the Euclidean distance between the video optical flow detection data and the annotation information. Thereafter, the calculated prediction difference is back-propagated to the deep neural network, and the network parameters of the deep neural network are updated, thereby training the deep neural network.
Optionally, the deep neural network obtained by training may be, for example, a fully convolutional neural network, or may be another deep neural network.
In addition, when annotating the video optical flow data of the sample video frames, the displacement data of at least one pixel (for example, each pixel) may be manually annotated, or the displacement data of pixels between temporally adjacent sample video frames may be computed with a traditional optical flow method, and the computed pixel displacement data is used as the annotation information of the respective sample video frames.
In an optional example, step S310 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the first acquisition unit 4022 run by the processor.
In step S320, several (that is, at least one) second pixels are selected centered on the second key point according to the obtained second key point data of the target object in the second video frame. In this step, after the second key point data of the second video frame is acquired, pixels near the second key point are selected from the second video frame as the second pixels, with the second key point as the center. There may be one or more second key points in the second video frame; when there are multiple second key points, several second pixels near each of the second key points are selected respectively.
In an optional example, step S320 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the selection unit 4042 run by the processor.
In step S330, data of the first pixel corresponding to each of the several second pixels in the first video frame is acquired according to the displacement data of the second pixels between the second video frame and the first video frame indicated by the video optical flow data.
After the several second pixels corresponding to the second key point are determined, the displacement information of the several second pixels between the second video frame and the first video frame indicated by the video optical flow data is used to determine the first pixels in the first video frame to which the several second pixels in the second video frame are respectively displaced, and the data of the several first pixels is acquired.
In an optional example, step S330 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the second acquisition unit 4044 run by the processor.
In step S340, the first key point data of the target object in the first video frame is acquired according to the acquired data of the several first pixels.
After the several first pixels in the first video frame are determined, the center of the several first pixels is selected as the first key point of the target object in the first video frame; this first key point corresponds to the second key point detected from the second video frame. The first key point in the first video frame is located at the center of the several first pixels, and the second key point in the second video frame is located at the center of the several second pixels; that is, the correspondence between the first key point and the second key point is consistent with the displacement relationship represented by the displacement data of the pixels between the second video frame and the first video frame.
In an optional example, step S340 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the third acquisition unit 4046 run by the processor.
According to the key point detection method of this embodiment, consecutive video frames can be processed by a pre-trained fully convolutional neural network to generate the video optical flow data of the consecutive video frames, which accurately indicates, in the form of an optical flow map, the displacement information of each pixel between consecutive video frames; thus, according to the displacement information indicated by the video optical flow data, by acquiring the pixels around the key points of the target object in the previous video frame, the pixels in the current video frame that match the displacement information are determined, and then the key points of the target object in the current video frame are determined, achieving accurate detection of key points in consecutive video frames.
In this embodiment, the first key point and the second key point may include key points of a human body and/or key points of a human face; that is, the key point detection method of this embodiment may be used for video human body key point detection, video face key point detection, or simultaneous detection of video human body key points and face key points.
A live webcast scenario is taken as an example for description. In the process of a video anchor live-streaming, the live video images containing the anchor are captured by a camera, and the human body key points of the anchor in the first video frame (for example, key points such as the head, wrists, and shoulders) can be detected by using a human body key point algorithm. Then, the first and second frames of the video are input into the trained deep neural network for detecting video optical flow data, so as to obtain the optical flow data between the two frames, indicating the displacement information of the pixels between them. Then, according to the obtained optical flow data and the human body key points in the first frame, the human body key points in the second frame can be determined accordingly. Here, the first and second frames correspond to the second video frame and the first video frame described above, respectively.
In addition, in the above webcast scenario, the second and third frames of the video may further be input into the above deep neural network, and the human body key points in the third frame can be determined accordingly by combining the output optical flow data with the previously obtained human body key points in the second frame. By analogy, the human body key points in each live video frame can be accurately detected, which can be used to accurately track the human body key points in the live video, as sketched below.
Performing video human body key point detection with the key point detection method of this embodiment uses the optical flow information between consecutive video frames to determine the displacement information between pixels in consecutive video frames, and then locates the human body key points of the current video frame according to the human body key points of the previous video frame, which can effectively ensure the accuracy of video human body key point positioning. Moreover, in the actual detection process, the deep neural network can process the optical flow information of the preceding and succeeding (or consecutive) video frames, and the human body key points in the current video frame are located according to the human body key points determined in the previous video frame, which reduces the amount of computation while ensuring positioning accuracy and effectively reduces the computation time of video human body key point detection.
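For illustration, a minimal sketch of such a live tracking loop using OpenCV video capture follows; detect_human_keypoints(), compute_flow(), and propagate_keypoints() are assumed helpers (for example, the sketches earlier in this document) rather than APIs defined by the patent.

```python
# Sketch: detect human key points once on the first frame of a live stream,
# then carry them forward frame by frame via optical flow.
import cv2

def track_anchor_keypoints(stream_url):
    cap = cv2.VideoCapture(stream_url)
    ok, prev_frame = cap.read()
    keypoints = detect_human_keypoints(prev_frame) if ok else None  # frame 1 only
    while ok:
        ok, cur_frame = cap.read()
        if not ok:
            break
        flow = compute_flow(prev_frame, cur_frame)      # frame pair -> flow map
        keypoints = propagate_keypoints(flow, keypoints)
        prev_frame = cur_frame
        yield keypoints                                 # per-frame human key points
    cap.release()
```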
FIG. 4 is a logic block diagram showing a key point detection apparatus according to an embodiment of the present application.
Referring to FIG. 4, the key point detection apparatus of this embodiment includes a first acquisition module 402 and a second acquisition module 404. The first acquisition module 402 is configured to acquire video optical flow data according to a first video frame and a second video frame included in a video sequence, where the video optical flow data is used to indicate displacement data of each pixel between the second video frame and the first video frame, and the second video frame includes at least one video frame in the video sequence that is temporally consecutive and located before the first video frame. The second acquisition module 404 is configured to acquire the first key point data of the target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
According to the key point detection apparatus provided by the embodiments of the present application, the video optical flow data of consecutive video frames of a video sequence is acquired to determine the displacement data of each pixel between consecutive video frames, so that, after the key point data of the target object in the previous video frame is obtained, the key point data of the current video frame can be acquired according to the displacement information of the key points of the target object indicated by the video optical flow data, effectively utilizing the timing information of consecutive video frames to achieve accurate positioning of key points in consecutive video frames.
FIG. 5 is a logic block diagram showing a key point detection apparatus according to another embodiment of the present application. Referring to FIG. 5, compared with the embodiment shown in FIG. 4, in this embodiment the first acquisition module 402 includes a first acquisition unit 4022, configured to acquire video optical flow data according to the first video frame and the second video frame by using a deep neural network for generating video optical flow data.
Optionally, the second acquisition module 404 includes a selection unit 4042, a second acquisition unit 4044, and a third acquisition unit 4046. The selection unit 4042 is configured to select several (that is, at least one) second pixels centered on the second key point according to the obtained second key point data of the target object in the second video frame. The second acquisition unit 4044 is configured to acquire data of the first pixel corresponding to each of the several second pixels in the first video frame according to the displacement data of the second pixels between the second video frame and the first video frame indicated by the video optical flow data. The third acquisition unit 4046 is configured to acquire the first key point data of the target object in the first video frame according to the data of the several first pixels.
Optionally, the first acquisition module 402 further includes a training unit 4024, configured to train the deep neural network using multiple sample video frame sequences, where each sample video frame in a sample video frame sequence contains annotation information of video optical flow data.
Optionally, the video optical flow data is an optical flow map; some pixels in the optical flow map correspond to pixels in the first video frame and pixels in the second video frame, and indicate displacement information of the pixels in the first video frame relative to the corresponding pixels in the second video frame.
Optionally, the deep neural network may include, but is not limited to, a fully convolutional neural network.
Optionally, the first key point and the second key point may include: key points of a human body, and/or key points of a human face.
The key point detection apparatus of the embodiments of the present application can be used to implement the corresponding key point detection methods in the foregoing method embodiments and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
The embodiments of the present application further provide an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. The electronic device includes: a processor and a memory, where the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the key point detection method of any of the above embodiments of the present application.
Referring to FIG. 6, a schematic structural diagram of an electronic device 600 suitable for implementing the terminal device or the server of the embodiments of the present application is shown.
As shown in FIG. 6, the electronic device 600 includes one or more processors, communication elements, and the like, for example: one or more central processing units (CPUs) 601, and/or one or more graphics processing units (GPUs) 613. The processor may execute various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 602 or executable instructions loaded from a storage portion 608 into a random access memory (RAM) 603. The communication elements include a communication component 612 and a communication interface 609. The communication component 612 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card; the communication interface 609 includes a communication interface of a network interface card such as a LAN card or a modem, and performs communication processing via a network such as the Internet.
The processor can communicate with the read-only memory 602 and/or the random access memory 603 to execute executable instructions, is connected to the communication component 612 via a bus 604, and communicates with other target devices via the communication component 612, thereby completing the operations corresponding to any method provided by the embodiments of the present application, for example: acquiring video optical flow data according to a first video frame and a second video frame included in a video sequence, the video optical flow data being used to indicate displacement data of at least one pixel between the second video frame and the first video frame, the second video frame including at least one video frame in the video sequence that is temporally consecutive and located before the first video frame; and acquiring the first key point data of the target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
In addition, various programs and data required for the operation of the device can be stored in the RAM 603. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through the bus 604. When the RAM 603 is present, the ROM 602 is an optional module. The RAM 603 stores executable instructions, or writes executable instructions into the ROM 602 at runtime, and the executable instructions cause the CPU 601 to perform the operations corresponding to the above communication method. An input/output (I/O) interface 605 is also connected to the bus 604. The communication component 612 can be integrated, or can be configured to have multiple sub-modules (for example, multiple IB network cards) on a bus link.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication interface 609 including a network interface card such as a LAN card or a modem. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 610 as needed, so that a computer program read therefrom is installed into the storage portion 608 as needed.
It should be noted that the architecture shown in FIG. 6 is only an optional implementation; in practice, the number and types of the components in FIG. 6 may be selected, reduced, increased, or replaced according to actual needs. In the arrangement of different functional components, separate or integrated implementations may also be adopted; for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU, and the communication component 612 may be arranged separately or integrated on the CPU or the GPU, and so on. All these alternative implementations fall within the protection scope of the present application.
In addition, the embodiments of the present application further provide a computer-readable storage medium storing executable instructions for performing the operations corresponding to the key point detection method according to any of the above embodiments of the present application.
In addition, the embodiments of the present application further provide a computer program, including computer instructions that, when run in a processor of a device, implement the operations corresponding to the key point detection method according to any of the above embodiments of the present application.
In particular, according to the embodiments of the present application, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, the embodiments of the present application further provide a computer program product, including a computer program tangibly embodied on a machine-readable medium; the computer program contains program code for executing the method shown in the flowcharts, and the program code may include instructions corresponding to the steps of the method provided by any of the above embodiments of the present application, for example: acquiring video optical flow data according to a first video frame and a second video frame included in a video sequence, the video optical flow data being used to indicate displacement data of at least one pixel between the second video frame and the first video frame, the second video frame including at least one video frame in the video sequence that is temporally consecutive and located before the first video frame; and acquiring the first key point data of the target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data. In such an embodiment, the computer program can be downloaded and installed from a network via the communication elements, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the methods of the embodiments of the present application are executed.
Persons of ordinary skill in the art may understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium, and when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
The methods, apparatuses, and devices of the present application may be implemented in many ways, for example, in software, hardware, firmware, or any combination of software, hardware, and firmware. The above sequence of the steps of the method is for illustration only, and the steps of the method of the present application are not limited to the sequence described above unless otherwise specified. In addition, in some embodiments, the present application may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers a recording medium storing a program for executing the methods according to the present application.
The description of the present application is given for the purposes of illustration and description, and is not exhaustive or intended to limit the present application to the disclosed form. Many modifications and variations are obvious to persons of ordinary skill in the art. The embodiments were selected and described in order to better explain the principles and practical applications of the present application, and to enable persons of ordinary skill in the art to understand the present application and thereby design various embodiments with various modifications suited to particular uses.
The above are only optional implementations of the embodiments of the present application, but the protection scope of the embodiments of the present application is not limited thereto. Any variation or replacement that can be readily conceived by any person skilled in the art within the technical scope disclosed by the embodiments of the present application shall be covered by the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (17)

  1. A key point detection method, comprising:
    acquiring video optical flow data according to a first video frame and a second video frame included in a video sequence, wherein the video optical flow data is used to indicate displacement data of at least one pixel between the second video frame and the first video frame, and the second video frame comprises at least one video frame in the video sequence that is temporally consecutive and located before the first video frame;
    acquiring first key point data of a target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
  2. The method according to claim 1, wherein acquiring the video optical flow data according to the first video frame and the second video frame included in the video sequence comprises:
    acquiring the video optical flow data according to the first video frame and the second video frame by using a deep neural network for generating video optical flow data.
  3. The method according to claim 1 or 2, wherein acquiring the first key point data of the target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data comprises:
    selecting at least one second pixel centered on the second key point according to the obtained second key point data of the target object in the second video frame;
    acquiring data of the first pixel corresponding to each of the at least one second pixel in the first video frame according to the displacement data of the second pixels between the second video frame and the first video frame indicated by the video optical flow data;
    acquiring the first key point data of the target object in the first video frame according to the data of the at least one first pixel.
  4. The method according to claim 2 or 3, wherein, before acquiring the video optical flow data according to the first video frame and the second video frame by using the deep neural network for generating video optical flow data, the method further comprises:
    training the deep neural network using a plurality of sample video frame sequences, wherein each sample video frame in a sample video frame sequence contains annotation information of video optical flow data.
  5. The method according to any one of claims 1 to 4, wherein the video optical flow data is an optical flow map, some pixels in the optical flow map correspond to pixels in the first video frame and pixels in the second video frame, and indicate displacement information of the pixels in the first video frame relative to the corresponding pixels in the second video frame.
  6. The method according to any one of claims 2 to 5, wherein the deep neural network comprises a fully convolutional neural network.
  7. The method according to any one of claims 1 to 6, wherein the first key point and the second key point comprise: key points of a human body, and/or key points of a human face.
  8. A key point detection apparatus, comprising:
    a first acquisition module, configured to acquire video optical flow data according to a first video frame and a second video frame included in a video sequence, wherein the video optical flow data is used to indicate displacement data of at least one pixel between the second video frame and the first video frame, and the second video frame comprises at least one video frame in the video sequence that is temporally consecutive and located before the first video frame;
    a second acquisition module, configured to acquire first key point data of a target object in the first video frame according to the obtained second key point data of the target object in the second video frame and the video optical flow data.
  9. The apparatus according to claim 8, wherein the first acquisition module comprises:
    a first acquisition unit, configured to acquire the video optical flow data according to the first video frame and the second video frame by using a deep neural network for generating video optical flow data.
  10. The apparatus according to claim 8 or 9, wherein the second acquisition module comprises:
    a selection unit, configured to select at least one second pixel centered on the second key point according to the obtained second key point data of the target object in the second video frame;
    a second acquisition unit, configured to acquire data of the first pixel corresponding to each of the at least one second pixel in the first video frame according to the displacement data of the second pixels between the second video frame and the first video frame indicated by the video optical flow data;
    a third acquisition unit, configured to acquire the first key point data of the target object in the first video frame according to the data of the at least one first pixel.
  11. The apparatus according to claim 9 or 10, wherein the first acquisition module further comprises:
    a training unit, configured to train the deep neural network using a plurality of sample video frame sequences, wherein each sample video frame in a sample video frame sequence contains annotation information of video optical flow data.
  12. The apparatus according to any one of claims 8 to 11, wherein the video optical flow data is an optical flow map, some pixels in the optical flow map correspond to pixels in the first video frame and pixels in the second video frame, and indicate displacement information of the pixels in the first video frame relative to the corresponding pixels in the second video frame.
  13. The apparatus according to any one of claims 9 to 12, wherein the deep neural network comprises a fully convolutional neural network.
  14. The apparatus according to any one of claims 8 to 13, wherein the first key point and the second key point comprise: key points of a human body, and/or key points of a human face.
  15. An electronic device, comprising: a processor and a memory;
    wherein the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the key point detection method according to any one of claims 1 to 7.
  16. A computer-readable storage medium, storing executable instructions for performing the operations corresponding to the key point detection method according to any one of claims 1 to 7.
  17. A computer program, comprising computer instructions that, when run in a processor of a device, implement the operations corresponding to the key point detection method according to any one of claims 1 to 7.
PCT/CN2018/085491 2017-05-05 2018-05-03 Key point detection method, apparatus, storage medium and electronic device WO2018202089A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710311329.3 2017-05-05
CN201710311329.3A CN108229282A (zh) 2017-05-05 2017-05-05 Key point detection method, apparatus, storage medium and electronic device

Publications (1)

Publication Number Publication Date
WO2018202089A1 (zh)

Family

ID=62658076

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/085491 WO2018202089A1 (zh) 2017-05-05 2018-05-03 Key point detection method, apparatus, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN108229282A (zh)
WO (1) WO2018202089A1 (zh)


Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108881952B (zh) * 2018-07-02 2021-09-14 上海商汤智能科技有限公司 Video generation method and apparatus, electronic device and storage medium
CN109389072B (zh) * 2018-09-29 2022-03-08 北京字节跳动网络技术有限公司 Data processing method and apparatus
US20200135236A1 (en) * 2018-10-29 2020-04-30 Mediatek Inc. Human pose video editing on smartphones
US10991122B2 (en) * 2018-11-13 2021-04-27 Google Llc Processing images to localize novel objects
CN109583391B (zh) * 2018-12-04 2021-07-16 北京字节跳动网络技术有限公司 Key point detection method, apparatus, device and readable medium
CN109862380B (zh) * 2019-01-10 2022-06-03 北京达佳互联信息技术有限公司 Video data processing method and apparatus, server, electronic device and storage medium
CN109871760B (zh) * 2019-01-15 2021-03-26 北京奇艺世纪科技有限公司 Face localization method and apparatus, terminal device and storage medium
CN110058685B (zh) * 2019-03-20 2021-07-09 北京字节跳动网络技术有限公司 Virtual object display method and apparatus, electronic device and computer-readable storage medium
CN110059605A (zh) * 2019-04-10 2019-07-26 厦门美图之家科技有限公司 Neural network training method, computing device and storage medium
CN110264455B (zh) * 2019-06-19 2021-07-23 北京市商汤科技开发有限公司 Image processing and neural network training methods and apparatuses, and storage medium
CN110264499A (zh) * 2019-06-26 2019-09-20 北京字节跳动网络技术有限公司 Interactive position control method and apparatus based on human body key points, and electronic device
CN110555414B (zh) * 2019-09-05 2022-09-30 北京市商汤科技开发有限公司 Target detection method, apparatus, device and storage medium
CN111027495A (zh) * 2019-12-12 2020-04-17 京东数字科技控股有限公司 Method and apparatus for detecting human body key points
CN113516017B (zh) * 2021-04-22 2023-07-11 平安科技(深圳)有限公司 Medication-taking process supervision method and apparatus, terminal device and storage medium
CN113361364B (zh) * 2021-05-31 2022-11-01 北京市商汤科技开发有限公司 Target behavior detection method, apparatus, device and storage medium
CN115937958B (zh) * 2022-12-01 2023-12-15 北京惠朗时代科技有限公司 Blink detection method, apparatus, device and storage medium


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9014421B2 (en) * 2011-09-28 2015-04-21 Qualcomm Incorporated Framework for reference-free drift-corrected planar tracking using Lucas-Kanade optical flow
CN102890781B (zh) * 2012-07-04 2016-01-13 北京航空航天大学 Highlight shot recognition method for badminton match videos
CN103593679A (zh) * 2012-08-16 2014-02-19 北京大学深圳研究生院 Visual human hand tracking method based on online machine learning
CN104636751A (zh) * 2014-12-11 2015-05-20 广东工业大学 Crowd anomaly detection and localization system and method based on temporal recurrent neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2256690A1 (en) * 2009-05-29 2010-12-01 Honda Research Institute Europe GmbH Object motion detection system based on combining 3D warping techniques and a proper object motion detection
CN103759670A (zh) * 2014-01-06 2014-04-30 四川虹微技术有限公司 Object three-dimensional information acquisition method based on digital close-range photography
CN105469056A (zh) * 2015-11-26 2016-04-06 小米科技有限责任公司 Face image processing method and apparatus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DOSOVITSKIY, ALEXEY ET AL.: "FlowNet: Learning Optical Flow with Convolutional Networks", 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, 18 February 2016 (2016-02-18), pages 2758-2766, XP032866621, ISSN: 2380-7504 *
GOEBEL, R. PATRICK: "ROS By Example", 31 January 2016, pages 149-156 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697416B (zh) * 2018-12-14 2022-11-18 腾讯科技(深圳)有限公司 Video data processing method and related apparatus
CN109697416A (zh) * 2018-12-14 2019-04-30 腾讯科技(深圳)有限公司 Video data processing method and related apparatus
CN109977775A (zh) * 2019-02-25 2019-07-05 腾讯科技(深圳)有限公司 Key point detection method, apparatus, device and readable storage medium
CN109977775B (zh) * 2019-02-25 2023-07-28 腾讯科技(深圳)有限公司 Key point detection method, apparatus, device and readable storage medium
CN111027412A (zh) * 2019-11-20 2020-04-17 北京奇艺世纪科技有限公司 Human body key point recognition method and apparatus, and electronic device
CN111027412B (zh) * 2019-11-20 2024-03-08 北京奇艺世纪科技有限公司 Human body key point recognition method and apparatus, and electronic device
CN111178308A (zh) * 2019-12-31 2020-05-19 北京奇艺世纪科技有限公司 Gesture trajectory recognition method and apparatus
CN111222509A (zh) * 2020-01-17 2020-06-02 北京字节跳动网络技术有限公司 Target detection method and apparatus, and electronic device
CN111222509B (zh) * 2020-01-17 2023-08-18 北京字节跳动网络技术有限公司 Target detection method and apparatus, and electronic device
CN111523402A (zh) * 2020-04-01 2020-08-11 车智互联(北京)科技有限公司 Video processing method, mobile terminal and readable storage medium
CN111523402B (zh) * 2020-04-01 2023-12-12 车智互联(北京)科技有限公司 Video processing method, mobile terminal and readable storage medium
CN111954055B (zh) * 2020-07-01 2022-09-02 北京达佳互联信息技术有限公司 Video special effect display method and apparatus, electronic device and storage medium
CN111954055A (zh) * 2020-07-01 2020-11-17 北京达佳互联信息技术有限公司 Video special effect display method and apparatus, electronic device and storage medium
CN111860286A (zh) * 2020-07-14 2020-10-30 艾伯资讯(深圳)有限公司 Violent behavior detection method and system based on a hybrid strategy, and storage medium
CN112418153A (zh) * 2020-12-04 2021-02-26 上海商汤科技开发有限公司 Image processing method and apparatus, electronic device and computer storage medium
CN112418153B (zh) * 2020-12-04 2024-06-11 上海商汤科技开发有限公司 Image processing method and apparatus, electronic device and computer storage medium
CN112950672A (zh) * 2021-03-03 2021-06-11 百度在线网络技术(北京)有限公司 Method and apparatus for determining positions of key points, and electronic device
CN112950672B (zh) * 2021-03-03 2023-09-19 百度在线网络技术(北京)有限公司 Method and apparatus for determining positions of key points, and electronic device
CN112989987A (zh) * 2021-03-09 2021-06-18 北京京东乾石科技有限公司 Method, apparatus, device and storage medium for recognizing crowd behavior
CN113379877B (zh) * 2021-06-08 2023-07-28 北京百度网讯科技有限公司 Face video generation method and apparatus, electronic device and storage medium
CN113379877A (zh) * 2021-06-08 2021-09-10 北京百度网讯科技有限公司 Face video generation method and apparatus, electronic device and storage medium
CN114630012A (zh) * 2022-03-11 2022-06-14 北京奇艺世纪科技有限公司 Virtual fitting video generation method and apparatus, electronic device and medium
CN114630012B (zh) * 2022-03-11 2024-03-12 北京奇艺世纪科技有限公司 Virtual fitting video generation method and apparatus, electronic device and medium

Also Published As

Publication number Publication date
CN108229282A (zh) 2018-06-29

Similar Documents

Publication Publication Date Title
WO2018202089A1 (zh) Key point detection method and apparatus, storage medium, and electronic device
US10909380B2 (en) Methods and apparatuses for recognizing video and training, electronic device and medium
US11367313B2 (en) Method and apparatus for recognizing body movement
CN109584276B (zh) Key point detection method, apparatus, device and readable medium
US20220051417A1 (en) Target recognition method and appartus, storage medium, and electronic device
WO2018121737A1 (zh) Key point prediction, network training, and image processing methods and apparatuses, and electronic device
WO2018019126A1 (zh) Video category recognition method and apparatus, data processing apparatus, and electronic device
WO2018153323A1 (zh) Method and apparatus for detecting objects in video, and electronic device
CN110853033B (zh) Video detection method and apparatus based on inter-frame similarity
US10699431B2 (en) Method and apparatus for generating image generative model
WO2020062493A1 (zh) Image processing method and apparatus
CN109583391B (zh) Key point detection method, apparatus, device and readable medium
CN109255767B (zh) Image processing method and apparatus
US20220309781A1 (en) Multi-Angle Object Recognition
US20150286853A1 (en) Eye gaze driven spatio-temporal action localization
CN113015978B (zh) Processing images to localize novel objects
CN109977824B (zh) Article pick-and-place recognition method, apparatus and device
CN108229494B (zh) Network training method, processing method, apparatus, storage medium and electronic device
CA3052846A1 (en) Character recognition method, device, electronic device and storage medium
US11741712B2 (en) Multi-hop transformer for spatio-temporal reasoning and localization
JP2020013553A (ja) Information generation method and apparatus applied to a terminal device
CN114511041A (zh) Model training method, image processing method, apparatus, device and storage medium
CN113378834A (zh) Target detection method, apparatus, device, storage medium and program product
Lupión et al. 3D Human Pose Estimation from multi-view thermal vision sensors
Stovall et al. Scalable object tracking in smart cities

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18794124

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 18794124

Country of ref document: EP

Kind code of ref document: A1