CN112906551A - Video processing method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112906551A
Authority
CN
China
Prior art keywords
image
sample
instance segmentation
video
ray
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110179159.4A
Other languages
Chinese (zh)
Inventor
江毅
孙培泽
袁泽寰
王长虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110179159.4A priority Critical patent/CN112906551A/en
Publication of CN112906551A publication Critical patent/CN112906551A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a video processing method that makes full use of the association between video frames, thereby achieving a better video instance segmentation effect and improving video instance segmentation efficiency. The method comprises the following steps: acquiring a video to be processed; and inputting the video to be processed into an instance segmentation model to obtain an instance segmentation result for each frame of the video, wherein, for at least one frame of the video, the instance segmentation model is configured to determine the instance segmentation result of that frame according to the instance segmentation result of a historical frame preceding it.

Description

Video processing method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a video processing method and apparatus, a storage medium, and an electronic device.
Background
Video understanding analyzes video content to capture key information in a video, and is widely applied in video technology fields such as security monitoring, human behavior analysis, and sports video commentary. Video instance segmentation, as the basis of video understanding tasks, mainly involves predicting which class label each pixel in a video frame belongs to, and distinguishing different individuals belonging to the same class. For example, if a video contains several different persons, video instance segmentation separates these persons from the video frames, providing a basis for further video understanding processes such as video target tracking.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a video processing method, the method comprising:
acquiring a video to be processed;
inputting the video to be processed into an instance segmentation model to obtain an instance segmentation result for each frame of the video, wherein, for at least one frame of the video, the instance segmentation model is configured to determine the instance segmentation result of that frame according to the instance segmentation result of a historical frame preceding it.
In a second aspect, the present disclosure provides a video processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring a video to be processed;
a segmentation module, configured to input the video to be processed into an instance segmentation model to obtain an instance segmentation result for each frame of the video, where, for at least one frame of the video, the instance segmentation model is configured to determine the instance segmentation result of that frame according to the instance segmentation result of a historical frame preceding it.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method of the first aspect.
Through the above technical solution, the instance segmentation model can determine, for at least one frame in the video to be processed, the instance segmentation result of that frame according to the instance segmentation result of a historical frame preceding it. Compared with performing independent instance segmentation on each frame of the video, this makes full use of the association between video frames, thereby achieving a better video instance segmentation effect. In addition, because instance segmentation is performed in combination with the segmentation results of preceding historical frames, redundant instance segmentation operations can be avoided, improving video instance segmentation efficiency.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
fig. 1 is a flow chart illustrating a video processing method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic diagram of pixel-by-pixel segmentation based on a rectangular coordinate system;
FIG. 3 is a schematic diagram of edge segmentation based on polar coordinates;
FIG. 4 is a diagram illustrating an instance segmentation model in a video processing method according to an exemplary embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating instance segmentation in a video processing method according to an exemplary embodiment of the present disclosure;
FIG. 6 is a block diagram illustrating a video processing device according to an exemplary embodiment of the present disclosure;
fig. 7 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are used only to distinguish different devices, modules, or units, not to limit the order of, or interdependence between, the functions they perform. It is further noted that references to "a", "an", and "the" in the present disclosure are intended to be illustrative rather than limiting; those skilled in the art will recognize that "one or more" is meant unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Video understanding analyzes video content to capture key information in a video, and is widely applied in video technology fields such as security monitoring, human behavior analysis, and sports video commentary. Video instance segmentation, as the basis of video understanding tasks, mainly involves predicting which class label each pixel in a video frame belongs to, and distinguishing different individuals belonging to the same class. For example, if a video contains several different persons, video instance segmentation separates these persons from the video frames, providing a basis for further video understanding processes such as video target tracking.
The inventors found through research that, in related-art video processing, instance segmentation is usually performed independently on each frame of the video; that is, when segmenting a given frame, the instance segmentation result of the preceding frame is not fully utilized, making an optimal segmentation effect difficult to achieve. Moreover, temporally consecutive video frames usually have related or partly identical content, so performing independent instance segmentation on every frame is likely to involve many redundant segmentation operations, which affects not only instance segmentation efficiency but also the execution efficiency of subsequent video understanding tasks.
In view of this, the present disclosure provides a video processing method that performs instance segmentation in combination with the instance segmentation results of the historical frames preceding each video frame, making full use of the association between video frames, thereby achieving a better video instance segmentation effect and improving video instance segmentation efficiency.
Fig. 1 is a flow chart illustrating a video processing method according to an exemplary embodiment of the present disclosure. Referring to fig. 1, the video processing method includes:
step 101, obtaining a video to be processed.
For example, the video to be processed may be obtained in response to a video input operation by a user, or may be automatically obtained from an image capturing device after a video instance segmentation instruction is received, and so on; the present disclosure does not limit this.
Step 102, inputting the video to be processed into an instance segmentation model to obtain an instance segmentation result for each frame of the video, wherein, for at least one frame of the video, the instance segmentation model is configured to determine the instance segmentation result of that frame according to the instance segmentation result of a historical frame preceding it.
In this way, for at least one frame in the video to be processed, the instance segmentation model can determine the frame's instance segmentation result according to the instance segmentation result of a historical frame preceding it. Compared with performing independent instance segmentation on each frame of the video, this makes full use of the association between video frames, thereby achieving a better video instance segmentation effect. In addition, because instance segmentation is performed in combination with the segmentation results of preceding historical frames, redundant instance segmentation operations can be avoided, improving video instance segmentation efficiency.
In order to make the video processing method provided by the present disclosure more understandable to those skilled in the art, the above steps are exemplified in detail below.
The training process of the instance segmentation model is explained first.
For example, the instance segmentation model may be trained from sample images and the sample instance segmentation results corresponding to those sample images. A sample instance segmentation result may be a pixel-wise segmentation result based on a rectangular coordinate system, or an edge segmentation result based on polar coordinates; the present disclosure does not limit which. Pixel-wise segmentation in a rectangular coordinate system (see FIG. 2) requires classifying a large number of pixels, so the instance segmentation result includes the classification of each pixel. Polar-coordinate edge segmentation (see FIG. 3) instead emits rays from a center point, predicts the angle and distance from each ray to its edge point, and finally connects the ray endpoints in sequence to obtain a closed contour as the instance segmentation result; the result thus includes the direction information and distance information of the rays.
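The polar-coordinate representation above can be illustrated with a short sketch: a contour is recovered by connecting the endpoints of rays emitted from a center point. The even angular spacing of the rays and the function and parameter names are illustrative assumptions, not notation from the patent itself.

```python
import math

def rays_to_polygon(center, distances):
    """Connect the endpoints of rays emitted from `center` at evenly
    spaced angles, with the given per-ray lengths, into the vertex list
    of a closed polygon contour (illustrative sketch)."""
    n = len(distances)
    cx, cy = center
    points = []
    for i, d in enumerate(distances):
        theta = 2.0 * math.pi * i / n  # direction of the i-th ray
        points.append((cx + d * math.cos(theta), cy + d * math.sin(theta)))
    # Connecting consecutive points (and the last back to the first)
    # closes the contour that serves as the instance segmentation result.
    return points
```

With four unit-length rays from the origin, the endpoints land on the axes at distance 1, so the closed contour is a square rotated onto the coordinate axes.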
In one possible implementation, the training of the instance segmentation model may include: extracting multiple temporally consecutive sample images from a sample video, each labelled with a sample instance segmentation result; then, for at least one sample image, determining a predicted instance segmentation result for that image according to the sample instance segmentation result of a preceding historical sample image and the feature map of the image, and computing a loss function from the predicted instance segmentation result and the labelled sample instance segmentation result; and finally, adjusting the parameters of the instance segmentation model according to the value of the loss function.
For example, the historical sample image may be the immediately preceding sample image, or the two preceding sample images, and so on; the present disclosure does not limit this, as long as the historical sample image precedes the current sample image in the video.
It should be understood that, for the first sample image, since no historical sample image precedes it, its predicted instance segmentation result may be obtained by predicting a plurality of sample rays emitted from the center point of the instance to be segmented, and taking the closed contour formed by connecting the endpoints of these rays as the predicted result. The loss function is then computed between this predicted result and the labelled sample instance segmentation result of the first sample image, and the parameters of the instance segmentation model are adjusted according to its value.
Then, for each sample image other than the first, a predicted instance segmentation result can be determined according to the sample instance segmentation result of a preceding historical sample image and the feature map of the current sample image, and the loss function computed between the predicted result and the corresponding labelled result, thereby training the instance segmentation model. The feature map may be an image obtained by vectorizing the image features of each pixel in the sample image.
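The training steps above can be sketched as follows. The model interface (`predict_initial`, `predict_adjustment`), the per-ray L1 loss, and the use of each frame's labelled rays as the next frame's history are illustrative assumptions standing in for the patent's unspecified details; a real trainer would backpropagate the returned losses.

```python
def l1_ray_loss(pred_rays, gt_rays):
    """Mean absolute difference between predicted and labelled ray lengths;
    one plausible regression loss for polar contours (an assumption here)."""
    return sum(abs(p - g) for p, g in zip(pred_rays, gt_rays)) / len(pred_rays)

def clip_training_losses(model, labelled_frames):
    """labelled_frames: (feature_map, gt_rays) pairs for temporally
    consecutive sample images, each labelled with polar ground-truth rays.
    Returns the per-frame losses used to adjust the model's parameters."""
    losses, prev_rays = [], None
    for feature_map, gt_rays in labelled_frames:
        if prev_rays is None:
            # First sample image: no history, rays are predicted from scratch.
            pred_rays = model.predict_initial(feature_map)
        else:
            # Later sample images: refine the historical rays with a delta.
            delta = model.predict_adjustment(feature_map, prev_rays)
            pred_rays = [r + d for r, d in zip(prev_rays, delta)]
        losses.append(l1_ray_loss(pred_rays, gt_rays))
        prev_rays = gt_rays  # labelled rays of this frame seed the next one
    return losses
```

A stub model that happens to predict the labels exactly yields zero loss on every frame, which is a convenient sanity check for the loop's bookkeeping.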
In one possible implementation, the sample instance segmentation result may be polar-coordinate based, i.e., it includes sample ray information. In that case, polar-coordinate sample instance segmentation adjustment information for the current sample image may first be determined from the sample instance segmentation result of the historical sample image and the feature map of the current sample image. The first sample rays in the historical sample image's polar-coordinate sample instance segmentation result are then adjusted according to this adjustment information to obtain second sample rays. Finally, the polar-coordinate predicted instance segmentation result of the sample image is determined from the second sample rays.
Illustratively, the sample instance segmentation adjustment information may be direction adjustment information and/or angle adjustment information for the first sample rays. In one possible implementation, the adjustment information may itself be a sample instance segmentation adjustment ray. Considering that the content of consecutive frames in a video is usually related or partly identical (for example, the frames show different actions of the same person), the center point of the emitted rays does not change during instance segmentation; what changes is the length of each ray. Thus, the starting point of a sample instance segmentation adjustment ray may be the endpoint of the corresponding first sample ray, with a direction that is either the same as or opposite to that of the first sample ray. After the first sample ray is adjusted according to this information, it lengthens away from the center point or shrinks toward it.
After the first sample rays are adjusted according to the sample instance segmentation adjustment information, i.e., after the second sample rays are obtained, the polar-coordinate predicted instance segmentation result of the sample image can be determined from them: for example, the closed contour formed by connecting the endpoints of the second sample rays in sequence may serve as the predicted result. The loss function can then be computed between this predicted result and the sample instance segmentation result labelled in advance, and the parameters of the instance segmentation model adjusted according to its value, thereby training the model.
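Because an adjustment ray starts at the endpoint of a first ray and points either away from or toward the center, superimposing it reduces, in length terms, to adding a signed delta to each ray length. A minimal sketch of that superposition follows; the zero clamp is an extra safeguard assumed here, not stated in the source.

```python
def superimpose_adjustment(first_rays, deltas):
    """Add signed adjustment lengths to the previous rays' lengths.
    Positive delta: the ray grows away from the center point.
    Negative delta: the ray shrinks toward the center point.
    Clamping at zero is an illustrative safeguard (an assumption)."""
    return [max(0.0, r + d) for r, d in zip(first_rays, deltas)]
```

For instance, deltas of +1 and -2 applied to rays of length 5 and 3 yield rays of length 6 and 1.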
For example, a schematic of the instance segmentation model is shown in FIG. 4. Referring to FIG. 4, the model includes a neural network whose structure may be chosen according to the actual situation; the embodiments of the present disclosure do not limit it. With continued reference to FIG. 4, the training process includes: inputting the feature map of the sample image and the first sample rays of the historical sample image into the neural network, which learns from the feature map and the features of the first sample rays and outputs sample instance segmentation adjustment rays. The model then superimposes the adjustment rays on the first sample rays to obtain the second sample rays for the current sample image, and determines the predicted instance segmentation result of the sample image from them. Finally, the loss function is computed between the predicted result and the labelled sample instance segmentation result to adjust the model parameters. Meanwhile, the current sample image may serve as the historical sample image for the next frame; that is, its second sample rays may be used when segmenting the next sample image. Through this continuous iterative training, the parameters of the instance segmentation model are optimized and its segmentation accuracy improves.
In this way, the model is trained using the sample instance segmentation results of the historical frames preceding each sample image, so that the trained instance segmentation model can segment an image in combination with the segmentation results of the frames before it, achieving a better instance segmentation effect and higher efficiency. Moreover, since the model can be trained with polar-coordinate sample segmentation results, it can correspondingly perform polar-coordinate instance segmentation; compared with pixel-wise segmentation in a rectangular coordinate system, this further improves instance segmentation efficiency and reduces the per-frame latency of the segmentation process.
The following describes a process of performing video instance segmentation on a video to be processed by the instance segmentation model trained in the above manner.
Illustratively, after the video to be processed is acquired, it may be input into the instance segmentation model. It should be understood that the frames of the video have a chronological order, so a video image sequence of frames arranged in time order can be obtained from the video; inputting the video into the trained instance segmentation model may thus equivalently mean inputting this video image sequence into the model.
For example, for at least one frame of the video to be processed, the trained instance segmentation model may determine the frame's instance segmentation result according to the instance segmentation result of a historical frame preceding it.
As explained above, in one possible implementation the instance segmentation model may be trained from a sample image and the sample instance segmentation result of the sample image immediately preceding it; correspondingly, in the application stage, the historical frame may be the frame immediately preceding the current one. Thus, when segmenting two consecutive frames, the segmentation result of the earlier frame serves as the initial state for segmenting the later frame, making full use of the correlation between adjacent frames, thereby achieving a better video instance segmentation effect and improving video instance segmentation efficiency.
In one possible implementation, considering that no historical frame precedes the first frame, for the first frame of the video to be processed the trained instance segmentation model may emit a plurality of rays from the center point of the instance to be segmented, and take the closed contour formed by connecting the ray endpoints as the first frame's instance segmentation result. It should be appreciated that, in the training stage, the model parameters were adjusted using the loss computed between the predicted result for the first sample image and the polar-coordinate sample instance segmentation result, so in the application stage the model can output a fairly accurate polar-coordinate instance segmentation result for the first frame of the video.
In one possible implementation, the instance segmentation result of a historical frame may include first rays emitted from the center point of the instance to be segmented. Accordingly, for each frame of the video other than the first, polar-coordinate instance segmentation adjustment information for the frame may be determined according to the historical frame's polar-coordinate instance segmentation result and the frame's feature map; the first rays are then adjusted according to this information to obtain second rays emitted from the center point of the instance to be segmented; and finally the frame's polar-coordinate instance segmentation result is determined from the second rays.
Illustratively, the instance segmentation adjustment information may be direction adjustment information and/or angle adjustment information for the first rays. In one possible implementation, it may include instance segmentation adjustment rays, each starting at the endpoint of a first ray, with a direction that is the same as or opposite to that of the first ray. Accordingly, adjusting the first rays according to the adjustment information may mean superimposing the adjustment rays on the first rays, so that each first ray lengthens away from the center point or shrinks toward it.
It should be appreciated that, in the training stage, the instance segmentation model adjusts the first sample rays according to the sample instance segmentation adjustment information to obtain the second sample rays, yielding a predicted segmentation result for the loss computation, and its parameters are adjusted according to the loss. Therefore, in the application stage, given a frame's feature map and the historical frame's instance segmentation result, the trained model outputs the corresponding instance segmentation adjustment rays, which can be understood as a correction from the historical frame's polar-coordinate rays to those of the current frame. Superimposing the adjustment rays on the historical frame's polar-coordinate rays (i.e., the first rays) therefore yields the current frame's polar-coordinate rays, and hence the current frame's instance segmentation result.
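The per-frame inference just described can be sketched end to end: the first frame is segmented from scratch, and each later frame's rays are the previous frame's rays plus a predicted correction. The `model` method names are illustrative assumptions, and the feature-map inputs are placeholders.

```python
def segment_video(model, feature_maps):
    """Frame-by-frame inference over a video's per-frame feature maps.
    The first frame has no history; every later frame refines the
    previous frame's rays with a predicted adjustment (sketch only)."""
    results, prev_rays = [], None
    for fm in feature_maps:
        if prev_rays is None:
            rays = model.predict_initial(fm)            # first frame
        else:
            delta = model.predict_adjustment(fm, prev_rays)
            rays = [r + d for r, d in zip(prev_rays, delta)]
        results.append(rays)
        prev_rays = rays  # this frame's result is the next frame's history
    return results
```

Note the contrast with the training sketch: here the model's own output, not a label, becomes the next frame's history, which is what lets the trained model reuse work across frames at inference time.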
For example, as shown in FIG. 3, a polar-coordinate instance segmentation result of the historical frame is obtained by the instance segmentation model, which then outputs the instance segmentation adjustment rays shown as dotted lines in FIG. 5. Superimposing the adjustment rays on the historical frame's polar-coordinate rays, shown as solid lines in FIG. 5, yields the second, polar-coordinate rays of the current frame; the current frame's instance segmentation result is then the closed contour formed by connecting the endpoints of the second rays in sequence (the irregular shape in FIG. 5).
In this way, an image can be segmented in combination with the instance segmentation results of the historical frames preceding it, achieving a better instance segmentation effect while improving instance segmentation efficiency. In addition, instance segmentation can be performed in polar coordinates, which, compared with pixel-wise segmentation in a rectangular coordinate system, further improves segmentation efficiency and reduces the per-frame latency of the segmentation process.
Based on the same inventive concept, the disclosed embodiments further provide a video processing apparatus, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. Referring to fig. 6, the video processing apparatus 600 includes:
an obtaining module 601, configured to obtain a video to be processed;
a segmentation module 602, configured to input the video to be processed into an example segmentation model to obtain an example segmentation result of each frame of image in the video to be processed, where the example segmentation model is configured to determine, for at least one frame of image in the video to be processed, the example segmentation result of the image according to the example segmentation result of a historical frame image preceding the image.
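The frame-by-frame dependence handled by the segmentation module can be sketched as the loop below. The model API (`initial_segment`, `features`, `refine`) is hypothetical shorthand for the first-frame path and the history-conditioned path described above, not an interface defined by the patent:

```python
def segment_video(frames, model):
    """Run example segmentation over a video, feeding each frame's result
    into the prediction for the next frame."""
    results = []
    previous = None
    for frame in frames:
        if previous is None:
            # First frame: segment from scratch (rays from the center point).
            previous = model.initial_segment(frame)
        else:
            # Later frames: adjust the previous result using this frame's features.
            previous = model.refine(model.features(frame), previous)
        results.append(previous)
    return results

class _ToyModel:
    """Stand-in for the trained example segmentation model (hypothetical API)."""
    def initial_segment(self, frame):
        return ("segmented", frame)
    def features(self, frame):
        return ("features", frame)
    def refine(self, feature_map, previous_result):
        return ("refined", feature_map)

results = segment_video(["f0", "f1", "f2"], _ToyModel())
```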
Optionally, the example segmentation result of the historical frame image includes a first ray emitted from a central point of the example to be segmented, and the segmentation module 602 is configured to:
determining example segmentation adjustment information of the image based on the polar coordinates according to example segmentation results of the historical frame image based on the polar coordinates and a feature map of the image;
and adjusting the first ray according to the example segmentation adjustment information to obtain a second ray emitted from the center point of the example to be segmented, and determining an example segmentation result of the image based on polar coordinates according to the second ray.
Optionally, the example segmentation adjustment information includes an example segmentation adjustment ray, a starting point of the example segmentation adjustment ray is the end point of the first ray, a direction of the example segmentation adjustment ray is the same as or opposite to the direction of the first ray, and the segmentation module 602 is configured to:
superimposing the example segmentation adjustment ray on the first ray such that the first ray varies in a direction away from the center point or in a direction closer to the center point.
Optionally, the historical frame image is an image of a frame previous to the image.
Optionally, the example segmentation model is configured to, for a first frame image in the video to be processed, emit a plurality of rays from the center point of the example to be segmented in the first frame image, and use a closed graph formed by connecting the end points of the plurality of rays as the example segmentation result of the first frame image.
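Connecting the end points of rays emitted from the center point yields the closed graph directly from the polar representation. A minimal sketch, assuming evenly spaced angles (the patent does not fix the angle layout):

```python
import math

def rays_to_polygon(center, lengths):
    """Convert ray lengths at evenly spaced polar angles into the vertices of
    the closed graph formed by connecting the ray end points in order."""
    cx, cy = center
    n = len(lengths)
    vertices = []
    for i, r in enumerate(lengths):
        theta = 2.0 * math.pi * i / n  # evenly spaced angles (assumed)
        vertices.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return vertices

# Four rays of length 10 around (0, 0): a coarse polygonal contour.
poly = rays_to_polygon((0.0, 0.0), [10.0] * 4)
```

With enough rays the polygon approaches the true contour of the example; four equal rays, as above, give a square inscribed in a circle of that radius.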
Optionally, the apparatus further comprises the following module for training the instance segmentation model:
a first training module, configured to extract multiple temporally consecutive frames of sample images from a sample video, each frame of sample image being annotated with a sample example segmentation result;
a second training module, configured to, for at least one frame of sample image, determine a prediction example segmentation result of the sample image according to the sample example segmentation result of a historical frame sample image preceding the sample image and a feature map of the sample image, and to calculate a loss function according to the prediction example segmentation result of the sample image and the sample example segmentation result of the sample image;
and a third training module, configured to adjust the parameters of the example segmentation model according to the calculation result of the loss function.
Optionally, the second training module is configured to:
determining sample example segmentation adjustment information of the sample image based on the polar coordinates according to a sample example segmentation result of the historical frame sample image based on the polar coordinates and a feature map of the sample image;
adjusting a first sample ray in a sample example segmentation result of the historical frame sample image based on polar coordinates according to the sample example segmentation adjustment information to obtain a second sample ray;
determining a prediction instance segmentation result of the sample image based on polar coordinates according to the second sample ray.
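The three training steps above can be sketched as a single optimization step. The smooth L1 loss over ray lengths is an assumption for illustration; the patent does not commit to a particular loss form, and a real implementation would backpropagate this loss to update the model parameters:

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss between predicted and annotated ray lengths (assumed loss)."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta).mean()

def training_step(first_sample_rays, adjustment, target_rays):
    # Adjust the first sample rays to obtain the second sample rays (prediction).
    second_sample_rays = np.clip(first_sample_rays + adjustment, 0.0, None)
    # Loss between the prediction and the annotated sample example segmentation
    # result; its gradient would drive the parameter adjustment.
    return smooth_l1(second_sample_rays, target_rays)

# Perfect adjustment: prediction matches the annotation, so the loss is zero.
loss = training_step(np.array([10.0, 10.0]), np.array([1.0, -1.0]),
                     np.array([11.0, 9.0]))
```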
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Based on the same inventive concept, the disclosed embodiments also provide a computer readable medium, on which a computer program is stored, which when executed by a processing apparatus, implements the steps of any of the above-mentioned video processing methods.
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device, including:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of any of the video processing methods described above.
Referring now to FIG. 7, shown is a schematic diagram of an electronic device 700 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, electronic device 700 may include a processing means (e.g., central processing unit, graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from storage 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the electronic device 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the communication may be performed using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a video to be processed; inputting the video to be processed into an example segmentation model to obtain an example segmentation result of each frame of image in the video to be processed, wherein the example segmentation model is used for determining the example segmentation result of the image according to the example segmentation result of the previous historical frame image of the image aiming at least one frame of image in the video to be processed.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. Wherein the name of a module in some cases does not constitute a limitation on the module itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides, in accordance with one or more embodiments of the present disclosure, a video processing method, the method comprising:
acquiring a video to be processed;
inputting the video to be processed into an example segmentation model to obtain an example segmentation result of each frame of image in the video to be processed, wherein the example segmentation model is used for determining the example segmentation result of the image according to the example segmentation result of the previous historical frame image of the image aiming at least one frame of image in the video to be processed.
Example 2 provides the method of example 1, the example segmentation result of the historical frame image includes a first ray emitted from a center point of an example to be segmented, and the determining the example segmentation result of the image according to the example segmentation result of the historical frame image before the image includes:
determining example segmentation adjustment information of the image based on the polar coordinates according to example segmentation results of the historical frame image based on the polar coordinates and a feature map of the image;
and adjusting the first ray according to the example segmentation adjustment information to obtain a second ray emitted from the center point of the example to be segmented, and determining an example segmentation result of the image based on polar coordinates according to the second ray.
Example 3 provides the method of example 2, wherein the example segmentation adjustment information includes an example segmentation adjustment ray whose starting point is the end point of the first ray and whose direction is the same as or opposite to the direction of the first ray, and the adjusting the first ray according to the example segmentation adjustment information includes:
superimposing the example segmentation adjustment ray on the first ray such that the first ray varies in a direction away from the center point or in a direction closer to the center point.
Example 4 provides the method of any one of examples 1-3, the historical frame image being a previous frame image of the image, according to one or more embodiments of the present disclosure.
Example 5 provides the method of any one of examples 1 to 3, in accordance with one or more embodiments of the present disclosure, wherein the example segmentation model is configured to, for a first frame image in the video to be processed, emit a plurality of rays from the center point of the example to be segmented in the first frame image, and use a closed graph formed by connecting the end points of the plurality of rays as the example segmentation result of the first frame image.
Example 6 provides the method of any one of examples 1-3, the training step of the example segmentation model including:
extracting multiple frames of sample images which are continuous in time from the sample video, wherein each frame of sample image is marked with a sample instance segmentation result;
for at least one frame of sample image, determining a prediction example segmentation result of the sample image according to a sample example segmentation result of a historical frame sample image before the sample image and a feature map of the sample image, and calculating a loss function according to the prediction example segmentation result of the sample image and the sample example segmentation result of the sample image;
and adjusting parameters of the example segmentation model according to the calculation result of the loss function.
Example 7 provides the method of example 6, wherein the determining the prediction example segmentation result of the sample image according to the sample example segmentation result of a historical frame sample image preceding the sample image and the feature map of the sample image includes:
determining sample example segmentation adjustment information of the sample image based on the polar coordinates according to a sample example segmentation result of the historical frame sample image based on the polar coordinates and a feature map of the sample image;
adjusting a first sample ray in a sample example segmentation result of the historical frame sample image based on polar coordinates according to the sample example segmentation adjustment information to obtain a second sample ray;
determining a prediction instance segmentation result of the sample image based on polar coordinates according to the second sample ray.
Example 8 provides, in accordance with one or more embodiments of the present disclosure, a video processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring a video to be processed;
the segmentation module is used for inputting the video to be processed into an example segmentation model so as to obtain an example segmentation result of each frame of image in the video to be processed, and the example segmentation model is used for determining the example segmentation result of the image according to the example segmentation result of the previous historical frame image of the image aiming at least one frame of image in the video to be processed.
Example 9 provides the apparatus of example 8, the instance segmentation result of the historical frame image including a first ray emanating from a center point of an instance to be segmented, the segmentation module to:
determining example segmentation adjustment information of the image based on the polar coordinates according to example segmentation results of the historical frame image based on the polar coordinates and a feature map of the image;
and adjusting the first ray according to the example segmentation adjustment information to obtain a second ray emitted from the center point of the example to be segmented, and determining an example segmentation result of the image based on polar coordinates according to the second ray.
Example 10 provides the apparatus of example 9, the example segmentation adjustment information including an example segmentation adjustment ray whose starting point is an end point of the first ray, whose direction is the same as or opposite to the direction of the first ray, the segmentation module to:
superimposing the example segmentation adjustment ray on the first ray such that the first ray varies in a direction away from the center point or in a direction closer to the center point.
Example 11 provides the apparatus of any one of examples 8-10, the historical frame image being a previous frame image of the image, according to one or more embodiments of the present disclosure.
Example 12 provides the apparatus of any one of examples 8 to 10, in accordance with one or more embodiments of the present disclosure, wherein the example segmentation model is configured to, for a first frame image in the video to be processed, emit a plurality of rays from the center point of the example to be segmented in the first frame image, and use a closed graph formed by connecting the end points of the plurality of rays as the example segmentation result of the first frame image.
Example 13 provides the apparatus of any one of examples 8-10, further including means for training the instance segmentation model, in accordance with one or more embodiments of the present disclosure:
a first training module, configured to extract multiple temporally consecutive frames of sample images from a sample video, each frame of sample image being annotated with a sample example segmentation result;
a second training module, configured to, for at least one frame of sample image, determine a prediction example segmentation result of the sample image according to the sample example segmentation result of a historical frame sample image preceding the sample image and a feature map of the sample image, and to calculate a loss function according to the prediction example segmentation result of the sample image and the sample example segmentation result of the sample image;
and a third training module, configured to adjust the parameters of the example segmentation model according to the calculation result of the loss function.
Example 14 provides the apparatus of example 13, the second training module to:
determining sample example segmentation adjustment information of the sample image based on the polar coordinates according to a sample example segmentation result of the historical frame sample image based on the polar coordinates and a feature map of the sample image;
adjusting a first sample ray in a sample example segmentation result of the historical frame sample image based on polar coordinates according to the sample example segmentation adjustment information to obtain a second sample ray;
determining a prediction instance segmentation result of the sample image based on polar coordinates according to the second sample ray.
Example 15 provides a computer readable medium having stored thereon a computer program that, when executed by a processing apparatus, performs the steps of the method of any of examples 1-7, in accordance with one or more embodiments of the present disclosure.
Example 16 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method of any of examples 1-7.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A method of video processing, the method comprising:
acquiring a video to be processed;
inputting the video to be processed into an example segmentation model to obtain an example segmentation result of each frame of image in the video to be processed, wherein the example segmentation model is used for determining the example segmentation result of the image according to the example segmentation result of the previous historical frame image of the image aiming at least one frame of image in the video to be processed.
2. The method according to claim 1, wherein the example segmentation result of the historical frame image comprises a first ray emitted from a center point of an example to be segmented, and the determining the example segmentation result of the image according to the example segmentation result of the historical frame image before the image comprises:
determining example segmentation adjustment information of the image based on the polar coordinates according to example segmentation results of the historical frame image based on the polar coordinates and a feature map of the image;
and adjusting the first ray according to the example segmentation adjustment information to obtain a second ray emitted from the center point of the example to be segmented, and determining an example segmentation result of the image based on polar coordinates according to the second ray.
3. The method of claim 2, wherein the example segmentation adjustment information comprises an example segmentation adjustment ray, a starting point of the example segmentation adjustment ray is the end point of the first ray, a direction of the example segmentation adjustment ray is the same as or opposite to the direction of the first ray, and the adjusting the first ray according to the example segmentation adjustment information comprises:
superimposing the example segmentation adjustment ray on the first ray such that the first ray varies in a direction away from the center point or in a direction closer to the center point.
4. The method according to any one of claims 1 to 3, wherein the history frame image is a frame image previous to the image.
5. The method according to any one of claims 1 to 3, wherein the example segmentation model is configured to, for a first frame image in the video to be processed, emit a plurality of rays from the center point of the example to be segmented in the first frame image, and use a closed graph formed by connecting the end points of the plurality of rays as the example segmentation result of the first frame image.
6. The method according to any of claims 1-3, wherein the step of training the instance segmentation model comprises:
extracting multiple frames of sample images which are continuous in time from the sample video, wherein each frame of sample image is marked with a sample instance segmentation result;
for at least one frame of sample image, determining a prediction example segmentation result of the sample image according to a sample example segmentation result of a historical frame sample image before the sample image and a feature map of the sample image, and calculating a loss function according to the prediction example segmentation result of the sample image and the sample example segmentation result of the sample image;
and adjusting parameters of the example segmentation model according to the calculation result of the loss function.
7. The method according to claim 6, wherein the determining the prediction example segmentation result of the sample image according to the sample example segmentation result of a historical frame sample image preceding the sample image and the feature map of the sample image comprises:
determining sample example segmentation adjustment information of the sample image based on the polar coordinates according to a sample example segmentation result of the historical frame sample image based on the polar coordinates and a feature map of the sample image;
adjusting a first sample ray in a sample example segmentation result of the historical frame sample image based on polar coordinates according to the sample example segmentation adjustment information to obtain a second sample ray;
determining a prediction instance segmentation result of the sample image based on polar coordinates according to the second sample ray.
8. A video processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a video to be processed;
the segmentation module is used for inputting the video to be processed into an example segmentation model so as to obtain an example segmentation result of each frame of image in the video to be processed, and the example segmentation model is used for determining the example segmentation result of the image according to the example segmentation result of the previous historical frame image of the image aiming at least one frame of image in the video to be processed.
9. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1 to 7.
10. An electronic device, comprising:
a storage device having a computer program stored thereon; and
a processing device, configured to execute the computer program in the storage device to implement the steps of the method according to any one of claims 1 to 7.
CN202110179159.4A 2021-02-09 2021-02-09 Video processing method and device, storage medium and electronic equipment Pending CN112906551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110179159.4A CN112906551A (en) 2021-02-09 2021-02-09 Video processing method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112906551A true CN112906551A (en) 2021-06-04

Family

ID=76123100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110179159.4A Pending CN112906551A (en) 2021-02-09 2021-02-09 Video processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112906551A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170372479A1 (en) * 2016-06-23 2017-12-28 Intel Corporation Segmentation of objects in videos using color and depth information
US20180300549A1 (en) * 2017-04-12 2018-10-18 Baidu Online Network Technology (Beijing) Co., Ltd. Road detecting method and apparatus
KR102004642B1 (en) * 2018-01-25 2019-07-29 극동대학교 산학협력단 Control Method for Radiography Training System
US20200320712A1 (en) * 2017-10-24 2020-10-08 Beijing Jingdong Shangke Information Technology Co., Ltd. Video image segmentation method and apparatus, storage medium and electronic device
US20200349875A1 (en) * 2018-07-02 2020-11-05 Beijing Baidu Netcom Science Technology Co., Ltd. Display screen quality detection method, apparatus, electronic device and storage medium
CN112084988A (en) * 2020-06-08 2020-12-15 深圳佑驾创新科技有限公司 Lane line instance clustering method and device, electronic equipment and storage medium
CN112330701A (en) * 2020-11-26 2021-02-05 山东师范大学 Tissue pathology image cell nucleus segmentation method and system based on polar coordinate representation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ENZE XIE et al.: "PolarMask: Single Shot Instance Segmentation with Polar Representation", arXiv, pages 1-7 *
HE Guipeng et al.: "Content-Based Video Coding and Transmission Control Technology", 30 April 2005, Wuhan University Press, pages 68-70 *

Similar Documents

Publication Publication Date Title
US11367313B2 (en) Method and apparatus for recognizing body movement
CN113313064A (en) Character recognition method and device, readable medium and electronic equipment
CN111784712B (en) Image processing method, device, equipment and computer readable medium
CN112907628A (en) Video target tracking method and device, storage medium and electronic equipment
CN110059623B (en) Method and apparatus for generating information
CN112561840A (en) Video clipping method and device, storage medium and electronic equipment
CN112418232B (en) Image segmentation method and device, readable medium and electronic equipment
CN115205305A (en) Instance segmentation model training method, instance segmentation method and device
CN112381717A (en) Image processing method, model training method, device, medium, and apparatus
CN111783626A (en) Image recognition method and device, electronic equipment and storage medium
CN111126159A (en) Method, apparatus, electronic device, and medium for tracking pedestrian in real time
CN113038176B (en) Video frame extraction method and device and electronic equipment
CN111783632B (en) Face detection method and device for video stream, electronic equipment and storage medium
CN112418054B (en) Image processing method, apparatus, electronic device, and computer readable medium
CN111915532B (en) Image tracking method and device, electronic equipment and computer readable medium
CN112258622A (en) Image processing method, image processing device, readable medium and electronic equipment
CN110765304A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN115086541B (en) Shooting position determining method, device, equipment and medium
CN113033552B (en) Text recognition method and device and electronic equipment
CN112418233B (en) Image processing method and device, readable medium and electronic equipment
CN114399696A (en) Target detection method and device, storage medium and electronic equipment
CN113705386A (en) Video classification method and device, readable medium and electronic equipment
CN112906551A (en) Video processing method and device, storage medium and electronic equipment
CN110084835B (en) Method and apparatus for processing video
CN114004229A (en) Text recognition method and device, readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination