CN112449249A - Video stream processing method and device, electronic equipment and storage medium - Google Patents

Video stream processing method and device, electronic equipment and storage medium

Info

Publication number
CN112449249A
CN112449249A (application number CN202011320192.6A)
Authority
CN
China
Prior art keywords
image
video stream
processed
pixel point
point region
Prior art date
Legal status
Pending
Application number
CN202011320192.6A
Other languages
Chinese (zh)
Inventor
区善仁
Current Assignee
Shenzhen TetrasAI Technology Co Ltd
Original Assignee
Shenzhen TetrasAI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen TetrasAI Technology Co Ltd
Priority to CN202011320192.6A
Publication of CN112449249A
Priority to PCT/CN2021/086237 (published as WO2022105097A1)
Legal status: Pending

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 - End-user applications
    • H04N 21/472 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/11 - Region-based segmentation
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 - End-user applications
    • H04N 21/472 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47205 - End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application discloses a video stream processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a first to-be-processed video stream and playing the first to-be-processed video stream, where the first to-be-processed video stream includes a first image; acquiring a second to-be-processed video stream, where the second to-be-processed video stream includes a second image whose acquisition time is the same as the playing time of the first image; in a case where an instruction for taking a first object in the second to-be-processed video stream as a migration object is received, migrating the first object in the second image into the first image to obtain a third image; and playing the third image at the playing time of the first image.

Description

Video stream processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for processing a video stream, an electronic device, and a storage medium.
Background
Movies are a fusion of modern technology and art, and film history is full of classic movie segments. Many film fans dream of performing these classic segments, that is, playing the characters in the segments themselves. In the past, movie footage was typically shot by professionals using professional photography equipment on purpose-built sets. As the shooting capabilities of mobile devices improve, mobile devices can replace professional shooting equipment to a large extent; the lack of the special setting therefore remains the biggest obstacle for ordinary users who want to shoot movie footage. How to blend an ordinary user's performance into the special setting of a movie is thus of great application value.
Disclosure of Invention
The application provides a video stream processing method and device, electronic equipment and a storage medium.
In a first aspect, a method for processing a video stream is provided, the method including:
acquiring a first video stream to be processed, and playing the first video stream to be processed; the first to-be-processed video stream comprises a first image;
acquiring a second video stream to be processed; the second video stream to be processed comprises a second image, and the playing time of the first image is the same as the acquiring time of the second image;
in a case where an instruction for taking a first object in the second to-be-processed video stream as a migration object is received, migrating the first object in the second image into the first image to obtain a third image;
and playing the third image at the playing time of the first image.
In this aspect, the video stream processing apparatus can migrate the first object in the second to-be-processed video stream into the first to-be-processed video stream while the first to-be-processed video stream is being played, thereby completing migration of the first object in real time.
With reference to any embodiment of the present application, the acquiring a first to-be-processed video stream includes:
acquiring a third video stream to be processed;
and in response to an erasing instruction for a second object in the third video stream to be processed, erasing the second object in the third video stream to be processed to obtain the first video stream to be processed.
With reference to any embodiment of the present application, before the obtaining of the second to-be-processed video stream, the method further includes:
acquiring a preview image and displaying the preview image under the condition of receiving an instruction for executing object migration processing;
and the migrating, in a case where an instruction for taking a first object in the second to-be-processed video stream as a migration object is received, the first object in the second image into the first image to obtain a third image includes:
and under the condition that an instruction of taking the first object in the preview image as a migration object is received, migrating the first object in the second image into the first image to obtain the third image.
Based on the embodiment, the user can determine the migration object from the preview image, and the video stream processing device can further migrate the first object in the second image into the first image to obtain the third image.
With reference to any embodiment of the present application, the migrating the first object in the second image into the first image to obtain a third image includes:
performing feature extraction processing on the first object in the preview image to obtain semantic feature data of the first object;
segmenting a first pixel point region from the second image; the first pixel point region is a pixel point region of which the semantic information in the second image is matched with the information carried by the semantic feature data of the first object;
and transferring the first pixel point region to the first image to obtain the third image.
In this embodiment, the video stream processing apparatus achieves pixel-level segmentation of the second image to obtain the first pixel point region, which can improve the segmentation accuracy for the second image and further improve the effect of migrating the first object from the second image into the first image.
With reference to any embodiment of the present application, the acquiring a first to-be-processed video stream includes: acquiring a third video stream to be processed; in response to detecting an erasing instruction for a second object in the third video stream to be processed, erasing the second object in the third video stream to be processed to obtain a first video stream to be processed, wherein the third video stream to be processed comprises a fourth image, and the first image is obtained by erasing the second object in the fourth image;
the migrating the first pixel point region to the first image to obtain the third image includes:
migrating the first pixel point region to a first position in the first image to obtain a third image; the first position is a position of the second object in the fourth image.
In this embodiment, the video stream processing apparatus may enable replacement of the second object in the fourth image with the first object in the second image by migrating the first pixel region to the first location in the first image.
With reference to any embodiment of the present application, before the migrating the first pixel point region to the first position in the first image to obtain the third image, the method further includes:
under the condition that a first matching degree between the size of the first pixel point region and the size of the second object in the fourth image does not exceed a size matching degree threshold, scaling the first pixel point region to obtain a second pixel point region, wherein a second matching degree between the size of the second pixel point region and the size of the second object in the fourth image exceeds the size matching degree threshold;
the migrating the first pixel point region to the first position in the first image to obtain the third image includes:
and migrating the second pixel point region to the first position in the first image to obtain the third image.
By executing such an embodiment, the video stream processing apparatus can make the size of the first object in the third image better match the size of the objects other than the first object in the third image, making the third image more natural.
In combination with any embodiment of the present application, before the migrating the second pixel point region to the first position in the first image to obtain the third image, the method further includes:
adjusting the color tone of the second pixel point region to obtain a third pixel point region, wherein the color tone of the third pixel point region is matched with the color tone of the first image;
the migrating the second pixel point region to the first position in the first image to obtain the third image includes:
and migrating the third pixel point region to the first position in the first image to obtain the third image.
By executing this embodiment, the video stream processing apparatus can make the tone of the second pixel point region in the third image more harmonious with the tone of the rest of the third image, making the third image more natural.
With reference to any embodiment of the present application, before the migrating the first pixel point region to the first position in the first image to obtain the third image, the method further includes:
erasing the second object in the fourth image to obtain a fifth image;
and adding a background pixel point region matched with the content in the fifth image at the first position in the fifth image to obtain the first image.
In this embodiment, the video stream processing apparatus may make the first image more natural by adding a background pixel region matching the content in the fifth image to a blank region of the fifth image to obtain the first image.
With reference to any embodiment of the present application, the third to-be-processed video stream includes a sixth image, and the playing time of the sixth image is earlier than the playing time of the fourth image;
the detecting an erasure instruction for a second object in the third pending video stream includes:
when detecting that a second object in a sixth image is taken as an erasing object, generating the erasing instruction;
erasing the second object in the fourth image to obtain a fifth image, including:
performing feature extraction processing on the sixth image to obtain semantic feature data of the second object;
determining a fourth pixel point region from the fourth image by performing semantic segmentation processing on the fourth image; the semantic information of the fourth pixel point region is matched with information carried by the semantic feature data of the second object;
and erasing the fourth pixel point region in the fourth image to obtain the fifth image.
By implementing the embodiment, the video stream processing device can improve the accuracy and speed of erasing the pixel point region matched with the second object in the fourth image.
In a second aspect, there is provided a video stream processing apparatus, comprising:
the first acquisition unit is used for acquiring a first video stream to be processed;
the playing unit is used for playing the first video stream to be processed; the first to-be-processed video stream comprises a first image;
the second acquisition unit is used for acquiring a second video stream to be processed; the second video stream to be processed comprises a second image, and the playing time of the first image is the same as the acquiring time of the second image;
the first processing unit is used for migrating a first object in a second image into a first image to obtain a third image under the condition that an instruction of taking the first object in the second video stream to be processed as a migration object is received;
the playing unit is used for playing the third image at the playing time of the first image.
With reference to any embodiment of the present application, the first obtaining unit is specifically configured to:
acquiring a third video stream to be processed;
in response to detecting an erasing instruction for a second object in the third video stream to be processed, erasing the second object in the third video stream to be processed to obtain the first video stream to be processed.
With reference to any embodiment of the present application, the first obtaining unit is further configured to obtain a preview image,
the video stream processing apparatus further includes:
a display unit configured to display the preview image;
a generation unit configured to generate a migration instruction regarding the first object as a migration object when it is detected that a user touches the first object in the preview image;
the first processing unit is specifically configured to:
and determining that a first object in the second video stream to be processed is taken as a migration object according to the migration instruction, and migrating the first object in the second image to the first image to obtain the third image.
With reference to any embodiment of the present application, the first processing unit is specifically configured to:
performing feature extraction processing on the first object in the preview image to obtain semantic feature data of the first object;
segmenting a first pixel point region from the second image; the first pixel point region is a pixel point region of which the semantic information in the second image is matched with the information carried by the semantic feature data of the first object;
and transferring the first pixel point region to the first image to obtain the third image.
With reference to any embodiment of the present application, the third to-be-processed video stream includes a fourth image, and the first image is obtained by erasing the second object in the fourth image;
the first processing unit is specifically configured to:
migrating the first pixel point region to a first position in the first image to obtain a third image; the first position is a position of the second object in the fourth image.
With reference to any one of the embodiments of the present application, the video stream processing apparatus further includes:
a second processing unit, configured to, before the first pixel region is migrated to the first position in the first image to obtain the third image, scale the first pixel region when a first matching degree between a size of the first pixel region and a size of the second object in the fourth image does not exceed a size matching degree threshold, to obtain a second pixel region, where a second matching degree between the size of the second pixel region and the size of the second object in the fourth image exceeds the size matching degree threshold;
the first processing unit is specifically configured to:
and migrating the second pixel point region to the first position in the first image to obtain the third image.
In combination with any embodiment of the present application, the second processing unit is further configured to, before the second pixel point region is migrated to the first position in the first image to obtain the third image, adjust a color tone of the second pixel point region to obtain a third pixel point region, where the color tone of the third pixel point region is matched with the color tone of the first image;
the first processing unit is specifically configured to:
and migrating the third pixel point region to the first position in the first image to obtain the third image.
In combination with any embodiment of the present application, the first obtaining unit is further configured to erase the second object in the fourth image to obtain a fifth image before the first pixel point region is migrated to the first position in the first image to obtain the third image;
and adding a background pixel point region matched with the content in the fifth image at the first position in the fifth image to obtain the first image.
With reference to any embodiment of the present application, the third to-be-processed video stream includes a sixth image, and the playing time of the sixth image is earlier than the playing time of the fourth image;
the first obtaining unit is specifically configured to:
generate the erasing instruction when it is detected that a second object in a sixth image is taken as an erasing object, and perform feature extraction processing on the sixth image to obtain semantic feature data of the second object;
determining a fourth pixel point region from the fourth image by performing semantic segmentation processing on the fourth image; the semantic information of the fourth pixel point region is matched with information carried by the semantic feature data of the second object;
and erasing the fourth pixel point region in the fourth image to obtain the fifth image.
In a third aspect, an electronic device is provided, including: a processor and a memory, the memory being configured to store computer program code including computer instructions, wherein, when the processor executes the computer instructions, the electronic device performs the method of the first aspect and any one of its possible implementations.
In a fourth aspect, another electronic device is provided, including: a processor, transmitting means, input means, output means, and a memory for storing computer program code comprising computer instructions, which, when executed by the processor, cause the electronic device to perform the method of the first aspect and any one of its possible implementations.
In a fifth aspect, there is provided a computer-readable storage medium having stored therein a computer program including program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect and any one of its possible implementations.
A sixth aspect provides a computer program product comprising a computer program or instructions which, when run on a computer, causes the computer to perform the method of the first aspect and any of its possible implementations.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic flowchart of a video stream processing method according to an embodiment of the present application;
fig. 2a is a schematic diagram of an image in a movie fragment according to an embodiment of the present application;
fig. 2b is a schematic diagram of an image after an object is migrated according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a video stream processing apparatus according to an embodiment of the present application;
fig. 4 is a schematic hardware structure diagram of a video stream processing apparatus according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural. The character "/" may also represent division in a mathematical operation; for example, a/b means a divided by b, and 6/3 = 2.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
With the development of digital technology and the increase of network speeds, the video industry has become one of the most investment-worthy emerging industries. Video also attracts large audiences thanks to its entertainment value and broad accessibility. As mobile devices become increasingly capable, the commercial potential of video is growing at a tremendous rate. Increasing the entertainment value of video has therefore become highly significant for improving its commercial value.
Movies are a fusion of modern technology and art, and film history is full of classic movie segments. Many film fans dream of performing these classic segments, that is, playing the characters in the segments themselves. In the past, movie footage was typically shot by professionals using professional photography equipment on purpose-built sets. As the shooting capabilities of mobile devices improve, mobile devices can replace professional shooting equipment to a large extent; the lack of the special setting therefore remains the biggest obstacle for ordinary users who want to shoot movie footage.
In current methods, a performance video stream is obtained by recording a user imitating a movie character in a movie segment. The performance video stream is then post-processed, and the user is extracted from it by matting. Meanwhile, the movie segment is processed to erase the character from it, obtaining a movie video stream. The user extracted from the performance video stream is migrated into the movie video stream to obtain a migrated video stream, in which the user replaces the character in the movie segment.
Although this method can fulfill the user's dream of performing by producing the migrated video stream, the post-processing of the performance video stream takes considerable time, resulting in low migration efficiency. On this basis, the embodiments of the present application provide a technical solution to improve migration efficiency.
The execution subject of the embodiments of the present application is a video stream processing apparatus, which may be any electronic device capable of executing the technical solutions disclosed in the embodiments of the present application. Optionally, the video stream processing apparatus may be one of the following: a mobile phone, a computer, a server, or a tablet computer.
It should be understood that the method embodiments of the present application may also be implemented by means of a processor executing computer program code. The embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 1, fig. 1 is a schematic flowchart of a video stream processing method according to an embodiment of the present application.
101. And acquiring a first video stream to be processed, and playing the first video stream to be processed.
In this step, the first to-be-processed video stream may be a video stream including arbitrary content. For example, the first to-be-processed video stream may be a movie fragment; for another example, the first to-be-processed video stream may be a video stream shot by a mobile phone; as another example, the first to-be-processed video stream may be a video stream produced by video production software.
In one implementation of obtaining the first to-be-processed video stream, the video stream processing apparatus receives the first to-be-processed video stream input by the user through an input component. Optionally, the input component includes a keyboard, a mouse, a touch screen, a touch pad, an audio input device, and the like.
In another implementation of obtaining the first to-be-processed video stream, the video stream processing apparatus receives the first to-be-processed video stream sent by a terminal. Optionally, the terminal may be any one of the following: a mobile phone, a computer, a tablet computer, a server, or a wearable device.
In this embodiment, the first image is an image of any frame in the first to-be-processed video stream. For example, the first to-be-processed video stream includes two frames of images. Then, the first image may be a first frame image in the first video stream to be processed, and the first image may also be a second frame image in the first video stream to be processed.
In the embodiment of the application, the video stream processing device plays the first video stream to be processed after acquiring the first video stream to be processed.
In one possible implementation, the video stream processing apparatus includes a display. The video stream processing device plays the first video stream to be processed on the display.
In another possible implementation, a communication connection exists between the video stream processing apparatus and the display. The video stream processing device plays the first to-be-processed video stream on the display through the communication connection.
102. And acquiring a second video stream to be processed.
In the embodiment of the present application, the second video stream to be processed is obtained by the video stream processing device. Optionally, the video stream processing apparatus obtains the second video stream to be processed by using the camera while playing the first video stream to be processed.
In the embodiment of the present application, the second to-be-processed video stream includes a second image, where an acquisition time of the second image is the same as a playing time of the first image.
In one possible implementation, the video stream processing apparatus includes a camera. The video stream processing apparatus obtains the second to-be-processed video stream by capturing it with the camera.
For example, suppose the video stream processing apparatus starts playing the first to-be-processed video stream at 9:50:02, the playing duration of each frame image in the first to-be-processed video stream is 1/10 second, and the first image is the third frame image of the first to-be-processed video stream. The playing time of the first image is then 9:50:02.2.
The acquisition time of the second image being the same as the playing time of the first image means that the video stream processing apparatus captures the second image with the camera at 9:50:02.2.
In another possible implementation manner, the video stream processing apparatus obtains the second to-be-processed video stream by reading the second to-be-processed video stream from the external storage medium.
For example, suppose the video stream processing apparatus starts playing the first to-be-processed video stream at 9:50:02, the playing duration of each frame image in the first to-be-processed video stream is 1/10 second, and the first image is the third frame image of the first to-be-processed video stream. The playing time of the first image is then 9:50:02.2.
The acquisition time of the second image being the same as the playing time of the first image means that the video stream processing apparatus reads the second image from the external storage medium at 9:50:02.2.
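For concreteness, the frame-timing arithmetic in these examples can be written down directly. The following is a minimal sketch that reproduces the numbers above; the function name and the time representation are illustrative assumptions, not from the application:

```python
from fractions import Fraction

def frame_play_time(stream_start_s: float, frame_duration: Fraction, frame_index: int) -> float:
    """Playback time, in seconds since midnight, of the frame_index-th frame (1-based)."""
    return stream_start_s + float(frame_duration) * (frame_index - 1)

# Example from the text: playback starts at 9:50:02 and each frame lasts 1/10 s,
# so the third frame (the first image) plays, and the second image is captured,
# at 9:50:02.2.
start = 9 * 3600 + 50 * 60 + 2
print(frame_play_time(start, Fraction(1, 10), 3) - start)  # 0.2
```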
103. And in the case of receiving an instruction of taking a first object in the second video stream to be processed as a migration object, migrating the first object in the second image into the first image to obtain a third image.
In the embodiments of the present application, once a first object in any one frame of image of the second to-be-processed video stream is selected as the migration object, the first object is taken as the migration object throughout the second to-be-processed video stream.
For example, the second to-be-processed video stream includes two frames of images. Wherein the first frame image includes an object a, and the second frame image includes an object a and an object b. If the object a in the first frame image is taken as a migration object of the second video stream to be processed, the object a in the second video stream to be processed is taken as the migration object; if the object a in the second frame image is taken as a migration object of the second video stream to be processed, the object a in the second video stream to be processed is taken as the migration object; and if the object b in the second frame image is taken as the migration object of the second video stream to be processed, taking the object b in the second video stream to be processed as the migration object.
In the embodiments of the present application, the first object may be any object. For example (example 1), in the second image, Zhang San is walking a dog in Park A. In this case, the first object in the second image may be Zhang San, may be the dog, or may be a bench in Park A.
In the embodiments of the present application, migrating the first object in the second image into the first image means migrating the pixel point region covered by the first object in the second image into the first image. Continuing example 1, suppose the first image shows Park B. In a case where the first object is Zhang San, the pixel point region covered by Zhang San in the second image may be migrated into the first image to obtain the third image; specifically, Zhang San is migrated onto a road in Park B. Thus, in the third image, Zhang San is walking on the road in Park B.
In one possible implementation manner, the video stream processing device uses a deep learning model to migrate the first object in the second image into the first image to obtain a third image.
104. And playing the third image at the playing time of the first image.
The video stream processing apparatus replaces the first image with the third image during the playing of the first video stream to be processed by performing step 104.
It should be understood that the time taken for the video stream processing apparatus to perform "migrate a first object in a second image into a first image, resulting in a third image" is very short compared with the playing duration of the first image. For example, the playing duration of the first image is 1/10 second, while the time taken for the video stream processing apparatus to obtain the third image is on the order of 10^-6 seconds.
Thus, from the perception of a user watching the playing of the first to-be-processed video stream, the third image is played instead of the first image. For example, the first to-be-processed video stream includes two frames of images, where the first image is the second frame image. When the user watches the playing of the first to-be-processed video stream, the user perceives that the third image is played after the first frame image of the first to-be-processed video stream.
It should be understood that, although the above embodiment only describes the implementation process of the video stream processing apparatus in the playing process of the first to-be-processed video stream, migrating the first object in the second image to the first image to obtain the third image, and playing the third image, in practical application, the video stream processing apparatus may migrate the first object in each image in the second to-be-processed video stream to the corresponding image in the first to-be-processed video stream within the playing time of the first to-be-processed video stream to obtain the migrated video stream, and play the migrated video stream within the playing time of the first to-be-processed video stream.
For example, the first to-be-processed video stream includes a first frame image, a second frame image, and a third frame image. The video stream processing apparatus captures image a of the second to-be-processed video stream at the playing time of the first frame image, captures image b at the playing time of the second frame image, and captures image c at the playing time of the third frame image. After acquiring image a, the video stream processing apparatus migrates the first object in image a into the first frame image to obtain image d; after acquiring image b, it migrates the first object in image b into the second frame image to obtain image e; after acquiring image c, it migrates the first object in image c into the third frame image to obtain image f.
The video stream processing apparatus plays image d at the playing time of the first frame image, image e at the playing time of the second frame image, and image f at the playing time of the third frame image. That is, the video stream processing apparatus plays the migrated video stream, which includes image d, image e, and image f, within the playing time of the first to-be-processed video stream.
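The per-frame flow just described can be sketched as a capture-migrate-display loop. The following minimal sketch is illustrative only: it uses OpenCV for capture and display, and a trivial green-screen threshold as a stand-in for the deep-learning segmentation that the application actually performs.

```python
import cv2
import numpy as np

def foreground_mask(image: np.ndarray) -> np.ndarray:
    # Stand-in for the segmentation step: treat everything that is not a plain
    # green background as the first object's pixel point region.
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    background = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))
    return background == 0  # boolean HxW mask of the migration object

def play_with_migration(first_stream_path: str, camera_index: int = 0) -> None:
    movie = cv2.VideoCapture(first_stream_path)   # first to-be-processed video stream
    camera = cv2.VideoCapture(camera_index)       # second to-be-processed video stream
    fps = movie.get(cv2.CAP_PROP_FPS) or 10.0
    while True:
        ok_m, first_image = movie.read()          # first frame, second frame, ...
        ok_c, second_image = camera.read()        # image a, image b, ... (same instants)
        if not (ok_m and ok_c):
            break
        second_image = cv2.resize(second_image, (first_image.shape[1], first_image.shape[0]))
        mask = foreground_mask(second_image)
        third_image = first_image.copy()
        third_image[mask] = second_image[mask]    # migrate the first object
        cv2.imshow("migrated stream", third_image)  # play image d, image e, image f
        if cv2.waitKey(max(1, int(1000 / fps))) & 0xFF == ord("q"):
            break
    movie.release()
    camera.release()
    cv2.destroyAllWindows()
```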
Based on the technical solution provided by the embodiments, a user can perform in a movie segment in real time through the video stream processing apparatus. For example, assume the video stream processing apparatus is a mobile phone. Xiao Ming takes a segment of "Forrest Gump" as the first to-be-processed video stream, in which Forrest Gump has been erased. While the phone's camera is shooting, Xiao Ming imitates Forrest Gump's performance in the movie segment in front of the camera. The phone records Xiao Ming's performance through the camera to obtain the second to-be-processed video stream. By migrating Xiao Ming from the second to-be-processed video stream into the first to-be-processed video stream, the phone obtains the migrated video stream and plays it while Xiao Ming performs.
As an alternative embodiment, the video stream processing apparatus obtains the first to-be-processed video stream by performing the following steps:
1. and acquiring a third video stream to be processed.
In the embodiment of the present application, the third to-be-processed video stream may be a video stream including any content. For example, the third to-be-processed video stream may be a movie fragment; for another example, the third to-be-processed video stream may be a video stream shot by a mobile phone; as another example, the third to-be-processed video stream may be a video stream produced by video production software.
In one implementation of obtaining the third to-be-processed video stream, the video stream processing apparatus receives the third to-be-processed video stream input by the user through the input component to obtain the third to-be-processed video stream.
In another implementation manner of acquiring the third to-be-processed video stream, the video stream processing apparatus receives the third to-be-processed video stream sent by the terminal to acquire the third to-be-processed video stream.
2. And in response to an erasing instruction for a second object in the third video stream to be processed, erasing the second object in the third video stream to be processed to obtain the first video stream to be processed.
In the embodiments of the present application, erasing an object from a video stream means erasing the object from each frame of image in the video stream. For example, video stream A includes image a and image b, and image a and image b each contain Zhang San. Erasing Zhang San from video stream A means erasing Zhang San from image a and erasing Zhang San from image b.
An erasing instruction for the second object in the third to-be-processed video stream is accordingly an instruction to erase the second object from each frame of image in the third to-be-processed video stream.
The video stream processing device erases the second object in the third video stream to be processed by executing step 1 and step 2, and obtains the first video stream to be processed. In this way, the video stream processing apparatus can replace the second object in the third video stream to be processed with the first object in the first video stream to be processed in real time by performing steps 101 to 104.
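As an illustration of step 2, the sketch below erases the second object from a single frame given its mask, using OpenCV inpainting as one plausible way to fill the vacated region; the application leaves the filling method open (see also the background-filling step described later).

```python
import cv2
import numpy as np

def erase_object(frame: np.ndarray, object_mask: np.ndarray) -> np.ndarray:
    """Erase the second object from one frame, filling the hole with background
    content synthesized from the surrounding pixels."""
    mask = object_mask.astype(np.uint8) * 255
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8))  # also cover the object's edges
    return cv2.inpaint(frame, mask, 3, cv2.INPAINT_TELEA)

# Applying erase_object to every frame of the third to-be-processed video stream
# yields the first to-be-processed video stream.
```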
As an alternative embodiment, before executing step 102, the video stream processing apparatus further executes the following steps:
3. and acquiring a preview image and displaying the preview image.
In the embodiments of the present application, object migration processing refers to migrating an object from one video stream into another video stream. Optionally, before acquiring the second to-be-processed video stream, the video stream processing apparatus outputs information asking whether to execute object migration processing, so as to prompt the user to decide whether to execute it.
In one possible implementation manner, the video stream processing apparatus pops up a prompt box in the display interface, wherein the prompt box contains information whether to execute the object migration processing.
In another possible implementation manner, the video stream processing apparatus outputs information whether to execute the object migration process by voice.
In yet another possible implementation manner, the video stream processing apparatus outputs information on whether to execute the object migration process through the notification lamp.
The video stream processing apparatus acquires a preview image when receiving an execution object migration instruction. In one possible implementation, the video stream processing device includes a camera. And the video stream processing device starts the camera under the condition of receiving the execution object migration instruction, and acquires an image as a preview image by using the camera.
For example, the user inputs the instruction to execute object migration processing by clicking a button for object migration processing in the display interface. Upon receiving the instruction, the video stream processing apparatus starts the camera and captures an image as the preview image; alternatively, upon receiving the instruction, the video stream processing apparatus starts the camera to capture a preview video stream and takes an image from the preview video stream as the preview image.
In another possible implementation, the video stream processing apparatus obtains the preview image by receiving an image imported from an external storage medium. For example, in a case where the video stream processing apparatus outputs information requesting input of a preview image, the user can select one image from the external storage medium as the preview image and import it into the video stream processing apparatus.
And after the video stream processing device acquires the preview image, displaying the preview image so that a user can select a migration object from the preview image.
4. And when the user is detected to touch the first object in the preview image, generating a migration instruction which takes the first object as a migration object.
In the embodiment of the present application, the first object is any one object in the preview image. After the video stream processing apparatus displays the preview image, the user may select the first object from the preview image as the migration object, and input an instruction to the video stream processing apparatus to take the first object in the preview image as the migration object, so that the video stream processing apparatus takes the first object in the second video stream to be processed as the migration object.
For example, in a case where the video stream processing apparatus displays a preview image on a touch display screen, when a user touches a first object in the preview image, the video stream processing apparatus generates a migration instruction that takes the first object as a migration object.
After generating a migration instruction to have the first object as a migration object, the video stream processing apparatus performs the following steps in executing step 103:
5. and determining that a first object in the second video stream to be processed is taken as a migration object according to the migration instruction, and migrating the first object in the second image into the first image to obtain the third image.
By performing steps 3 to 4, the user can determine the migration target from the preview image. The video stream processing device may further migrate the first object in the second image into the first image by performing step 5, resulting in a third image.
As an alternative embodiment, the video stream processing apparatus migrates the first object in the second image into the first image to obtain the third image by performing the following steps:
6. and performing feature extraction processing on the first object in the preview image to obtain semantic feature data of the first object.
In the embodiment of the application, the semantic feature data of the first object carries the identity information of the first object. For example, in the case of a first object being a person, the identity information of the first object comprises at least the following information: the apparel attributes of the person and the appearance characteristics of the person. For another example, in the case where the first object is a vehicle, the identity information of the first object includes at least the following information: vehicle type characteristics, vehicle body color, vehicle brand, and license plate.
In the embodiments of the present application, the feature extraction processing can be implemented by a deep learning model, which is trained using a number of annotated images as training data so that the trained deep learning model can perform feature extraction processing on images. The annotation information of the images in the training data includes the identity information of the objects in the images. During training with the training data, the deep learning model extracts feature data from an image and determines the identity information of the object in the image according to the feature data. The results obtained by the deep learning model during training are supervised with the annotation information, the parameters of the deep learning model are updated, and the training of the deep learning model is completed. In this way, the video stream processing apparatus can perform feature extraction processing on the preview image using the trained deep learning model to obtain the semantic feature data of the first object.
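As a hedged sketch of such a feature extractor, the snippet below uses a pretrained torchvision backbone in place of the application's trained deep learning model; the backbone choice and the 512-dimensional embedding are assumptions for illustration only.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep the 512-d embedding
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def semantic_features(object_crop: np.ndarray) -> torch.Tensor:
    """object_crop: HxWx3 uint8 crop of the first object from the preview image.
    Returns a (1, 512) semantic feature vector carrying the object's identity."""
    with torch.no_grad():
        return backbone(preprocess(object_crop).unsqueeze(0))
```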
7. And segmenting the first pixel point region from the second image.
In the embodiments of the present application, the first pixel point region is the pixel point region in the second image whose semantic information matches the information carried by the semantic feature data of the first object. For example, if the first object is Zhang San, the first pixel point region is the pixel point region in the second image whose semantic information matches Zhang San's identity information.
In a possible implementation manner, the video stream processing apparatus may segment the first pixel region from the second image by performing semantic segmentation processing on the second image.
Alternatively, step 6 and step 7 may be implemented by a segmentation model. The segmentation model is obtained by training a deep learning model using first training data. The first training data includes at least one image pair, where each image pair includes two images: one image contains a segmentation object (hereinafter referred to as the segmentation reference image), and the other is the image to be segmented. For example, image pair a includes an image A and an image B; assume image A is the segmentation reference image and image B is the image to be segmented. During training, the deep learning model processes image pair a and segments, from image B, the pixel point region that matches the segmentation object in image A.
Each image pair in the first training data corresponds to one piece of labeling information, where the labeling information is the pixel point region in image B that matches the segmentation object in image A. The segmentation result output by the deep learning model is supervised with the labeling information, the parameters of the deep learning model are updated, and the training of the deep learning model is completed.
For example, the electronic device processes image pair a using the segmentation model to determine pixel point region b from image B. A loss of the segmentation model is obtained based on the labeling information of image pair a and pixel point region b, and the parameters of the segmentation model are updated based on the loss to complete the training. It will be appreciated that in a case where steps 6 and 7 are implemented by the segmentation model, the inputs to the segmentation model are the preview image and the second image. By processing the preview image and the second image, the segmentation model can perform feature extraction processing on the first object in the preview image to obtain the semantic feature data of the first object, and segment the first pixel point region from the second image.
By implementing step 6 and step 7 through the segmentation model, the video stream processing apparatus can increase the speed of segmenting, from the second image, the pixel point region that matches the first object.
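Conceptually, such a model must relate the reference object's semantics to every location of the second image. Below is a minimal sketch of that matching step, assuming a (1, C, h, w) feature map of the second image and a (1, C) reference feature are already available; the shapes, similarity measure, and threshold are all assumptions, not the application's prescribed architecture.

```python
import torch
import torch.nn.functional as F

def segment_by_reference(feature_map: torch.Tensor, ref_feature: torch.Tensor,
                         image_hw: tuple, threshold: float = 0.7) -> torch.Tensor:
    """feature_map: (1, C, h, w) convolutional features of the second image;
    ref_feature: (1, C) semantic feature of the first object from the preview image.
    Returns an (H, W) boolean mask: the first pixel point region."""
    similarity = F.cosine_similarity(feature_map, ref_feature[:, :, None, None], dim=1)
    similarity = F.interpolate(similarity.unsqueeze(1), size=image_hw,
                               mode="bilinear", align_corners=False)
    return similarity.squeeze() > threshold  # pixels whose semantics match the object
```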
8. And migrating the first pixel point region to the first image to obtain the third image.
In one possible implementation, the video stream processing apparatus performs fusion processing on the first pixel point region and the first image so that the first pixel point region covers part of the pixel point region of the first image, thereby migrating the first pixel point region into the first image to obtain the third image.
In another possible implementation, the video stream processing apparatus migrates the first pixel point region into the first image by pasting the first pixel point region onto the first image, obtaining the third image.
Optionally, if the position of the first pixel point region in the second image is referred to as a reference position, the video stream processing device migrates the first pixel point region to the reference position in the first image, so as to obtain a third image.
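As a sketch of these two implementations, the snippet below shows a direct paste at the reference position and, as one plausible realization of the fusion processing, OpenCV's Poisson blending (seamlessClone). Both are illustrative choices, not the application's prescribed method.

```python
import cv2
import numpy as np

def migrate_paste(first_image: np.ndarray, second_image: np.ndarray,
                  mask: np.ndarray) -> np.ndarray:
    # Paste: copy the first pixel point region into the first image at the same
    # (reference) position in pixel coordinates.
    third_image = first_image.copy()
    third_image[mask] = second_image[mask]
    return third_image

def migrate_fuse(first_image: np.ndarray, second_image: np.ndarray,
                 mask: np.ndarray) -> np.ndarray:
    # Fusion: Poisson-blend the region so its boundary blends into the first image.
    mask_u8 = mask.astype(np.uint8) * 255
    ys, xs = np.nonzero(mask)
    center = (int(xs.mean()), int(ys.mean()))  # blend around the region's centroid
    return cv2.seamlessClone(second_image, first_image, mask_u8, center, cv2.NORMAL_CLONE)
```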
By performing steps 6 to 8, the video stream processing apparatus achieves pixel-level segmentation of the second image to obtain the first pixel point region, which can improve the segmentation accuracy for the second image and further improve the effect of migrating the first object from the second image into the first image.
As an alternative implementation, the third to-be-processed video stream includes a fourth image, and the first image is obtained by erasing the second object in the fourth image. The video stream processing apparatus performs the following steps in performing step 8:
9. and migrating the first pixel point region to a first position in the first image to obtain the third image.
In the embodiments of the present application, the first position is the position of the second object in the fourth image, where the first position refers to a position in the pixel coordinate system of the fourth image.
Since the first image is obtained by erasing the second object in the fourth image, the video stream processing apparatus may replace the second object in the fourth image with the first object in the second image by migrating the first pixel region to the first position in the first image.
As an alternative embodiment, before performing step 9, the video stream processing apparatus further performs the following steps:
10. and under the condition that the first matching degree between the size of the first pixel point region and the size of the second object in the fourth image does not exceed a size matching degree threshold value, scaling the first pixel point region to obtain a second pixel point region.
Because the size of the first pixel point region may differ greatly from the reference size, migrating the first pixel point region directly to the first position in the first image may leave the size of the first pixel point region inconsistent with the sizes of the objects in the first image, where the reference size is the size of the pixel point region covered by the second object in the fourth image. Therefore, before migrating the first pixel point region to the first position in the first image, the video stream processing apparatus may scale the first pixel point region so that its size is coordinated with the sizes of the objects in the first image, thereby making the third image more natural.
In the embodiment of the application, the matching degree between the sizes of the pixel point regions covered by two objects is used to judge whether those sizes are coordinated. Specifically, the higher the matching degree between the sizes of the pixel point regions covered by the two objects, the more coordinated the two regions are.
The matching degree between the size of the first pixel point region and the size of the second object in the fourth image is referred to as a first matching degree. In a possible implementation manner, the first matching degree may be a ratio between a maximum length of the first pixel point region and a maximum length of the second object. For example, the maximum length of the first pixel region is 30 pixel units, and the maximum length of the second object is 20 pixel units. At this time, the matching degree between the size of the first pixel point region and the size of the second object in the fourth image is: 3/2.
In another possible implementation manner, the first matching degree may be a ratio between a maximum width of the first pixel point region and a maximum width of the second object. For example, the maximum width of the first pixel region is 8 pixel units, and the maximum width of the second object is 10 pixel units. At this time, the matching degree between the size of the first pixel point region and the size of the second object in the fourth image is: 4/5.
In this embodiment, the size matching degree threshold is the basis for judging whether the size of the first pixel point region is coordinated with the size of the second object in the fourth image. Specifically, a first matching degree that does not exceed the size matching degree threshold indicates that the size of the first pixel point region is not coordinated with the size of the second object in the fourth image; a first matching degree that exceeds the size matching degree threshold indicates that the two sizes are coordinated.
Therefore, when the first matching degree does not exceed the size matching degree threshold, the video stream processing apparatus scales the first pixel point region so that the matching degree exceeds the threshold, that is, so that the size of the first pixel point region is coordinated with the size of the second object in the fourth image; the scaled region is the second pixel point region. In this way, a second matching degree between the size of the second pixel point region and the size of the second object in the fourth image exceeds the size matching degree threshold.
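As a concrete illustration, the following sketch computes a length-ratio matching degree and scales the region when it falls below a threshold. Folding the raw ratio into (0, 1] so that a higher value always means better matched, the 0.8 threshold, and the use of cv2.resize are assumptions made for illustration, not requirements of the embodiment:

    import cv2
    import numpy as np

    SIZE_MATCH_THRESHOLD = 0.8  # hypothetical size matching degree threshold

    def match_degree(region_len: float, object_len: float) -> float:
        # Symmetric length ratio: 1.0 means identical maximum lengths,
        # and the value never exceeds 1.0.
        return min(region_len, object_len) / max(region_len, object_len)

    def scale_region(region: np.ndarray,
                     region_len: float,
                     object_len: float) -> np.ndarray:
        """Scale the first pixel point region when its size is not
        coordinated with the size of the second object."""
        if match_degree(region_len, object_len) > SIZE_MATCH_THRESHOLD:
            return region  # sizes are already coordinated; no scaling
        factor = object_len / region_len
        h, w = region.shape[:2]
        # cv2.resize takes the target size as (width, height).
        return cv2.resize(region,
                          (max(1, int(w * factor)), max(1, int(h * factor))))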
After obtaining the second pixel region, the video stream processing apparatus executes the following steps in the process of executing step 9:
11. Migrate the second pixel point region to the first position in the first image to obtain the third image.
By performing step 11, the video stream processing apparatus can make the size of the first object in the third image more coordinated with the sizes of the objects other than the first object in the third image, making the third image more natural.
As an alternative embodiment, the video stream processing apparatus further performs the following steps before performing step 11:
12. Adjust the color tone of the second pixel point region to obtain a third pixel point region.
If the color tone of the second pixel point region differs greatly from the color tone of the first image, the third image will look uncoordinated and unnatural. Therefore, before migrating the second pixel point region into the first image, the video stream processing apparatus can match the tone of the second pixel point region with the tone of the first image by adjusting the tone of the second pixel point region, making the third image more coordinated and natural.
In an implementation manner of obtaining the third pixel region, the video stream processing apparatus uses the color tone processing model to process the second pixel region and the first image, and adjusts the color tone of the second pixel region to match the color tone of the first image, so as to obtain the third pixel region.
The color tone processing model may be a convolutional neural network, trained with a plurality of images as training data so that the trained network can adjust the tone of an image. The training data includes at least one image pair, each pair consisting of two images: one whose tone is to be adjusted (hereinafter referred to as the image to be adjusted) and one providing the tone standard (hereinafter referred to as the reference image). For example, image pair a includes image A and image B, where image A is assumed to be the image to be adjusted and image B the reference image. By processing image A, the convolutional neural network needs to adjust the tone of image A to the tone of image B.
Each image pair in the training data corresponds to a supervision image; for image pair a, the tone of the supervision image is the tone of image B. The output of the convolutional neural network is supervised with the supervision image, the parameters of the convolutional neural network are updated accordingly, and the training of the convolutional neural network is completed. For example, the electronic device processes image A using the tone processing model to obtain image b, computes the loss of the tone processing model from image b and the supervision image corresponding to image pair a, updates the parameters of the tone processing model based on that loss, and completes the training of the tone processing model.
In this way, the video stream processing device can use the trained color tone processing model to process the second pixel point region so as to adjust the color tone of the second pixel point region, and obtain a third pixel point region.
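For concreteness, the following is a minimal PyTorch sketch of the training step described above, assuming the tone processing model is a small convolutional network conditioned on the reference image; the architecture, loss, and tensor layout are illustrative assumptions rather than details of the embodiment:

    import torch
    import torch.nn as nn

    class ToneNet(nn.Module):
        """Toy tone processing model: maps (image to be adjusted,
        reference image) to a tone-adjusted image."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 3, 3, padding=1),
            )

        def forward(self, to_adjust, reference):
            # Condition on the reference image by channel concatenation.
            return self.net(torch.cat([to_adjust, reference], dim=1))

    model = ToneNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.MSELoss()

    def train_step(to_adjust_img, ref_img, supervision_img):
        # to_adjust_img: image A; ref_img: image B providing the tone;
        # supervision_img: the supervision image for this image pair.
        optimizer.zero_grad()
        output = model(to_adjust_img, ref_img)   # the adjusted image b
        loss = criterion(output, supervision_img)  # loss of the tone model
        loss.backward()
        optimizer.step()                          # update model parameters
        return loss.item()

At inference time, the second pixel point region would play the role of the image to be adjusted and the first image that of the reference image.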
After obtaining the third pixel region, the video stream processing apparatus performs the following steps in the process of performing step 11:
13. Migrate the third pixel point region to the first position in the first image to obtain the third image.
As an alternative embodiment, before executing step 9, in the process of erasing the second object in the third video stream to be processed to obtain the first video stream to be processed, the video stream processing apparatus obtains the first image by executing the following steps:
14. Erase the second object in the fourth image to obtain a fifth image.
Optionally, the video stream processing apparatus may determine, by performing image segmentation processing on the fourth image, a pixel region covered by the second object from the fourth image, and remove the pixel region from the fourth image, to obtain a fifth image.
15. Add, at the first position in the fifth image, a background pixel point region matching the content of the fifth image to obtain the first image.
Because the fifth image is obtained by removing a pixel point region from the fourth image, a blank region exists in the fifth image, which reduces the user's experience of viewing it. Further, if the third image is obtained by migrating the first object in the second image into the fifth image, a blank region may also appear in the third image, which obviously degrades the user's experience of viewing the third image as well.
Based on this, the video stream processing apparatus will also complement the blank area in the fifth image before migrating the first object in the second image to the fifth image. To match the complemented content with the fifth image, the video stream processing apparatus adds a background pixel region matching the content in the fifth image to the blank region of the fifth image.
For example, suppose that in the fourth image, Zhang San is seated on a chair. In the case where Zhang San is the second object, the video stream processing apparatus obtains the fifth image by erasing Zhang San from the fourth image. At this time, the background pixel point region matching the content of the fifth image is the chair. The video stream processing apparatus can therefore complement the fifth image by adding, to the blank region in the fifth image, a pixel point region whose content is the chair.
Optionally, in practical applications, if a video stream obtained by erasing the second object in each image in the third video stream to be processed is referred to as a fourth video stream to be processed, the video stream processing apparatus may obtain the first video stream to be processed by complementing each image in the fourth video stream to be processed.
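As an illustration of this per-image erase-and-complement process, the following is a minimal sketch that uses OpenCV inpainting as one possible way to add a background pixel point region matching the surrounding content; the segment_object helper is a hypothetical stand-in for the segmentation step described above:

    import cv2
    import numpy as np

    def segment_object(frame: np.ndarray) -> np.ndarray:
        """Hypothetical placeholder returning an HxW uint8 mask where 255
        marks the second object; a real system would use the segmentation
        or erasure model described in this application."""
        raise NotImplementedError

    def erase_and_complement(frame: np.ndarray) -> np.ndarray:
        mask = segment_object(frame)
        # Erase the object region and fill the resulting blank area from
        # the surrounding background (Telea inpainting as one example).
        return cv2.inpaint(frame, mask, inpaintRadius=3,
                           flags=cv2.INPAINT_TELEA)

    def process_stream(frames: list) -> list:
        # fourth to-be-processed stream -> first to-be-processed stream
        return [erase_and_complement(f) for f in frames]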
As an optional implementation manner, the third to-be-processed video stream includes a sixth image, which is played earlier than the fourth image in the third to-be-processed video stream. When the video stream processing apparatus detects an operation taking the second object in the sixth image as the erasing object, it generates an erasing instruction for the second object in the third to-be-processed video stream. The second object may be any object in the sixth image.
The instruction to take the second object in the sixth image as the erasing object may be an operation in which the user clicks the second object in the sixth image. For example, in the case where the video stream processing apparatus plays the sixth image through a touch screen, the user may click the second object in the sixth image by tapping the touch screen; for another example, in the case where the video stream processing apparatus plays the sixth image, the user may click the second object in the sixth image with a mouse.
Upon receiving the instruction to erase the second object in the sixth image, the video stream processing apparatus specifically performs the following steps in the process of performing step 14:
16. Perform feature extraction processing on the sixth image to obtain semantic feature data of the second object.
In this step, the semantic feature data of the second object carries the identity information of the second object. The feature extraction processing in this step is realized in the same manner as the feature extraction processing in step 6 and will not be described in detail again here.
17. Determine a fourth pixel point region from the fourth image by performing semantic segmentation processing on the fourth image.
In this embodiment of the application, the fourth pixel point region is the pixel point region whose semantic information in the fourth image matches the information carried by the semantic feature data of the second object. For example, if the second object is Li Qi, the fourth pixel point region is the pixel point region whose semantic information in the fourth image matches Li Qi's identity information.
In a possible implementation manner, the video stream processing apparatus may determine, by performing semantic segmentation processing on the fourth image, the pixel points in the fourth image whose semantic information matches the information carried by the semantic feature data of the second object, thereby determining the fourth pixel point region from the fourth image.
18. Erase the fourth pixel point region in the fourth image to obtain the fifth image.
By executing steps 16 to 18, the video stream processing apparatus extracts the semantic feature data of the second object from the sixth image, and can then perform pixel-level segmentation on the fourth image based on that semantic feature data, thereby determining the fourth pixel point region from the fourth image. Erasing the fourth pixel point region from the fourth image can therefore improve the accuracy of erasing the second object from the third to-be-processed video stream.
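One way to picture this cross-image matching is the sketch below, which assumes a hypothetical per-pixel feature extractor and matches the second object's feature from the sixth image against the fourth image by cosine similarity; embed(), the click-based query, and the 0.5 threshold are all illustrative assumptions:

    import numpy as np

    def embed(image: np.ndarray) -> np.ndarray:
        """Hypothetical per-pixel feature extractor returning an HxWxC
        float array of semantic features."""
        raise NotImplementedError

    def erase_by_reference(fourth: np.ndarray, sixth: np.ndarray,
                           click_yx: tuple) -> np.ndarray:
        feats4 = embed(fourth)
        feats6 = embed(sixth)
        # Semantic feature data of the second object: the feature at the
        # pixel the user clicked in the sixth image.
        query = feats6[click_yx]
        query = query / np.linalg.norm(query)
        norms = np.linalg.norm(feats4, axis=-1, keepdims=True)
        sims = (feats4 / norms) @ query        # HxW cosine similarities
        mask = sims > 0.5                      # the fourth pixel point region
        fifth = fourth.copy()
        fifth[mask] = 0                        # erase the matched region
        return fifth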
In one possible implementation, steps 16 and 17 may be implemented by the segmentation model described above. In the case where steps 16 and 17 are implemented by the segmentation model, the inputs of the segmentation model are the fourth image and the sixth image.
In another possible implementation, steps 16 to 18 may be implemented by an erasure model, which is obtained by training a deep learning model with third training data. The third training data includes at least one image pair, each pair consisting of two images: one containing the erasing object (hereinafter referred to as the erasing reference image) and one that is the image to be erased. For example, image pair a includes image A and image B, where image A is assumed to be the erasing reference image and image B the image to be erased. During training, the deep learning model processes image pair a and erases, in image B, the pixel point region matching the erasing object in image A.
Each image pair in the third training data corresponds to one piece of annotation information, namely the pixel point region in image B that matches the erasing object in image A. The erasing result output by the deep learning model is supervised with the annotation information, the parameters of the deep learning model are updated, and the training of the deep learning model is completed.
For example, the electronic device processes image pair a using the erasure model, erasing pixel point region b in image B. The loss of the erasure model is obtained based on the annotation information of image pair a and the position of pixel point region b in image B; the parameters of the erasure model are updated based on that loss, completing the training of the erasure model.
It should be understood that when steps 16 to 18 are implemented by the erasure model, the inputs of the erasure model are the fourth image and the sixth image. The erasure model obtains the semantic feature data of the second object by performing feature extraction processing on the sixth image, determines the fourth pixel point region from the fourth image by performing semantic segmentation processing on the fourth image, and erases the fourth pixel point region in the fourth image to obtain the fifth image.
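Under these assumptions, a single training step of the erasure model could look like the following sketch, where the model predicts the region to erase in image B given the erasing reference image A and is supervised by the annotation information; the channel-concatenated input and the binary cross-entropy loss are illustrative choices, not details of the embodiment:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def erasure_train_step(model: nn.Module,
                           optimizer: torch.optim.Optimizer,
                           ref_img: torch.Tensor,     # erasing reference image A
                           target_img: torch.Tensor,  # image B to be erased
                           annotation: torch.Tensor   # labeled region mask (0/1)
                           ) -> float:
        optimizer.zero_grad()
        # Predict the pixel point region in image B that matches the
        # erasing object in image A (per-pixel logits).
        pred_mask = model(torch.cat([ref_img, target_img], dim=1))
        # Supervise the erasing result with the annotation information.
        loss = F.binary_cross_entropy_with_logits(pred_mask, annotation)
        loss.backward()
        optimizer.step()
        return loss.item()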
By implementing steps 16 to 18 through the erasure model, the video stream processing apparatus can increase the speed of erasing the pixel point region in the fourth image that matches the second object.
Based on the technical scheme provided by the embodiment of the application, the embodiment of the application also provides a possible application scenario. With the rise of short videos, more and more people treat short videos as a form of entertainment, and the technical scheme provided by the embodiment of the application can further improve the entertainment value of short videos.
As an alternative implementation, a person skilled in the relevant art may develop corresponding software based on the technical solutions provided in the embodiments of the present application. The database of the software stores candidate video streams. These candidate video streams are all video streams from which objects have been erased; that is, any candidate video stream can be used as the first to-be-processed video stream.
In the case where the user selects a candidate video stream from the database as the first to-be-processed video stream, the video stream processing apparatus may output information prompting the start of shooting and performing, so that the user starts performing while the apparatus shoots the second to-be-processed video stream. The video stream processing apparatus can then obtain the migrated video stream based on the technical scheme provided by the embodiment of the application.
In the case where the user selects an externally imported video stream as the first to-be-processed video stream, the video stream processing apparatus may prompt the user to select one object from the imported video stream as the second object, and then erase the second object in the imported video stream to obtain the first to-be-processed video stream. The video stream processing apparatus may likewise output information prompting the start of shooting and performing, so that the user starts performing while the apparatus shoots the second to-be-processed video stream, and can then obtain the migrated video stream based on the technical scheme provided by the embodiment of the application.
For example, Xiao Ming is very fond of the movie Forrest Gump (Agan Zhengzhuan) and often imitates the performances of characters in the movie. Xiao Ming can select a clip from Forrest Gump on his mobile phone as the third to-be-processed video stream, and select a character in the third to-be-processed video stream as the second object. The mobile phone erases the second object from the third to-be-processed video stream to obtain the first to-be-processed video stream. Xiao Ming can then imitate the performance of that character in front of the mobile phone and record the second to-be-processed video stream. Based on the technical scheme above, the mobile phone can migrate Xiao Ming's performance from the second to-be-processed video stream into the first to-be-processed video stream to obtain the migrated video stream. In the migrated video stream, Xiao Ming appears to be performing in the movie scene.
For example, fig. 2a shows a certain frame image in the movie clip; the character indicated by the arrow in fig. 2a is taken as the second object of the movie clip, and the second object is erased from the movie clip to obtain the erased movie clip. Fig. 2b shows a certain frame image obtained after migrating the first object into the erased movie clip, where the playing time of the image shown in fig. 2b is later than that of the image shown in fig. 2a. The character indicated by the arrow in fig. 2b is Xiao Ming (i.e., the first object).
Processing short videos based on the technical scheme provided by the embodiment of the application can thus improve their interest and entertainment value. Further, in the above example, Xiao Ming can evaluate his own performance from the migrated video stream and improve his performance skills.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
The method of the embodiments of the present application is set forth above in detail and the apparatus of the embodiments of the present application is provided below. Referring to fig. 3, fig. 3 is a schematic structural diagram of a video stream processing apparatus according to an embodiment of the present application, where the video stream processing apparatus 1 includes: a first acquisition unit 11, a playback unit 12, a second acquisition unit 13, a first processing unit 14, a display unit 15, a generation unit 16, and a second processing unit 17, wherein:
a first obtaining unit 11, configured to obtain a first to-be-processed video stream;
a playing unit 12, configured to play the first to-be-processed video stream; the first to-be-processed video stream comprises a first image;
a second obtaining unit 13, configured to obtain a second video stream to be processed; the second video stream to be processed comprises a second image, and the playing time of the first image is the same as the acquiring time of the second image;
a first processing unit 14, configured to, when receiving an instruction to take a first object in the second video stream to be processed as a migration object, migrate the first object in a second image into the first image to obtain a third image;
the playing unit 12 is configured to play the third image at the playing time of the first image.
With reference to any embodiment of the present application, the first obtaining unit 11 is specifically configured to:
acquiring a third video stream to be processed;
and responding to an erasing instruction aiming at a second object in the third video stream to be processed, and erasing the second object in the third video stream to be processed to obtain the first video stream to be processed.
In combination with any embodiment of the present application, the first obtaining unit 11 is further configured to obtain a preview image,
the video stream processing apparatus 1 further includes:
a display unit 15 for displaying the preview image;
a generating unit 16, configured to generate a migration instruction taking the first object as the migration object when it is detected that the user touches the first object in the preview image;
the first processing unit 14 is specifically configured to:
and determining that a first object in the second video stream to be processed is taken as a migration object according to the migration instruction, and migrating the first object in the second image to the first image to obtain the third image.
With reference to any embodiment of the present application, the first processing unit 14 is specifically configured to:
performing feature extraction processing on the first object in the preview image to obtain semantic feature data of the first object;
segmenting a first pixel point region from the second image; the first pixel point region is a pixel point region of which the semantic information in the second image is matched with the information carried by the semantic feature data of the first object;
and transferring the first pixel point region to the first image to obtain the third image.
With reference to any embodiment of the present application, the third to-be-processed video stream includes a fourth image, and the first image is obtained by erasing the second object in the fourth image;
the first processing unit 14 is specifically configured to:
migrating the first pixel point region to a first position in the first image to obtain a third image; the first position is a position of the second object in the fourth image.
With reference to any embodiment of the present application, the video stream processing apparatus 1 further includes:
a second processing unit 17, configured to, before the first pixel region is migrated to the first position in the first image to obtain the third image, scale the first pixel region to obtain a second pixel region when a first matching degree between a size of the first pixel region and a size of the second object in the fourth image does not exceed a size matching degree threshold, where a second matching degree between the size of the second pixel region and the size of the second object in the fourth image exceeds the size matching degree threshold;
the first processing unit 14 is specifically configured to:
and migrating the second pixel point region to the first position in the first image to obtain the third image.
In combination with any embodiment of the present application, the second processing unit 17 is further configured to adjust a color tone of the second pixel region to obtain a third pixel region before the second pixel region is migrated to the first position in the first image to obtain the third image, where the color tone of the third pixel region is matched with the color tone of the first image;
the first processing unit 14 is specifically configured to:
and migrating the third pixel point region to the first position in the first image to obtain the third image.
With reference to any embodiment of the present application, the first obtaining unit 11 is further configured to erase the second object in the fourth image to obtain a fifth image before the first pixel point region is migrated to the first position in the first image to obtain the third image;
and adding a background pixel point region matched with the content in the fifth image at the first position in the fifth image to obtain the first image.
With reference to any embodiment of the present application, the third to-be-processed video stream includes a sixth image, and the playing time of the sixth image is earlier than the playing time of the fourth image;
the first obtaining unit 11 is specifically configured to:
generate the erasing instruction when detecting that the second object in the sixth image is taken as the erasing object, and perform feature extraction processing on the sixth image to obtain semantic feature data of the second object;
determining a fourth pixel point region from the fourth image by performing semantic segmentation processing on the fourth image; the semantic information of the fourth pixel point region is matched with information carried by the semantic feature data of the second object;
and erasing the fourth pixel point region in the fourth image to obtain the fifth image.
In this embodiment, the first obtaining unit 11 may be a data interface, the playing unit 12 may be a video playing chip, the second obtaining unit 13 may be a camera, the first processing unit 14 may be a processor, the display unit 15 may be a display, the generating unit 16 may be a processor, and the second processing unit 17 may be a graphics processor.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present application may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Fig. 4 is a schematic hardware structure diagram of a video stream processing apparatus according to an embodiment of the present disclosure. The video stream processing device 2 comprises a processor 21, a memory 22, an input device 23, an output device 24. The processor 21, the memory 22, the input device 23 and the output device 24 are coupled by a connector, which includes various interfaces, transmission lines or buses, etc., and the embodiment of the present application is not limited thereto. It should be appreciated that in various embodiments of the present application, coupled refers to being interconnected in a particular manner, including being directly connected or indirectly connected through other devices, such as through various interfaces, transmission lines, buses, and the like.
The processor 21 may be one or more Graphics Processing Units (GPUs), and in the case that the processor 21 is one GPU, the GPU may be a single-core GPU or a multi-core GPU. Alternatively, the processor 21 may be a processor group composed of a plurality of GPUs, and the plurality of processors are coupled to each other through one or more buses. Alternatively, the processor may be other types of processors, and the like, and the embodiments of the present application are not limited.
The memory 22 may be used to store computer program instructions and various types of computer program code, including program code for executing the aspects of the present application. Optionally, the memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used for storing related instructions and data.
The input means 23 are for inputting data and/or signals and the output means 24 are for outputting data and/or signals. The input device 23 and the output device 24 may be separate devices or may be an integral device.
It is understood that, in the embodiment of the present application, the memory 22 may be used to store not only the related instructions, but also the related data, for example, the memory 22 may be used to store the first to-be-processed video stream acquired through the input device 23, or the memory 22 may also be used to store the third image obtained through the processor 21, and the like, and the embodiment of the present application is not limited to the data specifically stored in the memory.
It will be appreciated that fig. 4 shows only a simplified design of a video stream processing apparatus. In practical applications, the video stream processing apparatus may further include other necessary elements, including but not limited to any number of input/output devices, processors, memories, etc., and all video stream processing apparatuses that can implement the embodiments of the present application are within the protection scope of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It is also clear to those skilled in the art that the descriptions of the various embodiments of the present application have different emphasis, and for convenience and brevity of description, the same or similar parts may not be repeated in different embodiments, so that the parts that are not described or not described in detail in a certain embodiment may refer to the descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)), or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Versatile Disk (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media that can store program codes, such as a read-only memory (ROM) or a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (12)

1. A method for processing a video stream, the method comprising:
acquiring a first video stream to be processed, and playing the first video stream to be processed; the first to-be-processed video stream comprises a first image;
acquiring a second video stream to be processed; the second video stream to be processed comprises a second image, and the playing time of the first image is the same as the acquiring time of the second image;
under the condition that an instruction that a first object in the second video stream to be processed is taken as a migration object is received, migrating the first object in the second image into the first image to obtain a third image;
and playing the third image at the playing time of the first image.
2. The method of claim 1, wherein obtaining the first to-be-processed video stream comprises:
acquiring a third video stream to be processed;
in response to detecting an erasing instruction for a second object in the third video stream to be processed, erasing the second object in the third video stream to be processed to obtain the first video stream to be processed.
3. The method according to claim 1 or 2, wherein before the obtaining the second to-be-processed video stream, the method further comprises:
acquiring a preview image and displaying the preview image;
when detecting that a user touches the first object in the preview image, generating a migration instruction which takes the first object as a migration object;
in the case that an instruction for taking a first object in the second to-be-processed video stream as a migration object is received, migrating the first object in the second image into the first image to obtain a third image, where the instruction includes:
and determining that a first object in the second video stream to be processed is taken as a migration object according to the migration instruction, and migrating the first object in the second image to the first image to obtain a third image.
4. The method of claim 3, wherein the migrating the first object in the second image into the first image, resulting in a third image, comprises:
performing feature extraction processing on the first object in the preview image to obtain semantic feature data of the first object;
segmenting a first pixel point region from the second image; the first pixel point region is a pixel point region of which the semantic information in the second image is matched with the information carried by the semantic feature data of the first object;
and transferring the first pixel point region to the first image to obtain the third image.
5. The method according to claim 4, characterized in that, in the case where claim 4 depends on claim 2, the third to-be-processed video stream includes a fourth image, the first image being obtained by erasing the second object in the fourth image;
the migrating the first pixel point region to the first image to obtain the third image includes:
migrating the first pixel point region to a first position in the first image to obtain a third image; the first position is a position of the second object in the fourth image.
6. The method of claim 5, wherein before the migrating the first pixel point region to the first position in the first image to obtain the third image, the method further comprises:
under the condition that a first matching degree between the size of the first pixel point region and the size of the second object in the fourth image does not exceed a size matching degree threshold, scaling the first pixel point region to obtain a second pixel point region, wherein a second matching degree between the size of the second pixel point region and the size of the second object in the fourth image exceeds the size matching degree threshold;
the moving the first pixel point region to the first position in the first image to obtain the third image includes:
and migrating the second pixel point region to the first position in the first image to obtain the third image.
7. The method of claim 6, wherein before the migrating the second pixel point region to the first position in the first image to obtain the third image, the method further comprises:
adjusting the color tone of the second pixel point region to obtain a third pixel point region, wherein the color tone of the third pixel point region is matched with the color tone of the first image;
the moving the second pixel point region to the first position in the first image to obtain the third image includes:
and migrating the third pixel point region to the first position in the first image to obtain the third image.
8. The method according to any one of claims 5 to 7, wherein before the migrating the first pixel point region to the first position in the first image to obtain the third image, the method further comprises:
erasing the second object in the fourth image to obtain a fifth image;
and adding a background pixel point region matched with the content in the fifth image at the first position in the fifth image to obtain the first image.
9. The method according to claim 8, wherein the third to-be-processed video stream comprises a sixth image, and the playing time of the sixth image is earlier than the playing time of the fourth image;
the detecting an erasure instruction for a second object in the third pending video stream includes:
when detecting that a second object in a sixth image is taken as an erasing object, generating the erasing instruction;
erasing the second object in the fourth image to obtain a fifth image, including:
performing feature extraction processing on the sixth image to obtain semantic feature data of the second object;
determining a fourth pixel point region from the fourth image by performing semantic segmentation processing on the fourth image; the semantic information of the fourth pixel point region is matched with information carried by the semantic feature data of the second object;
and erasing the fourth pixel point region in the fourth image to obtain the fifth image.
10. A video stream processing apparatus, characterized in that the video stream processing apparatus comprises:
the first acquisition unit is used for acquiring a first video stream to be processed;
the playing unit is used for playing the first video stream to be processed; the first to-be-processed video stream comprises a first image;
the second acquisition unit is used for acquiring a second video stream to be processed; the second video stream to be processed comprises a second image, and the playing time of the first image is the same as the acquiring time of the second image;
the first processing unit is used for migrating a first object in a second image into a first image to obtain a third image under the condition that an instruction of taking the first object in the second video stream to be processed as a migration object is received;
the playing unit is used for playing the third image at the playing time of the first image.
11. An electronic device, comprising: a processor and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of any of claims 1 to 9.
12. A computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 9.