CN113612959A - Video call method and device, storage medium and electronic equipment - Google Patents

Video call method and device, storage medium and electronic equipment

Info

Publication number
CN113612959A
CN113612959A (application CN202110839183.6A)
Authority
CN
China
Prior art keywords
video, data, face image, fused, data corresponding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110839183.6A
Other languages
Chinese (zh)
Inventor
陈衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Priority to CN202110839183.6A
Publication of CN113612959A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N 21/23614 Multiplexing of additional data and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a video call method and device, a storage medium, and electronic equipment, relating to the field of Internet technology. The method comprises the following steps: acquiring video playing data corresponding to a video played in a target terminal; acquiring video call data corresponding to a video call performed in the target terminal; performing fusion processing on the video playing data and the video call data to generate fused video data; and controlling the target terminal to play the adjusted video based on the fused video data, or updating the video call based on the fused video data. The scheme effectively improves the video call effect and enhances call enjoyment and user experience.

Description

Video call method and device, storage medium and electronic equipment
Technical Field
The application relates to the field of Internet technology, and in particular to a video call method, a video call device, a storage medium and electronic equipment.
Background
As terminal hardware grows more capable, users increasingly make video calls while watching a video on the same terminal. However, video playback and the call are usually independent of each other, and each interferes with the other visually and in other sensory ways. Existing schemes reduce this mutual interference by suspending some functions of one or both sides, but their effect is limited: the video call effect is poor, calls are not engaging, and the user experience suffers.
Disclosure of Invention
The embodiments of the application provide a video call scheme that can effectively improve the video call effect and enhance call enjoyment and user experience.
In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:
according to one embodiment of the present application, a video call method includes: acquiring video playing data corresponding to a video played in a target terminal; acquiring video call data corresponding to a video call performed in the target terminal; performing fusion processing on the video playing data and the video call data to generate fused video data; and controlling the target terminal to play the adjusted video based on the fused video data, or updating the video call based on the fused video data.
In some embodiments of the present application, the fusing the video playing data and the video call data to generate fused video data includes: performing face image detection processing based on the video call data to extract face image data corresponding to a first object from the video call data; performing face image detection processing based on the video playing data to extract face image data corresponding to a second object from the video playing data; and fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data.
In some embodiments of the present application, the performing facial image detection processing based on the video call data to extract facial image data corresponding to a first object from the video call data includes: determining an object matched with a video scene in the video from objects participating in the video call as the first object; acquiring video call picture data corresponding to the first object from the video call data; and carrying out face image detection processing based on the video call picture data so as to extract face image data corresponding to the first object from the video call picture data.
In some embodiments of the present application, the performing facial image detection processing based on the video playing data to extract facial image data corresponding to a second object from the video playing data includes: performing face image detection processing based on the video playing data to determine all character objects in the video playing data; determining, from all the character objects, the character object matched with the first object as the second object; and extracting the face image data corresponding to the second object from the video playing data.
In some embodiments of the present application, the fused video data comprises first fused video data; the fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data includes: replacing the face image data corresponding to the second object in the video playing data with the face image data corresponding to the first object to obtain the first fused video data.
In some embodiments of the present application, the fused video data comprises first fused video data; the fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data includes: superimposing the face image data corresponding to the first object on the face image data corresponding to the second object in the video playing data to obtain the first fused video data.
In some embodiments of the present application, the fused video data comprises second fused video data; the fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data includes: replacing the face image data corresponding to the first object in the video call data with the face image data corresponding to the second object to obtain the second fused video data.
In some embodiments of the present application, the fused video data comprises second fused video data; the fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data includes: superimposing the face image data corresponding to the second object on the face image data corresponding to the first object in the video call data to obtain the second fused video data.
In some embodiments of the present application, the fused video data comprises at least one of first fused video data and second fused video data; the controlling the target terminal to play the adjusted video based on the fused video data or to update the video call based on the fused video data includes performing at least one of: controlling the target terminal to play the adjusted video based on the first fused video data, and controlling the target terminal to update the video call based on the second fused video data.
According to one embodiment of the present application, a video call device includes: the first acquisition module is used for acquiring video playing data corresponding to a video played in a target terminal; the second acquisition module is used for acquiring video call data corresponding to a video call performed in the target terminal; the fusion module is used for carrying out fusion processing on the video playing data and the video call data to generate fused video data; and the control module is used for controlling the target terminal to play the adjusted video based on the fused video data or update the video call based on the fused video data.
According to another embodiment of the present application, a storage medium has stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of the embodiments of the present application.
According to another embodiment of the present application, an electronic device may include: a memory storing computer readable instructions; and a processor for reading the computer readable instructions stored in the memory to perform the methods of the embodiments.
In the embodiments of the application, video playing data corresponding to a video played in a target terminal is acquired; video call data corresponding to a video call performed in the target terminal is acquired; the video playing data and the video call data are fused to generate fused video data; and the target terminal is controlled to play the adjusted video based on the fused video data, or the video call is updated based on the fused video data.
In this way, by fusing the video playing data with the video call data, the two mutually independent contents, the played video and the video call, can be merged and thereby associated with each other. Controlling the target terminal to play the adjusted video based on the fused video data, or updating the video call based on the fused video data, then effectively improves the video call effect and enhances call enjoyment and user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 shows a schematic diagram of a system to which embodiments of the present application may be applied.
Fig. 2 shows a flow diagram of a video call method according to an embodiment of the present application.
Fig. 3 shows a flowchart of a video call method in a scenario according to an embodiment of the present application.
Fig. 4 shows a block diagram of a video call device according to an embodiment of the present application.
FIG. 5 shows a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description that follows, unless otherwise indicated, specific embodiments of the present application are described with reference to steps and symbols executed by one or more computers. These steps and operations are at times referred to as being computer-executed: the computer's processing unit manipulates electrical signals that represent data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of memory that have particular properties defined by the data format. However, although the principles of the application are described in these terms, this is not meant to be limiting; those skilled in the art will appreciate that the various steps and operations described below may also be implemented in hardware.
FIG. 1 shows a schematic diagram of a system 100 to which embodiments of the present application may be applied. As shown in fig. 1, the system 100 may include a server 101, a first terminal 102, and a second terminal 103. The first terminal 102 and the second terminal 103 may be any computer devices, such as a computer, a mobile phone, a smart watch, and a home appliance. The server 101 may be a server cluster or a cloud service, etc. In one example, the first terminal 102 is a television.
The server 101 may be connected to the first terminal 102 and the second terminal 103 by wired or wireless connections.
The first terminal 102 and the second terminal 103 can perform a video call through the server 101, and the first terminal 102 and the second terminal 103 can play a video while performing the video call.
In an implementation of this example, taking the first terminal 102 as the target terminal, the first terminal 102 or the server 101 acquires video playing data corresponding to a video played in the target terminal; acquires video call data corresponding to a video call performed in the target terminal; fuses the video playing data and the video call data to generate fused video data; and controls the target terminal to play the adjusted video based on the fused video data, or updates the video call based on the fused video data.
Fig. 2 schematically shows a flow chart of a video call method according to an embodiment of the present application. The execution subject of the video call method may be any device, such as the first terminal 102 or the server 101 shown in fig. 1.
As shown in fig. 2, the video call method may include steps S210 to S240.
Step S210, video playing data corresponding to a video played in a target terminal is obtained;
step S220, acquiring video call data corresponding to a video call performed in a target terminal;
step S230, performing fusion processing on the video playing data and the video call data to generate fused video data;
step S240, the target terminal is controlled to play the adjusted video based on the fused video data, or the video call is updated based on the fused video data.
The specific process of each step is described below.
In step S210, video playing data corresponding to a video played in the target terminal is obtained.
In this exemplary embodiment, a video such as a TV series or a movie is played in the target terminal, and the user may conduct an online video call with a user of another terminal while watching the video on the target terminal. The video playing data, i.e. the content data of the video, may be obtained from a video playing data buffer in the target terminal.
In step S220, video call data corresponding to a video call performed in the target terminal is acquired.
In this exemplary embodiment, the video call performed in the target terminal may be a call in which both the video picture and the audio are enabled, or a call in which the video picture is disabled and only the audio is enabled.
The video call data may include video call data of a plurality of users. That is, the video call data corresponding to the video call performed in the target terminal may include video call data corresponding to user A, who uses the target terminal, and video call data corresponding to user B, who uses another terminal; it is understood that there may be one or more such other terminals.
The video call data may include video picture data and audio data. The video picture data sent by the server to the target terminal (i.e. the video call data corresponding to user B, who uses the other terminal) and the video picture data generated by the target terminal itself (i.e. the video call data corresponding to user A, who uses the target terminal) can be acquired in real time.
In step S230, the video playing data and the video call data are fused to generate fused video data.
In this exemplary embodiment, the video playing data is fused with part of the video call data to generate first fused video data corresponding to the video playing data, which may be used to update the video played in the target terminal to obtain the adjusted video, and/or to generate second fused video data corresponding to the video call data, which may be used to update the video picture or audio information of the video call.
In one embodiment, the fusion processing may be performed when it is detected that the user has turned on a "fusion mode". In another embodiment, the fusion processing may be performed when the video call is detected to match a "predetermined call state". In one example, the predetermined call state may be that the video call has just been connected or is about to be disconnected. In another example, call keywords may be extracted from the audio data of the video call and used to determine whether the user's state (such as happy or sad) matches the predetermined call state; alternatively, face image detection may be performed on the video call picture data, facial image features may be extracted, and whether the user's state matches the predetermined call state may be determined from those features.
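As a minimal illustration of this trigger logic, the following Python sketch combines the signals mentioned above; the flag names, keyword set, and mood labels are hypothetical stand-ins, since the patent does not specify an API:

```python
# Hypothetical trigger check for the fusion processing described above.
HAPPY_KEYWORDS = {"great", "awesome", "haha"}  # assumed example keywords

def should_fuse(fusion_mode_on, call_just_connected, call_keywords, user_mood):
    """Return True when fusion processing should be triggered."""
    if fusion_mode_on:                        # user turned on "fusion mode"
        return True
    if call_just_connected:                   # predetermined call state
        return True
    if call_keywords & HAPPY_KEYWORDS:        # audio keywords match a state
        return True
    return user_mood in ("happy", "sad")      # facial features match a state
```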
Fusing the video playing data with the video call data merges the two mutually independent contents, the played video and the video call, so that they become associated with each other, which improves the video call effect.
In one embodiment, fusing the video playing data and the video call data to generate the fused video data includes:
performing face image detection processing based on the video call data to extract face image data corresponding to the first object from the video call data; performing face image detection processing based on the video playing data to extract face image data corresponding to the second object from the video playing data; and performing fusion processing on the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain fused video data.
The face image detection process may locate the region corresponding to a face image in the video call data through face image recognition, so that the face image data can be extracted from that region. The first object may be the user of the other terminal in the call with the target terminal, the user of the target terminal itself, or another living subject appearing in the video call.
Likewise, the face image detection process may locate the region corresponding to a face image in the video playing data through face image recognition, so that the face image data can be extracted from that region. The second object may be a character corresponding to the user, or another living subject appearing in the video.
Then, based on the face image data corresponding to the first object and the face image data corresponding to the second object, the video playing data and the video call data are fused to obtain the fused video data, so that the face images in the two independent contents are fused and associated.
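A minimal sketch of this detect-then-fuse step in Python with OpenCV follows. The Haar-cascade detector stands in for the patent's unspecified "face image detection processing", and the largest-face heuristic is an assumption for illustration:

```python
import cv2

# Haar cascade shipped with OpenCV stands in for the face image detection.
_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame):
    """Return (x, y, w, h) of the largest detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = _detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda f: f[2] * f[3])  # pick the largest box

def fuse_frames(play_frame, call_frame):
    """Replace the character's face in the playback frame with the caller's."""
    src = detect_face(call_frame)   # first object (caller)
    dst = detect_face(play_frame)   # second object (character in the video)
    if src is None or dst is None:
        return play_frame           # nothing to fuse; keep the original frame
    x, y, w, h = src
    face_patch = call_frame[y:y + h, x:x + w]
    dx, dy, dw, dh = dst
    # Scale the caller's face to the character's face region, then replace it.
    play_frame[dy:dy + dh, dx:dx + dw] = cv2.resize(face_patch, (dw, dh))
    return play_frame
```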
In one embodiment, performing a face image detection process based on video call data to extract face image data corresponding to a first object from the video call data includes:
determining, from the objects participating in the video call, an object matched with a video scene in the video as the first object; acquiring video call picture data corresponding to the first object from the video call data; and performing face image detection processing based on the video call picture data to extract the face image data corresponding to the first object from the video call picture data.
There may be one or more objects participating in the video call; for example, they may include user A, who uses the target terminal, and user B, who uses another terminal, and there may be one or more such other terminals.
In one example, the object matching the video scene in the video is a user corresponding to the other terminal talking to the target terminal.
In one example, dialog keywords may be extracted from the audio data in the video playing data and used to determine the video scene in the video; alternatively, image recognition may be performed on the video picture data in the video playing data to extract scene objects, from which the video scene is determined. The video scene may be, for example, a happy scene or a classical scene. Face image detection may then be performed on the video call picture data to extract facial image features, from which the state of the user (e.g., happy or sad) is determined. A preset matching table is then queried for the video scene that matches the user's state, and the object matching that video scene in the video is determined from the objects participating in the video call as the first object.
Then, the terminal corresponding to the first object, or the video call data corresponding to that terminal, may be determined; the video call picture data is obtained from that video call data for face image detection processing, and the face image data corresponding to the first object is extracted from the video call picture data.
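A toy sketch of the preset matching table lookup; the table contents, state labels, and helper name are assumptions for illustration:

```python
# Hypothetical preset matching table: user state -> matching video scene.
SCENE_MATCH_TABLE = {
    "happy": "happy scene",
    "sad":   "classical scene",
}

def pick_first_object(participants, video_scene):
    """participants: list of (object_id, detected_state) pairs.
    Return the id of the first participant whose state matches the scene."""
    for object_id, state in participants:
        if SCENE_MATCH_TABLE.get(state) == video_scene:
            return object_id
    return None

# Example: user B looks happy and the video shows a happy scene, so B is chosen.
first = pick_first_object([("userA", "sad"), ("userB", "happy")], "happy scene")
```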
In one embodiment, performing a facial image detection process based on video playing data to extract facial image data corresponding to a second object from the video playing data includes:
performing face image detection processing based on the video playing data to determine all character objects in the video playing data; determining, from all the character objects, the character object matched with the first object as the second object; and extracting the face image data corresponding to the second object from the video playing data.
Face image detection processing may be performed on each frame of video picture data in the video playing data, and facial image features may be extracted from the resulting face image data. Object recognition (such as face recognition or animal recognition) is then performed on all the facial image features to obtain the object information (which may include object type, age, and other information) of all character objects. Likewise, face image detection processing may be performed on the video call picture data to extract the facial image features of the first object, and object recognition may be performed on them to obtain the object information of the first object.
Then, by matching the object information of the first object against the object information of each character object, the character object matched with the first object (for example, the character object whose object information has the greatest similarity to that of the first object) may be determined from all the character objects as the second object.
Finally, face image data corresponding to the second object may be acquired from the face image data extracted by the face image detection processing.
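A minimal sketch of this greatest-similarity match, under the assumption that the object information has been encoded as numeric feature vectors (the patent does not fix a representation):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def match_second_object(first_features, character_features):
    """character_features: dict of character name -> feature vector.
    Return the character object most similar to the first object."""
    return max(character_features,
               key=lambda name: cosine_similarity(first_features,
                                                  character_features[name]))
```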
In one embodiment, the fused video data comprises first fused video data; fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data includes:
replacing the face image data corresponding to the second object in the video playing data with the face image data corresponding to the first object, or superimposing the face image data corresponding to the first object on the face image data corresponding to the second object in the video playing data, to obtain the first fused video data.
The face image data corresponding to the second object in the video playing data may be replaced, at equal scale, with the face image data corresponding to the first object, converting the video playing data into the first fused video data; alternatively, the face image data corresponding to the first object may be superimposed, at a preset scale, on the face image data corresponding to the second object in the video playing data, likewise converting the video playing data into the first fused video data.
To do so, the face region coordinates of the face image data corresponding to the first object and of the face image data corresponding to the second object are computed and compared, and the face image data corresponding to the first object is stretched at equal scale, or at the preset scale, to align with the face image data corresponding to the second object; the replacement or superposition is then performed.
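A minimal sketch of this alignment step with OpenCV; the array layout and the blending weight are assumptions. It covers both the replacement and the superposition modes, and the mirror-image fusion into the video call data described below works the same way with source and destination swapped:

```python
import cv2

def align_and_fuse(dst_frame, src_face, dst_box, mode="replace", alpha=0.8):
    """Stretch src_face to the destination face box, then replace or overlay.

    dst_frame: frame whose face region is being fused (H x W x 3, uint8)
    src_face:  cropped face patch taken from the other stream
    dst_box:   (x, y, w, h) face region coordinates inside dst_frame
    """
    x, y, w, h = dst_box
    # Stretch the source face so its region coordinates match the target's.
    stretched = cv2.resize(src_face, (w, h))
    if mode == "replace":
        dst_frame[y:y + h, x:x + w] = stretched
    else:  # "overlay": superimpose at a preset blending weight
        region = dst_frame[y:y + h, x:x + w]
        dst_frame[y:y + h, x:x + w] = cv2.addWeighted(
            stretched, alpha, region, 1.0 - alpha, 0.0)
    return dst_frame
```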
In one embodiment, the fused video data comprises second fused video data; fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data includes:
replacing the face image data corresponding to the first object in the video call data with the face image data corresponding to the second object, or superimposing the face image data corresponding to the second object on the face image data corresponding to the first object in the video call data, to obtain the second fused video data.
The face image data corresponding to the first object in the video call data may be replaced, at equal scale, with the face image data corresponding to the second object, converting the video call data into the second fused video data; alternatively, the face image data corresponding to the second object may be superimposed, at a preset scale, on the face image data corresponding to the first object in the video call data, likewise converting the video call data into the second fused video data.
Again, the face region coordinates of the two sets of face image data are computed and compared, and the face image data corresponding to the second object is stretched at equal scale, or at the preset scale, to align with the face image data corresponding to the first object; the replacement or superposition is then performed.
In step S240, the target terminal is controlled to play the adjusted video based on the fused video data, or the video call is updated based on the fused video data.
In this exemplary embodiment, the fused video data may include the first fused video data corresponding to the video playing data, based on which the target terminal plays the adjusted video.
The fused video data may also include the second fused video data corresponding to the video call data, based on which the target terminal updates the picture or the audio of the video call.
In one embodiment, the fused video data includes at least one of the first fused video data and the second fused video data, and controlling the target terminal to play the adjusted video based on the fused video data or to update the video call based on the fused video data includes:
performing at least one of: controlling the target terminal to play the adjusted video based on the first fused video data, and controlling the target terminal to update the video call based on the second fused video data.
When the fused video data includes only the first fused video data, the target terminal can be controlled to play video based on the first fused video data while continuing the video call with the original video call data. When the fused video data includes only the second fused video data, the target terminal can be controlled to conduct the video call based on the second fused video data while continuing playback with the original video playing data. When the fused video data includes both the first and the second fused video data, the target terminal can be controlled to play video based on the first fused video data while conducting the video call based on the second fused video data. It is understood that playback of the first and second fused video data can also be combined as required.
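A compact routing sketch of these three cases; `terminal.play` and `terminal.update_call` are hypothetical hooks, not an API from the patent:

```python
def apply_fused_data(terminal, first_fused, second_fused,
                     original_play, original_call):
    """Each stream uses its fused data if present, else the original data."""
    terminal.play(first_fused if first_fused is not None else original_play)
    terminal.update_call(second_fused if second_fused is not None
                         else original_call)
```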
In this way, based on steps S210 to S240, the video playing data and the video call data are fused, merging two mutually independent contents, the played video and the video call, so that they become associated with each other. The target terminal is then controlled to play the adjusted video based on the fused video data, or the video call is updated based on the fused video data, which effectively improves the video call effect and enhances call enjoyment and user experience.
Fig. 3 shows a flowchart of a video call method in a scenario according to an embodiment of the present application. In this scenario, the target terminal is a smart television: user A watches a video program (P1) on the smart television while conducting a video call with user B. The video playing data corresponding to the video played in the target terminal is the video playing data of the television program, and the video call data corresponding to the video call performed in the target terminal is the video call data of the call with user B on the smart television.
Assume user A has enabled this video call function on the smart television (i.e., has turned on the fusion mode, such as a face-changing mode). While user A is watching the video program (P1), another user B places a video call to user A. The smart television can remind user A of the incoming call in a form such as a pop-up window, and after user A answers the video call on the smart television, the video program (P1) continues to play at the same time.
Based on the embodiments of the application, the video playing data and the video call data are fused to generate fused video data, and the smart television is controlled to play the adjusted video based on the fused video data, or to update the video call based on the fused video data.
Further, the face of user B may be displayed at the position of a character's avatar in the video program (P1). The character position at which user B's face is displayed (the characters carrying attributes such as tags, male, female, etc.) can be switched by a certain instruction. User A can thus watch the video program while having a video call with user B.
Referring to fig. 3, the steps of the video call on the smart television may specifically include:
the method comprises the following steps: the user a watches the video program (P1), and turns on "face change mode".
Step two: it is judged whether a video communication picture (i.e., a picture of a video call) exists on the smart television. If it is judged that there is none for the moment, the video playing picture processing module plays the picture of the video program (P1) normally. If user B initiates communication to user A through the video communication module, user A agrees to the communication, and the connection between the two succeeds, it is judged that a far-end video communication picture exists, and step three is executed.
Step three: video playing data corresponding to a video (a video program (P1)) played in the smart television is acquired, and video call data corresponding to a video call performed in the smart television is acquired.
Specifically, the video call data corresponding to the video call performed on the smart television is acquired: the video call picture data of user B in the video call data (i.e., the pull-stream-decoded user picture data (P2)) can be decoded by the video communication module and then sent to the face recognition module for processing.
The video playing data corresponding to the video (video program (P1)) played on the smart television is acquired: specifically, the video communication module notifies the video playing picture processing module to send the video playing data corresponding to the video program (P1) to the face recognition module for processing as well.
Step four: the video playing data and the video call data are fused to generate fused video data.
The fusion process may include:
a. Face image detection processing is performed based on the video call data to extract the face image data corresponding to user B from the video call data. Specifically, the face recognition module performs face recognition on the video call picture data of user B (i.e., the pull-stream-decoded user picture data (P2)) and segments out the face image data corresponding to user B;
b. Face image detection processing is performed based on the video playing data to extract the face image data corresponding to the second object (at least one object in the video program (P1)) from the video playing data. Specifically, the face recognition module performs face recognition on the video playing data and segments out the face image data corresponding to the second object.
c. The face image data P2-1 corresponding to user B (which may include 106 facial key points, face region coordinates, gender parameters, the number of faces, etc.; there may be more than one user B) and the face image data P1-1 corresponding to the second object are obtained.
d. The face image data P2-1 and the face image data P1-1 are sent to a face image replacement calculation module. By computing and comparing the face region coordinates (the faces in the two pictures may differ in size), the module proportionally scales user B's face image data P2-1 to match the face image data P1-1, and then replaces the face image data P1-1 in the video playing data with user B's face image data P2-1. The replacement can take either of the following 2 forms:
(1): the face image data P1-1 in the video playing data is replaced with the face image data P2-1 to obtain the first fused video data P1-2; that is, the face data values to be replaced in the video playing data of the locally played video program (P1) are changed to the face data values of user B.
(2): the face image data P2-1 is superimposed on the face image data P1-1 in the video playing data to obtain the first fused video data P1-2; that is, the coordinates of user B's face image data are aligned with the coordinates of the face image data P1-1 in the video playing data of the locally played video program (P1), and user B's avatar is then composited into the video playing data by image synthesis.
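For form (2), the image-synthesis step could, for example, be realized with OpenCV's Poisson blending. This is a minimal sketch under the assumption that the face patch has already been aligned and scaled to the target box; the patent does not prescribe a particular synthesis algorithm:

```python
import cv2
import numpy as np

def composite_avatar(play_frame, user_face, dst_box):
    """Composite user B's aligned face patch into the playback frame.

    play_frame: frame of the video program P1 (H x W x 3, uint8)
    user_face:  user B's face patch P2-1, already scaled to the box size
    dst_box:    (x, y, w, h) coordinates of the face region P1-1
    """
    x, y, w, h = dst_box
    mask = np.full(user_face.shape[:2], 255, dtype=np.uint8)  # whole patch
    center = (x + w // 2, y + h // 2)
    # Poisson blending merges the patch seamlessly into the surrounding frame.
    return cv2.seamlessClone(user_face, play_frame, mask, center,
                             cv2.NORMAL_CLONE)
```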
Step five: the smart television is controlled to play the adjusted video based on the fused video data. Specifically, the face image replacement calculation module sends the final computed image data (i.e., the first fused video data P1-2) to the video playing picture processing module to continue playing, and the smart television displays the input picture.
Step six: the system picture is refreshed; if the smart television is still in the face change mode, the above steps are repeated.
In order to better implement the video call method provided by the embodiments of the present application, an embodiment of the present application further provides a video call device based on the video call method. Terms have the same meanings as in the video call method above; for implementation details, refer to the description in the method embodiments. Fig. 4 shows a block diagram of a video call device according to an embodiment of the present application.
As shown in fig. 4, the video call device 300 may include a first obtaining module 310, a second obtaining module 320, a fusion module 330, and a control module 340.
The first obtaining module 310 may be configured to obtain video playing data corresponding to a video played in a target terminal; the second obtaining module 320 may be configured to obtain video call data corresponding to a video call performed in the target terminal; the fusion module 330 may be configured to perform fusion processing on the video playing data and the video call data to generate fused video data; the control module 340 may be configured to control the target terminal to play the adjusted video based on the fused video data, or update the video call based on the fused video data.
In one embodiment, the fusion module 330 includes: a first extraction unit configured to perform face image detection processing based on the video call data to extract face image data corresponding to a first object from the video call data; a second extraction unit, configured to perform face image detection processing based on the video playing data to extract face image data corresponding to a second object from the video playing data; and the object fusion unit is used for carrying out fusion processing on the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data.
In some embodiments of the present application, the first extraction unit is configured to: determine, from the objects participating in the video call, the object matched with a video scene in the video as the first object; acquire video call picture data corresponding to the first object from the video call data; and perform face image detection processing based on the video call picture data to extract the face image data corresponding to the first object from the video call picture data.
In some embodiments of the present application, the second extraction unit is configured to: perform face image detection processing based on the video playing data to determine all character objects in the video playing data; determine, from all the character objects, the character object matched with the first object as the second object; and extract the face image data corresponding to the second object from the video playing data.
In some embodiments of the present application, the fused video data comprises first fused video data; the object fusion unit is configured to replace the face image data corresponding to the second object in the video playing data with the face image data corresponding to the first object to obtain the first fused video data.
In some embodiments of the present application, the fused video data comprises first fused video data; the object fusion unit is configured to superimpose the face image data corresponding to the first object on the face image data corresponding to the second object in the video playing data to obtain the first fused video data.
In some embodiments of the present application, the fused video data comprises second fused video data; the object fusion unit is configured to replace the face image data corresponding to the first object in the video call data with the face image data corresponding to the second object to obtain the second fused video data.
In some embodiments of the present application, the fused video data comprises second fused video data; the object fusion unit is configured to superimpose the face image data corresponding to the second object on the face image data corresponding to the first object in the video call data to obtain the second fused video data.
In some embodiments of the present application, the fused video data comprises at least one of first fused video data and second fused video data; the control module 340 is configured to perform at least one of: controlling the target terminal to play the adjusted video based on the first fused video data, and controlling the target terminal to update the video call based on the second fused video data.
In this way, based on the video call device 300, the video playing data and the video call data are fused, merging two mutually independent contents, the played video and the video call, so that they become associated with each other. The target terminal is controlled to play the adjusted video based on the fused video data, or the video call is updated based on the fused video data, which effectively improves the video call effect and enhances call enjoyment and user experience.
It should be noted that although several modules or units of the device for action execution are mentioned in the detailed description above, such a division is not mandatory. Indeed, according to the embodiments of the application, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units.
In addition, an embodiment of the present application further provides an electronic device, which may be a terminal or a server. Fig. 5 shows a schematic structural diagram of the electronic device according to an embodiment of the present application. Specifically:
the electronic device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 5 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by operating or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby integrally monitoring the electronic device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user pages, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components, and preferably, the power supply 403 is logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, and power consumption are realized through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The electronic device may further include an input unit 404, and the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail here. Specifically, in this embodiment, the processor 401 in the electronic device loads the executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application programs stored in the memory 402 to implement various functions. For example, the processor 401 may execute the following steps:
acquiring video playing data corresponding to a video played in a target terminal; acquiring video call data corresponding to a video call performed in the target terminal; performing fusion processing on the video playing data and the video call data to generate fused video data; and controlling the target terminal to play the adjusted video based on the fused video data, or updating the video call based on the fused video data.
In some embodiments of the present application, when performing fusion processing on the video playing data and the video call data to generate fused video data, the processor 401 may perform: performing face image detection processing based on the video call data to extract face image data corresponding to a first object from the video call data; performing face image detection processing based on the video playing data to extract face image data corresponding to a second object from the video playing data; and fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data.
In some embodiments of the present application, when performing the facial image detection process based on the video call data to extract facial image data corresponding to the first object from the video call data, the processor 401 may perform: determining an object matched with a video scene in the video from objects participating in the video call as the first object; acquiring video call picture data corresponding to the first object from the video call data; and carrying out face image detection processing based on the video call picture data so as to extract face image data corresponding to the first object from the video call picture data.
In some embodiments of the present application, when performing the facial image detection process based on the video playing data to extract facial image data corresponding to the second object from the video playing data, the processor 401 may perform: performing face image detection processing based on the video playing data to determine all character objects in the video playing data; determining, from all the character objects, the character object matched with the first object as the second object; and extracting the face image data corresponding to the second object from the video playing data.
In some embodiments of the present application, the fused video data comprises first fused video data; when fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data, the processor 401 may perform: replacing the face image data corresponding to the second object in the video playing data with the face image data corresponding to the first object to obtain the first fused video data.
In some embodiments of the present application, the fused video data comprises first fused video data; when fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data, the processor 401 may perform: superimposing the face image data corresponding to the first object on the face image data corresponding to the second object in the video playing data to obtain the first fused video data.
In some embodiments of the present application, the fused video data comprises second fused video data; when fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data, the processor 401 may perform: replacing the face image data corresponding to the first object in the video call data with the face image data corresponding to the second object to obtain the second fused video data.
In some embodiments of the present application, the fused video data comprises second fused video data; when fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data, the processor 401 may perform: superimposing the face image data corresponding to the second object on the face image data corresponding to the first object in the video call data to obtain the second fused video data.
In some embodiments of the present application, the fused video data comprises at least one of first fused video data and second fused video data; when controlling the target terminal to play the adjusted video based on the fused video data, or updating the video call based on the fused video data, the processor 401 may perform at least one of: controlling the target terminal to play the adjusted video based on the first fused video data, and controlling the target terminal to update the video call based on the second fused video data.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by a computer program, which may be stored in a computer-readable storage medium and loaded and executed by a processor, or by related hardware controlled by the computer program.
To this end, the present application further provides a storage medium, in which a computer program is stored, where the computer program can be loaded by a processor to execute the steps in any one of the methods provided in the present application.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the computer program stored in the storage medium can execute the steps in any method provided in the embodiments of the present application, it can achieve the beneficial effects achievable by those methods; for details, see the foregoing embodiments, which are not repeated here.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the embodiments that have been described above and shown in the drawings, but that various modifications and changes can be made without departing from the scope thereof.

Claims (12)

1. A video call method, comprising:
acquiring video playing data corresponding to a video played in a target terminal;
acquiring video call data corresponding to a video call performed in the target terminal;
performing fusion processing on the video playing data and the video call data to generate fused video data;
and controlling the target terminal to play the adjusted video based on the fused video data, or updating the video call based on the fused video data.
2. The method according to claim 1, wherein the fusing the video playing data and the video call data to generate fused video data comprises:
performing face image detection processing based on the video call data to extract face image data corresponding to a first object from the video call data;
performing face image detection processing based on the video playing data to extract face image data corresponding to a second object from the video playing data;
and fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data.
3. The method according to claim 2, wherein the performing face image detection processing based on the video call data to extract the face image data corresponding to the first object from the video call data comprises:
determining, from the objects participating in the video call, an object matched with a video scene in the video as the first object;
acquiring video call picture data corresponding to the first object from the video call data;
and performing face image detection processing based on the video call picture data to extract the face image data corresponding to the first object from the video call picture data.
4. The method according to claim 2, wherein the performing face image detection processing based on the video playing data to extract the face image data corresponding to the second object from the video playing data comprises:
performing face image detection processing based on the video playing data to determine all character objects in the video playing data;
determining, from all the character objects, the character object matched with the first object as the second object;
and extracting the face image data corresponding to the second object from the video playing data.
5. The method of claim 2, wherein the fused video data comprises first fused video data; the fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data includes:
and replacing the face image data corresponding to the second object in the video playing data with the face image data corresponding to the first object to obtain the first fused video data.
6. The method of claim 2, wherein the fused video data comprises first fused video data; the fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data includes:
and superimposing the face image data corresponding to the first object on the face image data corresponding to the second object in the video playing data to obtain the first fused video data.
7. The method of claim 2, wherein the fused video data comprises second fused video data; the fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data includes:
and replacing the face image data corresponding to the first object in the video call data with the face image data corresponding to the second object to obtain the second fused video data.
8. The method of claim 2, wherein the fused video data comprises second fused video data; the fusing the video playing data and the video call data based on the face image data corresponding to the first object and the face image data corresponding to the second object to obtain the fused video data includes:
and overlaying the face image data corresponding to the second object on the face image data corresponding to the first object in the video call data to obtain the second fused video data.
9. The method of claim 1, wherein the fused video data comprises at least one of first fused video data and second fused video data;
the controlling the target terminal to play the adjusted video based on the fused video data or to update the video call based on the fused video data includes:
and performing at least one of the following: controlling the target terminal to play the adjusted video based on the first fused video data, and controlling the target terminal to update the video call based on the second fused video data.
10. A video call apparatus, comprising:
the first acquisition module is used for acquiring video playing data corresponding to a video played in a target terminal;
the second acquisition module is used for acquiring video call data corresponding to a video call performed in the target terminal;
the fusion module is used for carrying out fusion processing on the video playing data and the video call data to generate fused video data;
and the control module is used for controlling the target terminal to play the adjusted video based on the fused video data, or to update the video call based on the fused video data.
11. A storage medium having stored thereon computer readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of any of claims 1 to 9.
12. An electronic device, comprising: a memory storing computer readable instructions; a processor reading computer readable instructions stored by the memory to perform the method of any of claims 1 to 9.
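For illustration only, the four steps of claim 1 above could be strung together as in the sketch below; the terminal handle, capture_playback, capture_call, and the fuse callable are hypothetical placeholders for the acquisition and fusion stages described in the claims.

```python
def run_video_call_fusion(terminal, fuse):
    """Sketch of the claimed flow; `terminal` and `fuse` are hypothetical.
    `fuse` is any callable mapping (play_data, call_data) -> fused data,
    e.g. frame-wise face replacement or overlay."""
    play_data = terminal.capture_playback()  # acquire video playing data
    call_data = terminal.capture_call()      # acquire video call data
    fused = fuse(play_data, call_data)       # fusion processing
    terminal.play(fused)                     # play the adjusted video
```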
CN202110839183.6A 2021-07-23 2021-07-23 Video call method and device, storage medium and electronic equipment Pending CN113612959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110839183.6A CN113612959A (en) 2021-07-23 2021-07-23 Video call method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110839183.6A CN113612959A (en) 2021-07-23 2021-07-23 Video call method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113612959A 2021-11-05

Family

ID=78305298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110839183.6A Pending CN113612959A (en) 2021-07-23 2021-07-23 Video call method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113612959A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002077840A (en) * 2000-08-30 2002-03-15 Toshiba Corp Communication terminal
JP2010283705A (en) * 2009-06-08 2010-12-16 Hitachi Ltd Data transmission device and method of transmitting data
US20140139609A1 (en) * 2012-11-16 2014-05-22 At&T Intellectual Property I, Lp Method and apparatus for providing video conferencing
CN105120301A (en) * 2015-08-25 2015-12-02 小米科技有限责任公司 Video processing method and apparatus, and intelligent equipment
US20200236297A1 (en) * 2019-01-18 2020-07-23 Snap Inc. Systems and methods for providing personalized videos
CN112073797A (en) * 2019-06-10 2020-12-11 海信视像科技股份有限公司 Volume adjusting method and display device

Similar Documents

Publication Publication Date Title
CN110519636B (en) Voice information playing method and device, computer equipment and storage medium
CN111241340B (en) Video tag determining method, device, terminal and storage medium
CN112199016B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN112752116A (en) Display method, device, terminal and storage medium of live video picture
CN114025219B (en) Rendering method, device, medium and equipment for augmented reality special effects
US20220188357A1 (en) Video generating method and device
CN109274999A (en) A kind of video playing control method, device, equipment and medium
CN113840158B (en) Virtual image generation method, device, server and storage medium
US20230362328A1 (en) Video frame insertion method and apparatus, and electronic device
CN106131643A (en) A kind of barrage processing method, processing means and electronic equipment thereof
CN112989112B (en) Online classroom content acquisition method and device
CN108388836B (en) Method and device for acquiring video semantic information
CN114095793A (en) Video playing method and device, computer equipment and storage medium
CN112866577A (en) Image processing method and device, computer readable medium and electronic equipment
CN112511890A (en) Video image processing method and device and electronic equipment
CN113612959A (en) Video call method and device, storage medium and electronic equipment
CN113840177B (en) Live interaction method and device, storage medium and electronic equipment
US20210377454A1 (en) Capturing method and device
CN114155151A (en) Image drawing method, device, equipment and storage medium
CN113891136A (en) Video playing method and device, electronic equipment and storage medium
CN113691756A (en) Video playing method and device and electronic equipment
CN115379250B (en) Video processing method, device, computer equipment and storage medium
CN111800651B (en) Information processing method and information processing device
CN115426505B (en) Preset expression special effect triggering method based on face capture and related equipment
CN114640850B (en) Video image motion estimation method, display device and chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20211105)