CN112750186A - Virtual image switching method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112750186A
Authority
CN
China
Prior art keywords
image
characteristic parameter
similarity
virtual
real
Prior art date
Legal status
Granted
Application number
CN202110069031.2A
Other languages
Chinese (zh)
Other versions
CN112750186B (en)
Inventor
杨国基
常向月
刘云峰
Current Assignee
Shenzhen Zhuiyi Technology Co Ltd
Original Assignee
Shenzhen Zhuiyi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Zhuiyi Technology Co Ltd filed Critical Shenzhen Zhuiyi Technology Co Ltd
Priority to CN202110069031.2A
Publication of CN112750186A
Application granted
Publication of CN112750186B
Legal status: Active
Anticipated expiration


Classifications

    • G06T 13/20: 3D [Three Dimensional] animation
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06N 20/00: Machine learning
    • G06V 40/168: Feature extraction; Face representation
    • G06V 40/174: Facial expression recognition


Abstract

The application discloses a virtual image switching method and device, an electronic device, and a storage medium, relating to the technical field of electronic devices. The virtual image switching method includes: acquiring a displayed real image of a current frame and an action intention of a target person in the real image; acquiring, according to the action intention, a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video; and if the real image does not match the virtual image, generating a transition image based on the real image and the virtual image, and switching the real image to the transition image. In this way, when the real image corresponding to the real person customer service in a video is switched to the virtual image corresponding to the robot customer service, a transition image used for linking is displayed, so that the real image can transition smoothly to the virtual image, the user cannot perceive the switching action, and the user experience is improved.

Description

Virtual image switching method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of electronic devices, and in particular, to a method and an apparatus for switching an avatar, an electronic device, and a storage medium.
Background
At present, mobile terminal devices such as mobile phones are more and more popular, and smartphones have become essential personal items for people when they go out. With the rapid development of the mobile internet, a wide variety of applications have appeared on mobile terminals, and many of them provide customer service functions for users, so that users can obtain services such as product consultation through customer service.
Generally, the customer service function provided in a mobile terminal application includes two parts: a visual robot customer service and a manual (human) customer service. The robot customer service can answer simple or common questions, while complex or special questions are handled by switching to the manual customer service; therefore, using the customer service function often involves switching back and forth between the manual customer service and the robot customer service.
However, in current customer service videos, the pictures of the robot customer service and the manual customer service are switched directly, and the two pictures often do not connect well, so the switch looks unnatural and the user experience suffers.
Disclosure of Invention
In view of the above problems, the present application provides an avatar switching method, an avatar switching apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides an avatar switching method, including: acquiring a real image of a displayed current frame and action intention of a target person in the real image; acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video according to the action intention; if the real image and the virtual image are not matched, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an interpolation image of a target person, and the similarity between the interpolation image and the real image is greater than that between the real image and the virtual image; and if the transition image is matched with the virtual image, switching the transition image into a virtual video.
Further, before generating a transition image based on the real image and the virtual image and switching the real image into the transition image if the real image and the virtual image do not match, the method further includes: extracting a first characteristic parameter from the real image and extracting a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person; and if the similarity of the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold value, determining that the real image is not matched with the virtual image.
Further, the first feature parameter includes a first feature point, the second feature parameter includes a second feature point, the first feature point and the second feature point are the same feature point of the target person, and before determining that the real image and the virtual image do not match if the similarity between the first feature parameter and the second feature parameter is smaller than the similarity threshold, the method further includes: determining whether the distance between the first feature point and the second feature point is not less than a distance threshold; and if the distance between the first characteristic point and the second characteristic point is not less than the distance threshold, determining that the similarity between the first characteristic parameter and the second characteristic parameter is less than the similarity threshold.
Further, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, includes: generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the third characteristic parameter and the first characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter; generating an interpolated frame image based on the third characteristic parameter; and generating a transition image based on the interpolated frame image, and switching the real image into the transition image.
Further, generating the interpolated frame image based on the third characteristic parameter includes: taking the similarity between the third characteristic parameter and the first characteristic parameter as a first similarity, taking the similarity between the first characteristic parameter and the second characteristic parameter as a second similarity, and calculating the difference between the first similarity and the second similarity; and if the difference between the first similarity and the second similarity is not less than a specified value, generating the interpolated frame image based on the third characteristic parameter.
Further, generating a transition image based on the interpolated frame image includes: if the similarity between the third characteristic parameter and the first characteristic parameter is smaller than the similarity threshold, determining the interpolated frame image as the first frame image of the transition image, and determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold; if the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, generating a fourth characteristic parameter based on the second characteristic parameter and the third characteristic parameter, wherein the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter; generating a target interpolated frame image based on the fourth characteristic parameter; and generating a transition image based on the interpolated frame image and the target interpolated frame image.
Further, generating the target frame interpolation image based on the fourth feature parameter includes: taking the similarity of the fourth characteristic parameter and the second characteristic parameter as a third similarity, taking the similarity of the third characteristic parameter and the second characteristic parameter as a fourth similarity, and calculating a difference value between the third similarity and the fourth similarity; and if the difference value between the third similarity and the fourth similarity is not less than the specified value, generating the target frame interpolation image based on the fourth characteristic parameter.
Further, generating a transition image based on the interpolated frame image and the target interpolated frame image, comprising: determining whether the similarity of the fourth characteristic parameter and the second characteristic parameter is smaller than a similarity threshold value; and if the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining the target frame interpolation image as the last frame image of the transition image, and generating the transition image through the frame interpolation image and the target frame interpolation image.
Further, generating the interpolated frame image based on the third characteristic parameter includes: inputting the third characteristic parameter into a pre-trained virtual image model, and acquiring the interpolated frame image corresponding to the third characteristic parameter.
Further, inputting the third characteristic parameter into the pre-trained virtual image model and acquiring the interpolated frame image corresponding to the third characteristic parameter includes: acquiring a currently recorded real video including the real image, and fine-tuning the pre-trained virtual image model with the real video; and inputting the third characteristic parameter into the fine-tuned virtual image model, and acquiring the interpolated frame image corresponding to the third characteristic parameter.
Further, before generating the interpolated frame image based on the third characteristic parameter, the method further includes: obtaining a sample image of the target person; extracting sample characteristic parameters and a sample interpolated frame image of the target person from the sample image; and inputting the sample characteristic parameters and the sample interpolated frame images into a machine learning model for training to obtain the pre-trained virtual image model.
Further, before generating a transition image based on the real image and the virtual image and switching the real image into the transition image if the real image and the virtual image do not match, the method further includes: and determining whether the real image and the virtual image are matched through an optical flow method.
Further, before acquiring the virtual video corresponding to the action intention and the virtual image of the first frame of the virtual video, the method further includes: determining whether a virtual video corresponding to the action intention exists; if so, acquiring the virtual video corresponding to the action intention; if not, acquiring an answer template corresponding to the action intention, extracting the characteristic parameters of the target person from the real image, and generating a virtual video corresponding to the action intention based on the characteristic parameters and the answer template.
Further, before acquiring the displayed real image of the current frame and the action intention of the target person in the real image, the method further comprises the following steps: determining whether the real image satisfies a switching condition when the real image including the target person is played; and if the real image meets the switching condition, acquiring the displayed real image of the current frame and the action intention of the target person in the real image.
Further, determining whether the real image satisfies a switching condition includes: determining whether a handover instruction is received; and if a switching instruction is received, determining that the played real image including the target person meets the switching condition.
In a second aspect, an embodiment of the present application provides an avatar switching apparatus, including: the device comprises a first acquisition module, a second acquisition module, a first switching module and a second switching module. Wherein: the first acquisition module is used for acquiring a displayed real image of the current frame and the action intention of a target person in the real image; the second acquisition module is used for acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video according to the action intention; the first switching module is used for generating a transition image based on the real image and the virtual image and switching the real image into the transition image if the real image and the virtual image are not matched, wherein the transition image comprises an interpolation image of a target person, and the similarity between the interpolation image and the real image is greater than that between the real image and the virtual image; the second switching module is used for switching the transition image into the virtual video if the transition image is matched with the virtual image.
Further, the avatar switching apparatus further includes:
the characteristic parameter extraction module is used for extracting a first characteristic parameter from the real image and extracting a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person.
And the matching determination module is used for determining that the real image is not matched with the virtual image if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold.
Further, the first feature parameter includes a first feature point, the second feature parameter includes a second feature point, and the first feature point and the second feature point are the same feature point of the target person, and the avatar switching apparatus further includes:
and the distance judging module is used for determining whether the distance between the first characteristic point and the second characteristic point is not less than a distance threshold value.
And the similarity determining module is used for determining that the similarity between the first characteristic parameter and the second characteristic parameter is less than the similarity threshold if the distance between the first characteristic point and the second characteristic point is not less than the distance threshold.
Further, the first switching module includes:
and the third characteristic parameter submodule is used for generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the third characteristic parameter and the first characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter.
And the frame interpolation image generation submodule is used for generating a frame interpolation image based on the third characteristic parameter.
And the first switching submodule is used for generating a transition image based on the frame insertion image and switching the real image into the transition image.
Further, the frame interpolation image generation submodule is specifically configured to use the similarity between the third characteristic parameter and the first characteristic parameter as a first similarity, use the similarity between the first characteristic parameter and the second characteristic parameter as a second similarity, and calculate a difference between the first similarity and the second similarity; and if the difference value between the first similarity and the second similarity is not less than the specified value, generating the frame interpolation image based on the third characteristic parameter.
Further, the first switching module further includes:
and the similarity comparison submodule is used for determining the frame interpolation image as the first frame image of the transition image if the similarity between the third characteristic parameter and the first characteristic parameter is smaller than the similarity threshold value, and determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold value.
And the fourth characteristic parameter submodule is used for generating a fourth characteristic parameter based on the second characteristic parameter and the third characteristic parameter if the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, wherein the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter.
And the target frame interpolation image generation submodule is used for generating a target frame interpolation image based on the fourth characteristic parameter.
And the transition image generation submodule is used for generating a transition image based on the interpolation frame image and the target interpolation frame image.
Further, the target frame interpolation image generation submodule is specifically configured to use the similarity between the fourth characteristic parameter and the second characteristic parameter as a third similarity, use the similarity between the third characteristic parameter and the second characteristic parameter as a fourth similarity, and calculate a difference between the third similarity and the fourth similarity; and if the difference value between the third similarity and the fourth similarity is not less than the specified value, generating the target frame interpolation image based on the fourth characteristic parameter.
Further, the transition image generation submodule is specifically configured to determine whether the similarity between the fourth feature parameter and the second feature parameter is smaller than a similarity threshold; and if the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining the target frame interpolation image as the last frame image of the transition image, and generating the transition image through the frame interpolation image and the target frame interpolation image.
Further, the frame interpolation image generation submodule is specifically configured to input the third feature parameter into a pre-trained avatar model, and acquire a frame interpolation image corresponding to the third feature parameter.
Further, the avatar switching apparatus further includes:
and the sample image acquisition module is used for acquiring a sample image of the target person.
And the sample extraction module is used for extracting the sample characteristic parameters and the sample frame insertion images of the target person from the sample images.
And the training module is used for inputting the sample characteristic parameters and the sample frame insertion images into the machine learning model for training to obtain a pre-trained virtual image model.
Further, the avatar switching apparatus further includes:
and the virtual video detection module is used for determining whether a virtual video corresponding to the action intention exists.
And the first execution module is used for acquiring the virtual video corresponding to the action intention if the virtual video exists.
And the answer template acquisition module is used for acquiring an answer template corresponding to the action intention if the virtual video does not exist.
And the virtual video generation module is used for extracting the characteristic parameters of the target person from the real image and generating a virtual video corresponding to the action intention based on the characteristic parameters and the answer template.
Further, the avatar switching apparatus further includes:
and the switching detection module is used for determining whether the real image meets the switching condition when the real image comprising the target person is played.
And the second execution module is used for executing the acquisition of the displayed real image of the current frame and the action intention of the target person in the real image if the real image meets the switching condition.
Further, the handover detection module is specifically configured to determine whether a handover instruction is received; and if a switching instruction is received, determining that the played real image including the target person meets the switching condition.
In a third aspect, an embodiment of the present application provides an electronic device, which includes: memory, one or more processors, and one or more applications. Wherein the one or more processors are coupled with the memory. One or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs configured to perform the method of the first aspect as described above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which program code is stored, and the program code can be called by a processor to execute the method according to the first aspect.
According to the method, the device, the electronic equipment and the storage medium for switching the virtual image, the real image of the current frame and the action intention of the target person in the real image are obtained; acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video according to the action intention; if the real image and the virtual image are not matched, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an interpolation image of a target person, and the similarity between the interpolation image and the real image is greater than that between the real image and the virtual image; if the transition image is matched with the virtual image, the transition image is switched to the virtual video, so that smooth and natural transition of the real image to the virtual image is realized, namely, when a user watches the customer service video, the artificial customer service picture is smoothly switched to the virtual customer service picture, and the user does not feel the switching process, so that the user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic application environment diagram illustrating an avatar switching method according to a first embodiment of the present application.
Fig. 2 is a flowchart illustrating an avatar switching method according to a first embodiment of the present application.
Fig. 3 is a flowchart illustrating an avatar switching method according to a second embodiment of the present application.
Fig. 4 shows a schematic flowchart of S240 in an avatar switching method according to a second embodiment of the present application.
Fig. 5 is a flowchart illustrating an avatar switching method according to a third embodiment of the present application.
Fig. 6 shows a flowchart of S360 in an avatar switching method according to a third embodiment of the present application.
Fig. 7 is a flowchart illustrating S370 in an avatar switching method according to a third embodiment of the present application.
Fig. 8 shows a schematic flow diagram of S373 in the avatar switching method according to the third embodiment of the present application.
Fig. 9 shows a schematic flowchart of S374 in the avatar switching method according to the third embodiment of the present application.
Fig. 10 is a flowchart illustrating an avatar switching method according to a fourth embodiment of the present application.
Fig. 11 is a flowchart illustrating an avatar switching method according to a fifth embodiment of the present application.
Fig. 12 is a flowchart illustrating an avatar switching method according to a sixth embodiment of the present application.
Fig. 13 is a flowchart illustrating an avatar switching method according to a seventh embodiment of the present application.
Fig. 14 is a block diagram illustrating an avatar switching apparatus according to an eighth embodiment of the present application.
Fig. 15 is a block diagram of an electronic device for performing the avatar switching method according to a ninth embodiment of the present application.
Fig. 16 shows a storage unit for storing or carrying program code implementing the avatar switching method according to a tenth embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, mobile terminal devices such as mobile phones are more and more popular, and smartphones have become essential personal items for people when they go out. With the rapid development of the mobile internet, a wide variety of applications have appeared on mobile terminals, and many of them provide customer service functions for users, so that users can obtain services such as product consultation through customer service.
With the development of science and technology, people increasingly expect a humanized experience when using various intelligent products. When communicating with customer service, users hope not only to receive text or voice replies, but also to communicate in a more natural, interpersonal way similar to communication in real life. Therefore, current intelligent products can communicate with the user by playing a video containing the virtual image of the robot customer service, so as to meet the user's visual demands.
In actual use of the customer service function, when the robot customer service encounters a question it cannot answer, the service needs to be switched to the manual customer service to answer the user's question, and after the manual customer service has finished answering, the service can be switched back to the robot customer service to continue the communication.
However, the current switching method usually switches the current frame displayed in the video directly to the virtual image corresponding to the robot customer service or to the real image corresponding to the manual customer service. In the process of switching between the virtual image and the real image, if the person in the real image differs greatly from the person in the virtual image, the switch will feel abrupt and unnatural to the user, reducing the user experience.
The inventor found in research that if, during switching, the action, expression, posture, and other characteristics of the customer service robot's avatar in the virtual image are kept as consistent as possible with those of the manual customer service in the real image, the switch between the two images becomes smoother and the user experience improves.
However, in actual research the inventor also found that, because the virtual customer service robot picture and the real manual customer service picture are both in a constantly changing state, it is difficult to find a moment when the two pictures are synchronized, that is, a moment when the action, expression, posture, and other characteristics of the customer service robot's avatar in the virtual image are consistent with those of the manual customer service in the real image. In particular, when switching from the real image to the virtual image, the actions, expressions, and postures of the real person displayed in the real image cannot be predicted, so it is difficult to switch to the virtual image at a synchronized moment.
In order to solve the above problem, the inventor proposes the avatar switching method and apparatus, electronic device, and storage medium of the embodiments of the present application, which display a linking transition image when the real image corresponding to the real person customer service in a video is switched to the virtual image corresponding to the robot customer service, thereby ensuring that the real image transitions smoothly to the virtual image, making the switching action imperceptible to the user and improving the user experience.
The following describes in detail an avatar switching method, an avatar switching apparatus, an electronic device, and a storage medium according to embodiments of the present application.
First embodiment
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment suitable for the embodiment of the present application. The avatar switching method provided by the embodiment of the present application may be applied to the interactive system 100 shown in fig. 1. The interactive system 100 comprises a terminal device 101 and a server 102, wherein the server 102 is in communication connection with the terminal device 101. The server 102 may be a conventional server or a cloud server, and is not limited herein.
The terminal device 101 may be various electronic devices that have a display screen, a data processing module, a shooting camera, an audio input/output function, and the like, and support data input, including but not limited to a smart phone, a tablet computer, a laptop portable computer, a projector, a desktop computer, a self-service terminal, a wearable electronic device, and the like. Specifically, the data input may be inputting voice based on a voice module provided on the electronic device, inputting characters based on a character input module, and the like.
The terminal device 101 may have a client application installed on it, and the user may interact based on the client application (for example, an APP, a WeChat applet, etc.); the conversation robot of this embodiment may also be a client application configured in the terminal device 101. A user may register a user account with the server 102 based on the client application and communicate with the server 102 based on that account. For example, the user logs into the user account in the client application and provides input through it, such as text information or voice information; after the client application receives the information input by the user, it may send the information to the server 102, so that the server 102 can receive, process, and store it. The server 102 may also return corresponding output information to the terminal device 101 according to the received information, where the output information may be a virtual video, stored in advance on the server 102, that corresponds to the robot customer service answering the customer's question, or a real video, acquired by the server 102 in real time, that corresponds to the real person customer service.
In some embodiments, the avatar switching apparatus may also be disposed on the terminal device 101, so that the terminal device 101 can interact with the user without relying on establishing communication with the server 102. In this case, the terminal device 101 may store the virtual video corresponding to the robot customer service and receive or collect in real time the real video corresponding to the real person customer service, and the interactive system 100 may include only the terminal device 101.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an avatar switching method according to an embodiment of the present disclosure. The method can comprise the following steps:
s110, acquiring the displayed real image of the current frame and the action intention of the target person in the real image.
The real image may be an image shot by the terminal device through the camera in real time, or an image shot by other devices through the camera in real time and sent to the terminal device, such as video call, online video, and the like. The real image may or may not include a real person, for example, the real image may show an object that the real person is demonstrating.
The target person may be a real person in reality whose appearance, identity, and other information are determined. For example, between real person customer service A and real person customer service B, the target person may be real person customer service A.
In some embodiments, the terminal device may display a picture including the target person; for example, a real video including the target person may be played on the display screen of the terminal device, and the terminal device may take the currently displayed frame of the real video as the real image of the current frame and acquire that real image.
The action intention may be an intention of the target person in the real image to switch the real image displayed by the terminal device to the virtual image; for example, the real person customer service displayed by the terminal device wants to switch the current manual service to the service of the robot customer service after solving the user's problem. For another example, the action intention may be an intention of the target person in the real image to switch the real image displayed by the terminal device to one of a plurality of virtual videos; as an example, the plurality of virtual videos may include a virtual video for explaining service A, a virtual video for explaining service B, a virtual video for ending the conversation, and the like.
The virtual video may be a video generated in advance according to the character features of the target character, and the virtual video may be composed of a plurality of frames of virtual images.
As one way, when acquiring the action intention of the target person in the real image, the action intention may be determined by recognizing the gesture of the target person in the real image, specifically, the terminal device may store a mapping relationship between multiple gestures of the target person and multiple action intentions in advance, and when recognizing that the target person in the real image has made the gesture a, the action intention corresponding to the gesture a may be obtained.
As another mode, when the real image is a real video including audio information, the audio information may be extracted from the real video, and the action intention of the target person may be identified according to the audio information, which is equivalent to predicting the next action of the real customer service through the voice of the real customer service. Alternatively, the voice information may be converted into text information, and the text information is input into a pre-trained intention recognition model, so as to obtain an action intention output by the model and corresponding to the text information. The intention recognition model can be trained in advance through a plurality of sample text messages and a plurality of sample action intents. Alternatively, dialog information may be generated according to the audio information in the real video and the audio information of the user collected by the terminal device, and then the action intention may be determined based on the dialog information, where the manner of determining the action intention according to the dialog information may refer to the manner of determining the action intention according to the text information described above. Since the dialogue information can accurately reflect the action intention of the target person in the real video, the action intention is determined by the dialogue information, and the action intention can be accurately recognized.
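As a minimal illustration of the two recognition modes above, the following Python sketch combines a gesture-to-intention table with a text-based fallback; the table contents, the classify_text stub, and all names here are hypothetical stand-ins for the pre-trained intention recognition model, not part of the embodiment.

    from typing import Optional

    # Hypothetical gesture-to-intention table (see the mapping described above).
    GESTURE_TO_INTENT = {
        "gesture_a": "intent_switch_to_robot_service",
        "gesture_b": "intent_end_conversation",
    }

    def classify_text(text: str) -> str:
        # Stand-in for the pre-trained intention recognition model, which would
        # be trained on sample text information and sample action intentions.
        if "goodbye" in text.lower():
            return "intent_end_conversation"
        return "intent_switch_to_robot_service"

    def recognize_action_intent(gesture: Optional[str],
                                speech_text: Optional[str]) -> Optional[str]:
        # Prefer the gesture mapping; fall back to text derived from audio.
        if gesture in GESTURE_TO_INTENT:
            return GESTURE_TO_INTENT[gesture]
        if speech_text:
            return classify_text(speech_text)
        return None  # no recognizable action intention in the current frame

    print(recognize_action_intent("gesture_a", None))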
In another embodiment, a pronunciation of a target person is recognized from a lip motion of the target person in a real image, text information is generated from the pronunciation, and the text information is input to an intention recognition model trained in advance, so that an action intention corresponding to the text information output by the model can be obtained. Therefore, the action intention of the target person in the real image can be accurately identified under the condition that the audio information cannot be well obtained.
Alternatively, the real image may be an image of a real target person captured by the terminal device through a camera, such as a photograph, a video, and the like of the target person. The virtual image and the real image at least include the face of the target person, and optionally, the virtual image and the real image may also include the body type, the gesture, the action, and the like of the target person.
Alternatively, the virtual image of the target person may be an image generated based on the character features of the target person; therefore, the person displayed in the virtual image (hereinafter referred to as the avatar) may be very similar to the real target person in appearance, body type, expression, and so on. The character features may include facial features, body type features, posture features, and the like.
And S120, acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video according to the action intention.
In some embodiments, mapping relationships between a plurality of action intents and a plurality of virtual videos may be established in advance, and optionally, the plurality of action intents and the plurality of virtual videos may correspond to one another. As an example, the mapping relationship table of the plurality of action intents and the plurality of virtual videos may be as shown in table 1:
TABLE 1

Action intention      Virtual video
-------------------   ----------------
Action intention a1   Virtual video a1
Action intention a2   Virtual video a2
Action intention a3   Virtual video a3
It can be seen that after determining the action intention, the corresponding virtual video can be queried by combining the action intention and table 1, for example, when the action intention is action intention a2, the corresponding virtual video a2 can be found from table 1, so as to obtain the virtual video corresponding to the action intention.
Optionally, table 1 may be stored locally in the terminal device, or may be stored in a cloud server in communication with the terminal device, so as to be called from the cloud server when needed.
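A minimal sketch of this lookup, assuming the Table 1 mapping is held in a local dictionary (keys, values, and the function name are illustrative only):

    # Hypothetical in-memory form of Table 1. In practice the table may be
    # stored locally on the terminal device or fetched from the cloud server.
    INTENT_TO_VIRTUAL_VIDEO = {
        "action intention a1": "virtual video a1",
        "action intention a2": "virtual video a2",
        "action intention a3": "virtual video a3",
    }

    def get_virtual_video(action_intent: str):
        # Returns None when no virtual video exists for the action intention,
        # in which case a video may be generated from an answer template.
        return INTENT_TO_VIRTUAL_VIDEO.get(action_intent)

    assert get_virtual_video("action intention a2") == "virtual video a2"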
When a virtual video corresponding to an action intention is obtained, a first frame image of the virtual video extracted from the virtual video may be used as a virtual image of a first frame of the virtual video (hereinafter, referred to as a virtual image).
And S130, if the real image is not matched with the virtual image, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an interpolation image of the target person, and the similarity between the interpolation image and the real image is greater than that between the real image and the virtual image.
In some embodiments, the terminal device may compare the similarity between the real image and the virtual image to determine whether they match. Specifically, the similarity may be calculated according to a feature parameter (such as an action characteristic, an expression characteristic, a facial feature, and the like) of the target person in the real image and the corresponding feature parameter of the target person in the virtual image. If the similarity exceeds a similarity threshold, it may be determined that the real image and the virtual image match; if the similarity does not exceed the similarity threshold, it may be determined that they do not match.
It will be appreciated that the greater the similarity between two images, the closer the two images are.
When the real image and the virtual image do not match, a transition image may be generated based on the real image and the virtual image, specifically in a manner similar to video frame interpolation. The transition image may include a plurality of interpolated frame images containing the target person; when it includes a plurality of interpolated frame images, the transition image is a transition video. Alternatively, the transition image may include only one interpolated frame image. The number of interpolated frame images in the transition image may be determined according to the similarity between the real image and the virtual image: when the similarity is large, the transition image may include only one interpolated frame image, and when the similarity is small, the transition image needs to include several. Because the similarity between the interpolated frame image and the real image is greater than the similarity between the real image and the virtual image, the interpolated frame image lies between the real image and the virtual image, and the picture moves toward the virtual image gradually rather than abruptly. A sketch of this interpolation idea follows.
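The following Python sketch is a minimal illustration of that idea, under two assumptions not stated in the embodiment: each image is summarized by a numeric characteristic-parameter vector (as in the second embodiment below), and intermediate vectors are obtained by linear interpolation; the frame-count formula is likewise only a heuristic.

    import numpy as np

    # Sketch: generate intermediate feature-parameter vectors between the real
    # image's parameters and the virtual image's parameters. Linear
    # interpolation and the frame-count formula are illustrative assumptions.
    def transition_parameters(real_params, virtual_params, similarity,
                              similarity_threshold=0.8, frames_per_unit=20):
        real = np.asarray(real_params, dtype=float)
        virtual = np.asarray(virtual_params, dtype=float)
        # The smaller the similarity, the more interpolated frames are needed.
        n_frames = max(1, int((similarity_threshold - similarity) * frames_per_unit))
        steps = np.linspace(0.0, 1.0, n_frames + 2)[1:-1]  # exclude both endpoints
        return [(1.0 - t) * real + t * virtual for t in steps]

    # Each intermediate vector would then be rendered into one interpolated
    # frame image, e.g. by the pre-trained virtual image model mentioned above.
    frames = transition_parameters([0.0, 0.0], [1.0, 1.0], similarity=0.5)
    print(len(frames))  # 6 intermediate parameter vectors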
In other embodiments, whether the real image and the virtual image match may also be determined by an optical flow method, and as an example, an optical flow vector of a specified point (which may be an edge point) of a target person in the real image and an optical flow vector of a specified point of a target person in the virtual image may be acquired by an optical flow method, and then a difference between the two optical flow vectors may be compared, and if the difference is large, it may be determined that the real image and the virtual image do not match, and if the difference is small, it may be determined that the real image and the virtual image match.
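One possible concrete form of this check is sketched below with OpenCV's dense Farneback optical flow. Note that the embodiment compares flow vectors at specified points (such as edge points), whereas this simplified sketch averages the flow magnitude over the whole frame, and the threshold value is an illustrative assumption.

    import cv2
    import numpy as np

    # Sketch: decide whether the real image and the virtual image match by
    # measuring the apparent motion between them with dense optical flow.
    def images_match_by_flow(real_bgr, virtual_bgr, flow_threshold=2.0):
        real_gray = cv2.cvtColor(real_bgr, cv2.COLOR_BGR2GRAY)
        virtual_gray = cv2.cvtColor(virtual_bgr, cv2.COLOR_BGR2GRAY)
        # Farneback parameters: pyr_scale, levels, winsize, iterations,
        # poly_n, poly_sigma, flags.
        flow = cv2.calcOpticalFlowFarneback(
            real_gray, virtual_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        # Mean displacement magnitude; a small value means the two pictures
        # are nearly synchronized and can be switched directly.
        mean_magnitude = float(np.linalg.norm(flow, axis=2).mean())
        return mean_magnitude < flow_threshold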
And S140, if the transition image is matched with the virtual image, switching the transition image into a virtual video.
In some embodiments, whether the transition image and the virtual image match may be determined in the same way as whether the real image and the virtual image match, that is, by comparing the similarity of the two images, which is not repeated here.
If the transition image is matched with the virtual image, the display picture of the terminal equipment facing the user can be switched to the virtual video.
As can be seen, in the present embodiment, the real image of the current frame and the action intention of the target person in the real image are obtained; acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video according to the action intention; if the real image and the virtual image are not matched, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an interpolation image of a target person, and the similarity between the interpolation image and the real image is greater than that between the real image and the virtual image; if the transition image is matched with the virtual image, the transition image is switched to the virtual video, so that smooth and natural transition of the real image to the virtual image is realized, namely, when a user watches the customer service video, the artificial customer service picture is smoothly switched to the virtual customer service picture, and the user does not feel the switching process, so that the user experience is improved.
Second embodiment
Referring to fig. 3, fig. 3 is a flowchart illustrating an avatar switching method according to an embodiment of the present disclosure. The method may be applied to the interactive system 100 provided in the first embodiment, and specifically may be applied to the terminal device 101 or the server 102 in the interactive system, and the method may include:
s210, acquiring the displayed real image of the current frame and the action intention of the target person in the real image.
S220, according to the action intention, a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video are obtained.
The specific implementation of S210 to S220 can refer to S110 to S120, and therefore, is not described herein.
And S230, extracting a first characteristic parameter from the real image and extracting a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person.
The first feature parameter and the second feature parameter may include one or more combinations of feature parameters of a target character (which may be called key points), such as motion, posture, expression, size, and angle. It will be appreciated that the angular characteristic parameter may be indicative of the angle of display of the target person in the real image or the virtual image, such as the side angle, the front angle, etc. of the target person. The size characteristic parameter may characterize the display size of the target person in the real image or the virtual image. The motion and posture characteristic parameters can represent the positions of all parts of the target character in the image of the current frame in the video.
As one mode, when the first feature parameter is extracted from the real image: when the first characteristic parameter is a key point, the coordinates of the key point in the real image may be extracted as the characteristic parameter. When the first characteristic parameter is the size, the proportion of the target person's outline to the size of the real image may be extracted as the characteristic parameter. When the first characteristic parameter is an expression, the expression parameters of the target person in the real image may be extracted and recognized, and then compared with expression parameters labeled in advance. For example, expression parameters of four expressions of the target person, namely joy, anger, sorrow, and happiness, are labeled in advance; after comparison, if the extracted expression parameters match the expression parameters of joy, the expression corresponding to the extracted parameters is determined to be joy, and if the label corresponding to the expression parameters of joy is 1, the characteristic parameter may be 1 when the first characteristic parameter is an expression. Alternatively, feature parameter extraction for actions and postures may refer to the feature parameter extraction for expressions. Similarly, extracting the second feature parameter from the virtual image may refer to the way the first feature parameter is extracted from the real image. The sketch below illustrates how such parameters might be assembled into a vector.
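Assembled into code, the extraction described above might look like the following sketch; every extractor here is a hypothetical stub standing in for a real key-point detector or expression recognizer.

    # Hypothetical assembly of a characteristic-parameter vector. The three
    # extractors are stubs; a real system would use a face key-point model and
    # an expression classifier trained on the pre-labeled expressions.

    EXPRESSION_LABELS = {"joy": 1, "anger": 2, "sorrow": 3, "happiness": 4}

    def extract_key_points(image):
        # Stub: would return (x, y) coordinates of the target person's key points.
        return [(120.0, 88.0), (150.0, 88.0)]

    def extract_size_ratio(image):
        # Stub: proportion of the target person's outline to the image size.
        return 0.42

    def extract_expression_label(image):
        # Stub: compare extracted expression parameters with the pre-labeled
        # ones and return the label of the best match (e.g. 1 for "joy").
        return EXPRESSION_LABELS["joy"]

    def characteristic_parameters(image):
        vector = [coord for point in extract_key_points(image) for coord in point]
        vector.append(extract_size_ratio(image))
        vector.append(float(extract_expression_label(image)))
        return vector  # the x[1] ... x[n] vector used in the distance example below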
For example, when the first characteristic parameter is an angle characteristic parameter of the target person, the second characteristic parameter is also the angle characteristic parameter of the target person. For another example, when the first characteristic parameter is a key point corresponding to an eye portion of the target person, the second characteristic parameter is also a key point corresponding to an eye portion of the target person.
S240, if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining that the real image is not matched with the virtual image.
In some embodiments, a plurality of feature parameters in the first feature parameters may be used as a first vector, and a plurality of feature parameters in the second feature parameters may be used as a second vector, wherein the number and type of the feature parameters in the first vector are the same as the number and type of the feature parameters in the second vector. And then, the distance between the first characteristic parameter and the second characteristic parameter is obtained according to the first vector and the second vector, the distance can represent the similarity between the first characteristic parameter and the second characteristic parameter, and the similarity is larger when the distance is smaller.
As an example, assume that the first feature parameter comprises n feature parameters, which may be regarded as frame parameters of the current frame of the real image and represented as a first vector x[1], x[2], x[3] … x[n]. Each feature parameter in the first vector may represent a feature value of one dimension; for example, x[1] may represent the coordinates of a key point of the target person in the real image, x[2] may represent the expression parameter of the target person in the real image, x[3] may represent the action parameter of the target person in the real image, and so on, giving feature parameters of n dimensions. Similarly, the second feature parameter may be represented as a second vector y[1], y[2], y[3] … y[n], where, for example, y[1] may represent the key point coordinates of the target person in the virtual image. The first and second feature parameters may then be substituted into a distance function f for calculating the distance between feature parameters, the distance being computed as f(x[1], x[2], x[3] … x[n], y[1], y[2], y[3] … y[n]), where the distance output by f may be a float value. Finally, the comparison between the similarity and the similarity threshold is obtained from the distance; for example, the similarity threshold corresponds in advance to a distance threshold: if the distance between the first and second feature parameters is smaller than the distance threshold, the similarity between them is determined to be greater than the similarity threshold; conversely, if the distance is greater than the distance threshold, the similarity is determined to be smaller than the similarity threshold.
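The embodiment leaves the distance function f unspecified; a Euclidean distance over the two vectors is one natural choice. The sketch below implements that assumed choice together with the distance-threshold test (all threshold values are illustrative).

    import math

    # Sketch of the distance function f and the distance-threshold test.
    # Euclidean distance is an assumed choice; the embodiment only requires
    # that a smaller distance correspond to a higher similarity.
    def f(x, y):
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

    def parameters_match(x, y, distance_threshold):
        # distance < distance threshold  <=>  similarity > similarity threshold
        return f(x, y) < distance_threshold

    x = [120.0, 88.0, 150.0, 88.0, 0.42, 1.0]  # first characteristic parameter
    y = [121.0, 90.0, 149.0, 87.0, 0.40, 1.0]  # second characteristic parameter
    print(round(f(x, y), 3), parameters_match(x, y, distance_threshold=3.0))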
And when the comparison result shows that the similarity of the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining that the real image is not matched with the virtual image.
In some embodiments, the first characteristic parameter includes a first characteristic point, and the second characteristic parameter includes a second characteristic point, as shown in fig. 4, S240 may specifically include the following steps:
S241, it is determined whether the distance between the first feature point and the second feature point is not less than the distance threshold.
S242, if the distance between the first feature point and the second feature point is not less than the distance threshold, determining that the similarity between the first feature parameter and the second feature parameter is less than the similarity threshold.
As an example, assume that the distance threshold is 3mm and the similarity threshold is 80. When the distance between the first feature point and the second feature point is smaller than the distance threshold, the similarity between the corresponding first and second characteristic parameters is greater than 80; when the distance is not less than the distance threshold, that similarity is less than 80. Therefore, when the distance between the first feature point and the second feature point is 2mm, the similarity between the first characteristic parameter and the second characteristic parameter is greater than 80, and it may be determined that this similarity is not less than the similarity threshold. When the distance between the first feature point and the second feature point is 4mm, the similarity between the first characteristic parameter and the second characteristic parameter is less than 80, and it may be determined that this similarity is less than the similarity threshold.
Since the feature points of a person reflect the person's posture better than other features do, the similarity between the first characteristic parameter and the second characteristic parameter can be judged effectively from the distance between the first feature point of the former and the second feature point of the latter.
And S250, if the real image is not matched with the virtual image, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an interpolation image of the target person, and the similarity between the interpolation image and the real image is greater than that between the real image and the virtual image.
And S260, if the transition image is matched with the virtual image, switching the transition image into a virtual video.
The specific implementation of S250 to S260 may refer to S130 to S140, and therefore is not described herein.
In this embodiment, the first characteristic parameter is extracted from the real image and the second characteristic parameter from the virtual image; if the similarity between them is smaller than the similarity threshold, the real image is determined not to match the virtual image. In this way, whether the features of the target person in the virtual image are close to those of the target person in the real image can be judged accurately: a similarity below the threshold indicates a large difference between the target person in the virtual image and in the real image, so the two images can be determined not to match.
Third embodiment
Referring to fig. 5, fig. 5 is a flowchart illustrating an avatar switching method according to an embodiment of the present disclosure. The method may be applied to the interactive system 100 provided in the first embodiment, and specifically may be applied to the terminal device 101 or the server 102 in the interactive system, and the method may include:
S310, acquiring the displayed real image of the current frame and the action intention of the target person in the real image.
S320, according to the action intention, acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video.
S330, extracting a first characteristic parameter from the real image and extracting a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person.
S340, if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining that the real image is not matched with the virtual image.
The specific implementation of S310 to S340 can refer to S210 to S240, and therefore will not be described herein.
And S350, if the real image is not matched with the virtual image, generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the third characteristic parameter and the first characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter.
And the type and the number of the characteristic parameters in the third characteristic parameter are the same as those of the first characteristic parameter.
In some embodiments, the similarity between the first characteristic parameter and the second characteristic parameter may be represented by the distance between them, since a smaller distance means a greater similarity; whether the similarity is smaller than the similarity threshold can therefore be determined by judging this distance. When the similarity between the first characteristic parameter and the second characteristic parameter is determined to be smaller than the similarity threshold, a third characteristic parameter can be obtained from the first and second characteristic parameters, where the similarity between the third characteristic parameter and the second characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter.
In some embodiments, a preset number of feature parameters may be obtained from a local database of the electronic device, and then the first feature parameter is sequentially compared with the preset number of feature parameters for similarity, so as to select a target feature parameter, where the similarity between the target feature parameter and the first feature parameter is the largest among the preset number of feature parameters. Then, whether the similarity between the target characteristic parameter and the first characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter is compared, and if so, the target characteristic parameter can be used as a third characteristic parameter.
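A minimal sketch of this candidate-selection step, assuming the similarity measure is the negative Euclidean distance and the local database is a list of stored feature vectors (both assumptions for illustration):

```python
import numpy as np

def similarity(a, b):
    # Assumed similarity measure: larger when the vectors are closer.
    return -float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def select_third_parameter(first, second, candidates):
    """Pick the stored candidate most similar to the first parameter, and
    accept it only if it beats the first/second similarity."""
    target = max(candidates, key=lambda c: similarity(c, first))
    if similarity(target, first) > similarity(first, second):
        return target  # usable as the third characteristic parameter
    return None

# Example: three stored candidates; the one closest to `first` is chosen.
first, second = [3.0, 4.0], [1.0, 0.0]
candidates = [[2.5, 3.5], [0.5, 0.5], [3.0, 0.0]]
print(select_third_parameter(first, second, candidates))  # -> [2.5, 3.5]
```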
In other embodiments, when the similarity between the first feature parameter and the second feature parameter is smaller than the similarity threshold, the first feature parameter and the second feature parameter may be input to the pre-trained prediction model, and a third feature parameter output by the pre-trained prediction model may be obtained. And the similarity between the third characteristic parameter and the second characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter.
The prediction model may be a neural network model, and is used for obtaining, from the first and second characteristic parameters, a third characteristic parameter that is more similar to the second characteristic parameter than the first characteristic parameter is. As an example, if one of the first characteristic parameters is the coordinate (3,4) of an eye keypoint, and the corresponding second characteristic parameter is the coordinate (1,0) of that eye keypoint, the prediction model may predict that the abscissa of the target keypoint lies between 1 and 3 and the ordinate between 0 and 4, obtaining a coordinate range; a coordinate is then taken from this range as the target keypoint and determined as the eye-keypoint coordinate of the third characteristic parameter.
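The eye-keypoint example can be sketched as follows; choosing the midpoint of the predicted coordinate range is an assumption, since the text only requires the selected coordinate to lie between the two inputs:

```python
def predict_target_keypoint(real_kp, virtual_kp, alpha=0.5):
    """Pick a coordinate inside the range spanned by the real-image keypoint
    and the virtual-image keypoint (a linear blend is assumed here)."""
    (x1, y1), (x2, y2) = real_kp, virtual_kp
    return (x1 + alpha * (x2 - x1), y1 + alpha * (y2 - y1))

# Eye keypoint (3, 4) in the real image, (1, 0) in the virtual image:
print(predict_target_keypoint((3, 4), (1, 0)))  # -> (2.0, 2.0), inside the range
```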
And S360, generating an interpolation image based on the third characteristic parameter.
In some embodiments, the third characteristic parameter includes features such as the expression, keypoints, motion and posture of the target person, and can therefore be used to generate an image of the target person. Optionally, there may be one or more third characteristic parameters, and each third characteristic parameter may generate one interpolated frame; specifically, the third characteristic parameter may be input into a pre-trained machine learning model to acquire the interpolated frame image corresponding to it.
As shown in fig. 6, in some embodiments, S360 comprises:
S361, using the similarity between the third characteristic parameter and the first characteristic parameter as a first similarity, using the similarity between the first characteristic parameter and the second characteristic parameter as a second similarity, and calculating a difference between the first similarity and the second similarity.
As an example, if the first similarity is 95 and the second similarity is 85, the difference between the first similarity and the second similarity is 10.
S362, if the difference between the first similarity and the second similarity is not less than the specified value, generating an interpolated image based on the third feature parameter.
Continuing the example above, if the specified value is 5, the difference between the first similarity and the second similarity (10) is not less than the specified value. This indicates that the obtained third characteristic parameter is closer to the second characteristic parameter than the first characteristic parameter is, and that the step toward it is not too small. The interpolated image may then be generated from the third characteristic parameter, so that the target person in the interpolated image approaches the target person in the virtual image by a change that is not too small to notice.
As another example, if the first similarity is 95 and the second similarity is 94, their difference is 1. With a specified value of 5, the difference is smaller than the specified value, indicating that although the obtained third characteristic parameter is closer to the second characteristic parameter than the first characteristic parameter is, the step is very small. If an interpolated image were generated from this third characteristic parameter, then when the video switches from the real image to the transition image, the tendency of the target person in the interpolated image within the transition image to approach the target person in the real image would be essentially invisible, resulting in a poor-quality interpolated image.
In this embodiment, the similarity between the third characteristic parameter and the first characteristic parameter is taken as a first similarity, the similarity between the first characteristic parameter and the second characteristic parameter as a second similarity, and the difference between them is calculated. Keeping this difference within the specified range ensures that the degree of change of the target person in the interpolated image generated from the third characteristic parameter is neither too large nor too small, avoiding both an unsmooth transition caused by too large a change and a low-quality interpolated image caused by too small a change.
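A sketch of the S361–S362 gate, with the similarity values and specified value taken from the examples above:

```python
SPECIFIED_VALUE = 5  # specified value from the examples above

def should_generate_interpolated_frame(first_similarity, second_similarity):
    """Generate a frame only if the third parameter moves noticeably closer
    to the virtual image than the first parameter was."""
    return first_similarity - second_similarity >= SPECIFIED_VALUE

print(should_generate_interpolated_frame(95, 85))  # True: difference 10 >= 5
print(should_generate_interpolated_frame(95, 94))  # False: difference 1 < 5
```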
And S370, generating a transition image based on the frame insertion image, and switching the real image into the transition image.
In some embodiments, as shown in fig. 7, S370 may include:
S371, if the similarity between the third characteristic parameter and the first characteristic parameter is smaller than the similarity threshold, determining the interpolated image as the first frame image of the transition image, and determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold.
In some embodiments, if the similarity between the third characteristic parameter and the first characteristic parameter is smaller than the similarity threshold, the target person in the interpolated image generated from the third characteristic parameter is very close to the target person in the real image. The interpolated image may therefore be determined as the first frame of the transition image; switching from the real image to the transition image is then equivalent to switching directly from the real image to the interpolated image, so the user perceives no change in the displayed target person during the switch. In addition, since the transition image must subsequently be switched to the virtual image, whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold can be determined, so as to judge whether the target person in the transition image is close to the target person in the virtual image; if this similarity is smaller than the similarity threshold, the difference between the two is still large.
And S372, if the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, generating a fourth characteristic parameter based on the second characteristic parameter and the third characteristic parameter, wherein the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter.
In some embodiments, when the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, the difference between the target person in the transition image and the target person in the virtual image is still large, so an additional interpolated frame needs to be inserted after the first interpolated frame of the transition image. This additional frame is closer to the virtual image, realizing a smooth transition from the transition image to the virtual image. For the specific implementation of interpolating between the first interpolated frame of the transition image and the virtual image, reference may be made to the implementation in S350 of generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, which is not described herein again.
And S373, generating a target frame interpolation image based on the fourth characteristic parameter.
In S373, for the specific implementation of generating the target interpolated frame based on the fourth characteristic parameter, reference may be made to the implementation in S360 of generating the interpolated frame based on the third characteristic parameter, which is not described herein again.
As shown in fig. 8, in some embodiments, S373, generating the target frame interpolation image based on the fourth characteristic parameter, may include:
S3731, the similarity between the fourth characteristic parameter and the second characteristic parameter is taken as a third similarity, the similarity between the third characteristic parameter and the second characteristic parameter is taken as a fourth similarity, and a difference between the third similarity and the fourth similarity is calculated.
S3732, if the difference between the third similarity and the fourth similarity is not less than the specified value, generating the target interpolated frame image based on the fourth feature parameter.
The specific implementation of S3731 to S3732 may refer to S361 to S362 and is therefore not described herein. As in S361 to S362, S3731 to S3732 ensure that the degree of change of the target person in the target interpolated image generated from the fourth characteristic parameter is neither too large nor too small, avoiding both an unsmooth transition caused by too large a change and a low-quality target interpolated image caused by too small a change.
And S374, generating a transition image based on the frame insertion image and the target frame insertion image.
As shown in fig. 9, S374, generating a transition image based on the frame interpolation image and the target frame interpolation image may include:
S3741, it is determined whether the similarity between the fourth characteristic parameter and the second characteristic parameter is less than the similarity threshold.
When the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, the distance between the fourth characteristic parameter and the second characteristic parameter is very small.
S3742, if the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining the target frame-inserted image as the last frame image of the transition image, and generating the transition image according to the frame-inserted image and the target frame-inserted image.
When the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, the distance between them is very small, and the target person in the target interpolated image is accordingly very close to the target person in the virtual image. The target interpolated image can therefore be determined as the last frame of the transition image, and the transition image generated from the interpolated frame and the target interpolated frame, so that the last frame of the transition image transitions smoothly to the virtual image, and the transition image can thus be switched smoothly to the virtual image.
It is understood that, in the present embodiment, the way of comparing the similarity between the characteristic parameters may refer to the way of comparing the distances by the corresponding characteristic points in the above embodiments.
In this embodiment, if the similarity between the third characteristic parameter and the first characteristic parameter is smaller than the similarity threshold, the interpolated image is determined as the first frame of the transition image, and whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold is determined; if so, a fourth characteristic parameter is generated based on the second and third characteristic parameters, where the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter; a target interpolated image is generated based on the fourth characteristic parameter; and the transition image is generated based on the interpolated image and the target interpolated image. Therefore, the generated transition image can be ensured to transition smoothly to both the real image and the virtual image.
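Read together, S371 to S374 describe an iterative loop: keep generating parameters that step toward the virtual image, rendering a frame from each, until a frame is close enough to serve as the last frame of the transition. A minimal sketch of that loop, under the conventional reading that a higher similarity means a closer match (the helper callables and the frame cap are illustrative assumptions):

```python
def build_transition(first_param, second_param, similarity, generate_param,
                     render_frame, similarity_threshold, max_frames=30):
    """Generate transition frames stepping from the real image toward the
    virtual image; `generate_param` stands in for S350/S372, `render_frame`
    for S360/S373."""
    frames = []
    current = first_param
    for _ in range(max_frames):
        # Generate the next parameter between the current one and the virtual one.
        current = generate_param(current, second_param)
        frames.append(render_frame(current))
        # Stop once the latest frame is close enough to the virtual image.
        if similarity(current, second_param) >= similarity_threshold:
            break
    return frames
```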
And S380, if the transition image is matched with the virtual image, switching the transition image into a virtual video.
The specific implementation of S380 may refer to S260, and therefore is not described herein.
In this embodiment, if the real image and the virtual image do not match, a third feature parameter is generated based on the first feature parameter of the real image and the second feature parameter of the virtual image, an interpolated image is generated based on the third feature parameter, a transition image is generated based on the interpolated image, and the real image is switched to the transition image, so that smooth transition between the real image and the transition image can be ensured.
Fourth embodiment
Referring to fig. 10, fig. 10 is a schematic flow chart illustrating an avatar switching method according to an embodiment of the present application. The method may be applied to the interactive system 100 provided in the first embodiment, and specifically may be applied to the terminal device 101 or the server 102 in the interactive system, and the method may include:
S410, acquiring the displayed real image of the current frame and the action intention of the target person in the real image.
And S420, acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video according to the action intention.
S430, extracting a first characteristic parameter from the real image and extracting a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person.
S440, if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining that the real image is not matched with the virtual image.
S450, if the real image is not matched with the virtual image, generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the third characteristic parameter and the first characteristic parameter is larger than the similarity between the first characteristic parameter and the second characteristic parameter.
The specific implementation of S410 to S450 can refer to S310 to S350, and therefore is not described herein.
And S460, inputting the third characteristic parameter into a pre-trained virtual image model, and acquiring an interpolation image corresponding to the third characteristic parameter.
As an example, when the third characteristic parameter is a facial feature point of the target person, inputting the facial feature point into the pre-trained avatar model yields an interpolated image corresponding to it, the interpolated image including the target person's face. For another example, when the third characteristic parameter is a skeletal feature point of the target person, inputting the skeletal feature point into the pre-trained avatar model yields an interpolated image corresponding to it, the interpolated image including the target person's posture.
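A minimal inference sketch for S460, assuming a PyTorch avatar model that maps a flat vector of feature-point coordinates to an image tensor (the model interface and tensor shapes are assumptions, not the patent's specification):

```python
import torch

def generate_interpolated_frame(avatar_model, third_param):
    """Run the pre-trained avatar model on the third characteristic parameter."""
    avatar_model.eval()
    with torch.no_grad():
        # Flat feature vector -> batch of one.
        features = torch.tensor(third_param, dtype=torch.float32).unsqueeze(0)
        image = avatar_model(features)  # e.g. shape (1, 3, H, W)
    return image.squeeze(0)
```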
In some embodiments, the avatar model may be trained in advance, before S460 inputs the third characteristic parameter into it and acquires the corresponding interpolated image. An embodiment of pre-training the avatar model may include the following steps.
first, a sample image of a target person is acquired.
In some embodiments, a sample image of the target person may be captured by a camera, where the sample image may include pictures, videos and the like. When sample images of the target person are already stored locally on the terminal device or in the cloud, the terminal device can also extract them directly from local storage or the cloud.
And secondly, extracting sample characteristic parameters and sample frame insertion images of the target person from the sample images.
In some embodiments, sample feature parameters of the target person may be extracted from the sample image by a face recognition technique, a motion recognition technique, an expression recognition technique, or the like, and then an image including the target person in the sample image is used as a sample frame-inserted image.
And finally, inputting the sample characteristic parameters and the sample frame insertion images into a machine learning model for training to obtain a pre-trained virtual image model.
Optionally, in practical applications, after the avatar model is trained, a real video including the target person may be recorded in the field, for example, a real video of a real person customer service for about one minute is recorded, and then the avatar model is fine-tuned (finetune) through the real video, so as to improve the quality of the image generated by the avatar model.
In some embodiments, the machine learning model may be a GAN (Generative Adversarial Network) model, which continuously optimizes its output through the adversarial interplay of a Generator and a Discriminator. With a sufficiently large number of training samples, the GAN model can produce face images that approach a real person's face closely enough to be indistinguishable from it. Further, the face image may be two-dimensional; that is, inputting the face feature points into the GAN model yields a corresponding two-dimensional face image.
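A compressed sketch of the adversarial training described above, using PyTorch; the fully-connected networks, flattened images, and optimizer settings are simplifying assumptions (a production avatar model would typically use convolutional networks):

```python
import torch
from torch import nn

FEATURE_DIM, IMAGE_DIM = 128, 64 * 64 * 3  # assumed sizes

generator = nn.Sequential(                      # feature parameters -> image
    nn.Linear(FEATURE_DIM, 512), nn.ReLU(),
    nn.Linear(512, IMAGE_DIM), nn.Tanh())
discriminator = nn.Sequential(                  # image -> real/fake score
    nn.Linear(IMAGE_DIM, 512), nn.ReLU(),
    nn.Linear(512, 1), nn.Sigmoid())

criterion = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(sample_features, sample_frames):
    """One adversarial step on a batch of (feature, sample-frame) pairs."""
    batch = sample_features.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator: distinguish sample frames from generated frames.
    opt_d.zero_grad()
    fake_frames = generator(sample_features).detach()
    loss_d = (criterion(discriminator(sample_frames), real_labels) +
              criterion(discriminator(fake_frames), fake_labels))
    loss_d.backward()
    opt_d.step()

    # Generator: produce frames the discriminator accepts as real.
    opt_g.zero_grad()
    loss_g = criterion(discriminator(generator(sample_features)), real_labels)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```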
Optionally, the process of training the avatar model in advance may be performed before any step prior to S460, for example before S450 or before S410, which is not limited herein.
And S470, generating a transition image based on the frame insertion image, and switching the real image into the transition image.
And S480, if the transition image is matched with the virtual image, switching the transition image into a virtual video.
In this embodiment, the virtual image is generated from characteristic parameters extracted from the target person, so the virtual image can be highly similar to the target person's appearance; the user cannot easily perceive the switch between the virtual image and the real target person, making the switch more natural and improving the user experience. Moreover, inputting the third characteristic parameter into the pre-trained avatar model to obtain the corresponding interpolated image improves the efficiency of interpolated-image generation, so the picture displayed by the terminal device plays smoothly when switching between the interpolated image and the real image.
Fifth embodiment
Referring to fig. 11, fig. 11 is a flowchart illustrating an avatar switching method according to an embodiment of the present application. The method may be applied to the interactive system 100 provided in the first embodiment, and specifically may be applied to the terminal device 101 or the server 102 in the interactive system, and the method may include:
S510, acquiring the displayed real image of the current frame and the action intention of the target person in the real image.
The specific implementation of S510 may refer to S110, and therefore is not described herein.
S520, determining whether a virtual video corresponding to the action intention exists.
The terminal device can detect whether a virtual video corresponding to the action intention exists in the virtual video library. A plurality of action intentions and their corresponding virtual videos can be stored in the virtual video library in advance; specifically, the relationship between the action intention and the virtual video may be as shown in table 1. Optionally, the virtual video library may be stored locally in the terminal device, or may be stored in the server.
S530, if the virtual video exists, the virtual video corresponding to the action intention is obtained.
If the virtual video corresponding to the action intention is detected in the virtual video library, it can be acquired from the virtual video library.
And S540, if the virtual video does not exist, acquiring an answer template corresponding to the action intention.
If no virtual video corresponding to the action intention is detected in the virtual video library, an answer template corresponding to the action intention can be acquired from an answer template library. An answer template is a dialogue template used for conversing with the client, and contains the text the customer service needs to answer in the conversation. Different answer templates can be associated with different action intentions in advance; for example, when the action intention is that the customer service wants to end the conversation, the corresponding answer template may include text such as "this service is ended" and "welcome to use next time". The answer template library may be stored locally in the terminal device or in the server, which is not limited herein.
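S520 to S540 amount to a lookup with a fallback; a minimal sketch, assuming both the virtual video library and the answer template library are plain dictionaries keyed by action intention (the keys and entries below are illustrative):

```python
virtual_video_library = {"end_conversation": "videos/end.mp4"}
answer_template_library = {
    "end_conversation": "This service is ended. Welcome to use next time."}

def resolve(action_intent):
    """Return a prebuilt virtual video if one exists, else the answer template."""
    video = virtual_video_library.get(action_intent)
    if video is not None:
        return ("video", video)
    return ("template", answer_template_library[action_intent])

print(resolve("end_conversation"))  # -> ("video", "videos/end.mp4")
```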
And S550, extracting characteristic parameters of the target person from the real image, generating a virtual video corresponding to the action intention based on the characteristic parameters and the answer template, and acquiring a virtual image of a first frame of the virtual video.
In some embodiments, an avatar generation model may be trained in advance from sample characteristic parameters, sample images and sample text information of the target person; after training, different sample texts correspond to different sample characteristic parameters, and the sample images yield a basic face image, basic action image, basic posture image and the like of the target person. The characteristic parameters and the answer template are input into the avatar generation model to obtain the characteristic parameters corresponding to the answer template; the basic face image, basic action image, basic posture image and the like of the target person are then updated with these characteristic parameters to obtain the virtual video corresponding to the answer template. Since the answer template was obtained from the action intention, this is the virtual video corresponding to the action intention. Finally, the first frame is intercepted from the virtual video as the virtual image of the first frame of the virtual video.
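A sketch of the on-the-fly generation in S550; `extract_features` and `avatar_generation_model` are assumed stand-ins for the feature extraction and the trained avatar generation model described above:

```python
def generate_virtual_video(real_image, answer_template, extract_features,
                           avatar_generation_model):
    """Build a virtual video from the target person's features and the template."""
    features = extract_features(real_image)  # expression, pose, keypoints...
    frames = avatar_generation_model(features, answer_template)
    first_frame = frames[0]  # virtual image of the first frame of the video
    return frames, first_frame
```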
And S560, if the real image and the virtual image are not matched, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an interpolation image of the target person, and the similarity between the interpolation image and the real image is greater than that between the real image and the virtual image.
And S570, if the transition image is matched with the virtual image, switching the transition image into a virtual video.
The specific implementation of S560 to S570 can refer to S130 to S140, and therefore will not be described herein.
Considering that a virtual video corresponding to an action intention may not have been prepared in advance, this embodiment determines whether such a virtual video exists; if so, it is acquired; if not, the answer template corresponding to the action intention is acquired, the characteristic parameters of the target person are extracted from the real image, the virtual video corresponding to the action intention is generated from the characteristic parameters and the answer template, and the virtual image of its first frame is acquired. In this way, the virtual video corresponding to the action intention can be generated on the spot when none is available, ensuring that it can always be acquired.
Sixth embodiment
Referring to fig. 12, fig. 12 is a flowchart illustrating an avatar switching method according to an embodiment of the present application. The method may be applied to the interactive system 100 provided in the first embodiment, and specifically may be applied to the terminal device 101 or the server 102 in the interactive system, and the method may include:
S610, when the real image including the target person is displayed, it is determined whether the real image satisfies the switching condition.
In some embodiments, the switching condition may be a condition for determining whether to switch the real image to the virtual image. As an example, when the real-person customer service finishes the explanation to the customer and the virtual robot customer service needs to take over the conversation, the terminal device recognizes that the explanation has finished and determines that the switching condition is satisfied. For example, if the real-person customer service presses the switching key at the customer service end, it can be determined that the switching condition is satisfied. For another example, if the real-person customer service makes a switching gesture and the terminal device recognizes the gesture, it may be determined that the switching condition is satisfied. Likewise, if the real-person customer service speaks a sentence for switching and the terminal device recognizes the sentence, it may be determined that the switching condition is satisfied.
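The switching conditions listed above can be collapsed into a single predicate; a minimal sketch, where the trigger sets are illustrative assumptions:

```python
SWITCH_GESTURES = {"wave_to_switch"}          # assumed gesture label
SWITCH_SENTENCES = {"switch to the robot"}    # assumed trigger sentence

def meets_switch_condition(key_pressed, gesture, sentence, speech_finished):
    """True if any of the recognized switching triggers fired."""
    return (key_pressed
            or gesture in SWITCH_GESTURES
            or sentence in SWITCH_SENTENCES
            or speech_finished)
```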
S620, if the real image meets the switching condition, the real image of the current frame and the action intention of the target person in the real image are obtained.
S630, according to the action intention, acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video.
And S640, if the real image is not matched with the virtual image, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an interpolation image of the target person, and the similarity between the interpolation image and the real image is greater than that between the real image and the virtual image.
And S650, if the transition image is matched with the virtual image, switching the transition image into a virtual video.
The detailed implementation of S620 to S650 can refer to S110 to S140, and therefore is not described herein.
In this embodiment, when the real image including the target person is displayed, whether the real image satisfies the switching condition is determined; if it does, the displayed real image of the current frame and the action intention of the target person in the real image are acquired, so that the real image can be switched at a proper time.
Seventh embodiment
Referring to fig. 13, fig. 13 is a flowchart illustrating an avatar switching method according to an embodiment of the present application. The method may be applied to the interactive system 100 provided in the first embodiment, and specifically may be applied to the terminal device 101 or the server 102 in the interactive system, and the method may include:
S710, when the real image including the target person is displayed, it is determined whether a switching instruction is received.
In some embodiments, the switching instruction may be an instruction for instructing the terminal device to switch the displayed real image to the virtual image. The terminal device can detect in real time whether a switching instruction is received. Alternatively, the switching instruction may be generated by the real-person customer service performing a specified touch operation at the customer service end, may be automatically generated based on some specific information (e.g., time information, statement information, etc.), or may be generated based on statement information and action information of the customer collected on site, which is not limited herein.
S720, if a switching instruction is received, the fact that the displayed real image including the target person meets the switching condition is determined.
As an example, the switching instruction may be sentence information or gesture information with an intention to end the conversation, given by the live customer service or the client; when the terminal device receives (or recognizes) the sentence information or gesture information, it may be determined that the displayed real image including the target person satisfies the switching condition.
And S730, if the real image meets the switching condition, acquiring the real image of the displayed current frame and the action intention of the target person in the real image.
And S740, acquiring the displayed real image of the current frame and the action intention of the target person in the real image.
S750, according to the action intention, a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video are obtained.
And S760, if the real image is not matched with the virtual image, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an interpolation image of the target person, and the similarity between the interpolation image and the real image is greater than that between the real image and the virtual image.
And S770, if the transition image is matched with the virtual image, switching the transition image into a virtual video.
The specific implementation of S740 to S770 may refer to S110 to S140, and therefore, will not be described herein.
In this embodiment, when a real image including the target person is displayed, whether a switching instruction is received is determined; if a switching instruction is received, it is determined that the displayed real image including the target person meets the switching condition; and if the real image meets the switching condition, the displayed real image of the current frame and the action intention of the target person in the real image are acquired, so that the real image and the virtual image can be switched accurately and flexibly.
Eighth embodiment
Referring to fig. 14, fig. 14 illustrates an avatar switching apparatus provided in an embodiment of the present application, where the avatar switching apparatus 800 includes: a first obtaining module 810, a second obtaining module 820, a first switching module 830, and a second switching module 840. Wherein: the first obtaining module 810 is configured to obtain a real image of a displayed current frame and an action intention of a target person in the real image; the second obtaining module 820 is configured to obtain a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video according to the action intention; the first switching module 830 is configured to generate a transition image based on the real image and the virtual image if the real image and the virtual image are not matched, and switch the real image into the transition image, where the transition image includes an interpolated image of the target person, and a similarity between the interpolated image and the real image is greater than a similarity between the real image and the virtual image; the second switching module 840 is configured to switch the transition image into the virtual video if the transition image is matched with the virtual image.
Optionally, the avatar switching apparatus 800 further includes:
the characteristic parameter extraction module is used for extracting a first characteristic parameter from the real image and extracting a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person.
And the matching determination module is used for determining that the real image is not matched with the virtual image if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold.
Optionally, the first feature parameter includes a first feature point, the second feature parameter includes a second feature point, and the first feature point and the second feature point are the same feature point of the target person, and the avatar switching apparatus 800 further includes:
and the distance judging module is used for determining whether the distance between the first characteristic point and the second characteristic point is not less than a distance threshold value.
And the similarity determining module is used for determining that the similarity between the first characteristic parameter and the second characteristic parameter is less than the similarity threshold if the distance between the first characteristic point and the second characteristic point is not less than the distance threshold.
Optionally, the first switching module 830 comprises:
and the third characteristic parameter submodule is used for generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the third characteristic parameter and the first characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter.
And the frame interpolation image generation submodule is used for generating a frame interpolation image based on the third characteristic parameter.
And the first switching submodule is used for generating a transition image based on the frame insertion image and switching the real image into the transition image.
Optionally, the frame interpolation image generation submodule is specifically configured to use a similarity between the third feature parameter and the first feature parameter as a first similarity, use a similarity between the first feature parameter and the second feature parameter as a second similarity, and calculate a difference between the first similarity and the second similarity; and if the difference value between the first similarity and the second similarity is not less than the specified value, generating the frame interpolation image based on the third characteristic parameter.
Optionally, the first switching module 830 further includes:
and the similarity comparison submodule is used for determining the frame interpolation image as the first frame image of the transition image if the similarity between the third characteristic parameter and the first characteristic parameter is smaller than the similarity threshold value, and determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold value.
And the fourth characteristic parameter submodule is used for generating a fourth characteristic parameter based on the second characteristic parameter and the third characteristic parameter if the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, wherein the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter.
And the target frame interpolation image generation submodule is used for generating a target frame interpolation image based on the fourth characteristic parameter.
And the transition image generation submodule is used for generating a transition image based on the interpolation frame image and the target interpolation frame image.
Optionally, the target frame interpolation image generation submodule is specifically configured to use a similarity between the fourth characteristic parameter and the second characteristic parameter as a third similarity, use a similarity between the third characteristic parameter and the second characteristic parameter as a fourth similarity, and calculate a difference between the third similarity and the fourth similarity; and if the difference value between the third similarity and the fourth similarity is not less than the specified value, generating the target frame interpolation image based on the fourth characteristic parameter.
Optionally, the transition image generation sub-module is specifically configured to determine whether the similarity between the fourth feature parameter and the second feature parameter is smaller than a similarity threshold; and if the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining the target frame interpolation image as the last frame image of the transition image, and generating the transition image through the frame interpolation image and the target frame interpolation image.
Optionally, the frame interpolation image generation sub-module is specifically configured to input the third feature parameter into a pre-trained avatar model, and acquire a frame interpolation image corresponding to the third feature parameter.
Optionally, the frame interpolation image generation sub-module is further configured to: acquire a currently recorded real video including the real image, and fine-tune the pre-trained avatar model through the real video; and input the third characteristic parameter into the fine-tuned avatar model to acquire the interpolated image corresponding to the third characteristic parameter.
Optionally, the matching determination module is further configured to determine whether the real image and the virtual image match by an optical flow method.
Optionally, the avatar switching apparatus 800 further includes:
and the sample image acquisition module is used for acquiring a sample image of the target person.
And the sample extraction module is used for extracting the sample characteristic parameters and the sample frame insertion images of the target person from the sample images.
And the training module is used for inputting the sample characteristic parameters and the sample frame insertion images into the machine learning model for training to obtain a pre-trained virtual image model.
Optionally, the avatar switching apparatus 800 further includes:
and the virtual video detection module is used for determining whether a virtual video corresponding to the action intention exists.
And the first execution module is configured to acquire the virtual video corresponding to the action intention if the virtual video exists.
And the answer template acquisition module is used for acquiring an answer template corresponding to the action intention if the virtual video does not exist.
And the virtual video generation module is used for extracting the characteristic parameters of the target person from the real image and generating a virtual video corresponding to the action intention based on the characteristic parameters and the answer template.
Optionally, the avatar switching apparatus 800 further includes:
and the switching detection module is used for determining whether the real image meets the switching condition when the real image comprising the target person is played.
And the second execution module is used for executing the acquisition of the displayed real image of the current frame and the action intention of the target person in the real image if the real image meets the switching condition.
Optionally, the handover detection module is specifically configured to determine whether a handover instruction is received; and if a switching instruction is received, determining that the played real image including the target person meets the switching condition.
The avatar switching apparatus 800 provided in this embodiment of the application is used to implement the corresponding avatar switching method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again.
As will be clearly understood by those skilled in the art, the avatar switching apparatus provided in the embodiment of the present application can implement each process in the foregoing method embodiment, and for convenience and brevity of description, the specific working processes of the above-described apparatus and modules may refer to the corresponding processes in the foregoing method embodiment, and are not described herein again.
In the embodiments provided in the present application, the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be in an electrical, mechanical or other form.
In addition, each functional module in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Ninth embodiment
Referring to fig. 15, a block diagram of an electronic device 900 according to an embodiment of the present disclosure is shown. The electronic device 900 may be a smart phone, a tablet computer, or other electronic device capable of running an application. The electronic device 900 in the present application may include one or more of the following components: a processor 910, a memory 920, and one or more applications, wherein the one or more applications may be stored in the memory 920 and configured to be executed by the one or more processors 910, the one or more programs configured to perform a method as described in the aforementioned method embodiments.
Processor 910 may include one or more processing cores. The processor 910 interfaces with various components throughout the electronic device 900 using various interfaces and circuitry, and performs the various functions of the electronic device 900 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 920 and invoking data stored in the memory 920. Alternatively, the processor 910 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 910 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like, wherein the CPU mainly handles the operating system, user interface, application programs and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communications. It is understood that the modem may also not be integrated into the processor 910 but be implemented by a separate communication chip.
The Memory 920 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 920 may be used to store instructions, programs, code sets, or instruction sets. The memory 920 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The data storage area may store data created during use of the electronic device 900 (e.g., phone books, audio and video data, chat log data), and so forth. The electronic device may be the terminal device in the above embodiments, or may be the server in the above embodiments.
Tenth embodiment
Referring to fig. 16, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 1000 has stored therein program code that can be invoked by a processor to perform the methods described in the above-described method embodiments.
The computer-readable storage medium 1000 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Alternatively, the computer-readable storage medium 1000 includes a non-volatile computer-readable storage medium. The computer readable storage medium 1000 has storage space for program code 1010 for performing any of the method steps of the method described above. The program code can be read from or written to one or more computer program products. The program code 1010 may be compressed, for example, in a suitable form.
To sum up, according to the avatar switching method, apparatus, electronic device and storage medium provided in the embodiments of the present application, a displayed real image of a current frame and an action intention of a target person in the real image are acquired; a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video are acquired according to the action intention; if the real image and the virtual image do not match, a transition image is generated based on the real image and the virtual image, and the real image is switched to the transition image, the transition image including an interpolated image of the target person whose similarity to the real image is greater than the similarity between the real image and the virtual image; and when the transition image matches the virtual image, the transition image is switched to the virtual video. In this way, the displayed picture changes gradually rather than abruptly during switching, which effectively avoids visible jumps between the real image and the virtual video and improves the smoothness and naturalness of avatar switching.
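As an illustration only (not part of the disclosed embodiments), the overall switching flow can be sketched in a few lines of Python. All names here are hypothetical placeholders: similarity(a, b) is assumed to return a score in [0, 1], and interpolate(a, b) is assumed to return a frame whose features lie between its two arguments.

    def build_transition(real_image, virtual_image, similarity, interpolate,
                         match_threshold=0.95, max_frames=30):
        """Generate transition frames until the current frame matches the
        first frame of the virtual video; playback can then switch over."""
        frames, current = [], real_image
        while similarity(current, virtual_image) < match_threshold:
            if len(frames) >= max_frames:  # safety stop for this sketch only
                break
            # each interpolated frame moves a little closer to the virtual image
            current = interpolate(current, virtual_image)
            frames.append(current)
        return frames

The threshold and frame limit are illustrative choices; the embodiments themselves leave both to the implementation.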
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (18)

1. An avatar switching method, comprising:
acquiring a displayed real image of a current frame and an action intention of a target person in the real image;
according to the action intention, acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video;
if the real image and the virtual image do not match, generating a transition image based on the real image and the virtual image, and switching the real image to the transition image, wherein the transition image comprises an interpolated image of the target person, and the similarity between the interpolated image and the real image is greater than the similarity between the real image and the virtual image;
and if the transition image matches the virtual image, switching the transition image to the virtual video.
2. The method according to claim 1, wherein before the generating a transition image based on the real image and the virtual image and switching the real image to the transition image if the real image and the virtual image do not match, the method further comprises:
extracting a first characteristic parameter from the real image and extracting a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person;
and if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than a similarity threshold, determining that the real image and the virtual image do not match.
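A minimal sketch of the matching test in claim 2, assuming the characteristic parameters are numeric vectors and using cosine similarity as one plausible measure (the claim does not fix a particular similarity measure, so this choice is an assumption):

    import numpy as np

    def images_match(first_param, second_param, similarity_threshold=0.95):
        """Compare the same characteristic parameter extracted from the real
        image and the virtual image; below the threshold means no match."""
        first = np.asarray(first_param, dtype=float)
        second = np.asarray(second_param, dtype=float)
        sim = float(first @ second /
                    (np.linalg.norm(first) * np.linalg.norm(second)))
        return sim >= similarity_threshold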
3. The method according to claim 2, wherein the first characteristic parameter includes a first feature point, the second characteristic parameter includes a second feature point, the first feature point and the second feature point are the same feature point of the target person, and before the determining that the real image and the virtual image do not match if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, the method further comprises:
determining whether a distance between the first feature point and the second feature point is not less than a distance threshold;
if the distance between the first feature point and the second feature point is not smaller than the distance threshold, determining that the similarity between the first feature parameter and the second feature parameter is smaller than the similarity threshold.
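Claim 3 reduces the similarity test to a distance test on a shared feature point (for example, a facial landmark; that interpretation is an assumption). A sketch, assuming 2D point coordinates:

    import numpy as np

    def similarity_below_threshold(first_point, second_point, distance_threshold):
        """Per claim 3: a point-to-point distance at or above the distance
        threshold implies the parameter similarity is below the similarity
        threshold, i.e. the real and virtual images do not match."""
        distance = float(np.linalg.norm(np.asarray(first_point, dtype=float)
                                        - np.asarray(second_point, dtype=float)))
        return distance >= distance_threshold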
4. The method of claim 2, wherein the generating a transition image based on the real image and the virtual image and switching the real image to the transition image comprises:
generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the third characteristic parameter and the first characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter;
generating the interpolated image based on the third characteristic parameter;
and generating the transition image based on the interpolated image, and switching the real image to the transition image.
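One plausible way to realize the third characteristic parameter of claim 4 is linear interpolation between the two parameters. The claim itself does not prescribe how the intermediate parameter is formed, and the step size here is an assumption:

    import numpy as np

    def third_characteristic_parameter(first_param, second_param, step=0.3):
        """Move a fraction of the way from the real-image parameter toward the
        virtual-image parameter; with 0 < step < 0.5 the result stays more
        similar to the real image than the virtual image is, as claim 4 requires."""
        first = np.asarray(first_param, dtype=float)
        second = np.asarray(second_param, dtype=float)
        return (1.0 - step) * first + step * second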
5. The method of claim 4, wherein the generating the interpolated image based on the third characteristic parameter comprises:
taking the similarity between the third characteristic parameter and the first characteristic parameter as a first similarity, taking the similarity between the first characteristic parameter and the second characteristic parameter as a second similarity, and calculating a difference value between the first similarity and the second similarity;
and if the difference value between the first similarity and the second similarity is not less than a specified value, generating the interpolated image based on the third characteristic parameter.
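The guard in claim 5 only renders an interpolated frame when it is meaningfully closer to the real image than the virtual image is. A sketch, with the specified value chosen arbitrarily for illustration:

    def worth_interpolating(first_similarity, second_similarity,
                            specified_value=0.05):
        """first_similarity: similarity(third_param, first_param);
        second_similarity: similarity(first_param, second_param).
        Render the interpolated frame only if the gain reaches the
        specified value."""
        return (first_similarity - second_similarity) >= specified_value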
6. The method of claim 4, wherein generating the transition image based on the interpolated image comprises:
if the similarity between the third characteristic parameter and the first characteristic parameter is smaller than the similarity threshold, determining the interpolated image as a first frame image of the transition image, and determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold;
if the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, generating a fourth characteristic parameter based on the second characteristic parameter and the third characteristic parameter, wherein the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter;
generating a target interpolated image based on the fourth characteristic parameter;
and generating the transition image based on the interpolated image and the target interpolated image.
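Claims 6 to 8 iterate this process: each new parameter (fourth, and so on) steps further toward the virtual image until the similarity threshold is reached. A sketch of that loop, again assuming vector parameters and a caller-supplied similarity function:

    import numpy as np

    def transition_parameter_sequence(first_param, second_param, similarity,
                                      similarity_threshold=0.95, step=0.3,
                                      max_steps=30):
        """Emit one characteristic parameter per transition frame, walking from
        the real-image parameter toward the virtual-image parameter until the
        two match (max_steps is a safety limit for this sketch only)."""
        params = []
        current = np.asarray(first_param, dtype=float)
        target = np.asarray(second_param, dtype=float)
        while (similarity(current, target) < similarity_threshold
               and len(params) < max_steps):
            current = (1.0 - step) * current + step * target
            params.append(current)
        return params

The first parameter emitted corresponds to the first frame image of the transition image and the last one to its last frame image, in the sense of claims 6 and 8.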
7. The method of claim 6, wherein generating the target interpolated image based on the fourth characteristic parameter comprises:
taking the similarity between the fourth characteristic parameter and the second characteristic parameter as a third similarity, taking the similarity between the third characteristic parameter and the second characteristic parameter as a fourth similarity, and calculating a difference value between the third similarity and the fourth similarity;
and if the difference value between the third similarity and the fourth similarity is not less than a specified value, generating the target interpolated image based on the fourth characteristic parameter.
8. The method of claim 7, wherein generating the transition image based on the interpolated image and the target interpolated image comprises:
determining whether the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold;
and if the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining the target interpolated image as the last frame image of the transition image, and generating the transition image through the interpolated image and the target interpolated image.
9. The method of claim 4, wherein the generating the interpolated image based on the third characteristic parameter comprises:
inputting the third characteristic parameter into a pre-trained avatar model, and acquiring the interpolated image corresponding to the third characteristic parameter.
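A sketch of the inference step in claim 9, assuming the pre-trained avatar model is a PyTorch module that maps a characteristic-parameter tensor to an image tensor (the framework and tensor shapes are assumptions; the claim only speaks of a model):

    import torch

    @torch.no_grad()
    def render_interpolated_image(avatar_model, third_param):
        """Feed the interpolated characteristic parameter to the pre-trained
        avatar model and take its output as the interpolated image."""
        avatar_model.eval()
        return avatar_model(third_param.unsqueeze(0)).squeeze(0)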
10. The method according to claim 9, wherein the inputting the third characteristic parameter into a pre-trained avatar model and acquiring the interpolated image corresponding to the third characteristic parameter comprises:
acquiring a currently recorded real video comprising the real image, and fine-tuning the pre-trained avatar model through the real video;
and inputting the third characteristic parameter into the fine-tuned avatar model, and acquiring the interpolated image corresponding to the third characteristic parameter.
11. The method of claim 9, further comprising, before the generating the interpolated image based on the third characteristic parameter:
acquiring sample images of the target person;
extracting sample characteristic parameters and sample interpolated images of the target person from the sample images;
and inputting the sample characteristic parameters and the sample interpolated images into a machine learning model for training to obtain the pre-trained avatar model.
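The training step of claim 11 pairs sample characteristic parameters with sample interpolated images and fits a model to map one to the other. A minimal sketch, assuming a PyTorch model and an L1 reconstruction loss (the loss, optimizer, and hyperparameters are all assumptions; the claim names only "a machine learning model"):

    import torch

    def train_avatar_model(model, sample_params, sample_images,
                           epochs=10, lr=1e-4):
        """sample_params and sample_images are matched lists of tensors
        extracted from sample images of the target person."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = torch.nn.L1Loss()
        model.train()
        for _ in range(epochs):
            for params, image in zip(sample_params, sample_images):
                optimizer.zero_grad()
                loss = loss_fn(model(params.unsqueeze(0)), image.unsqueeze(0))
                loss.backward()
                optimizer.step()
        return model

The fine-tuning of claim 10 could reuse the same loop on the currently recorded real video, typically with a smaller learning rate and fewer epochs.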
12. The method according to any one of claims 1 to 11, wherein before the generating a transition image based on the real image and the virtual image and switching the real image to the transition image if the real image and the virtual image do not match, the method further comprises:
determining whether the real image and the virtual image match by an optical flow method.
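Claim 12 names an optical flow method for the match test. One concrete (but not claimed) choice is Farneback dense optical flow from OpenCV: small average motion between the two frames suggests they already match. The motion threshold is an assumption:

    import cv2
    import numpy as np

    def images_match_by_optical_flow(real_bgr, virtual_bgr, motion_threshold=2.0):
        """Estimate dense optical flow between the real frame and the virtual
        frame; treat a small mean displacement (in pixels) as a match."""
        prev_gray = cv2.cvtColor(real_bgr, cv2.COLOR_BGR2GRAY)
        next_gray = cv2.cvtColor(virtual_bgr, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mean_motion = float(np.linalg.norm(flow, axis=2).mean())
        return mean_motion < motion_threshold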
13. The method according to any one of claims 1 to 11, wherein before the acquiring the virtual video corresponding to the action intention and the virtual image of the first frame of the virtual video, the method further comprises:
determining whether a virtual video corresponding to the action intention exists;
if so, performing the step of acquiring the virtual video corresponding to the action intention and the virtual image of the first frame of the virtual video;
if not, acquiring an answer template corresponding to the action intention;
extracting characteristic parameters of the target person from the real image, and generating a virtual video corresponding to the action intention based on the characteristic parameters and the answer template.
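The fallback path of claim 13 can be summarized as: use a prerecorded virtual video when one exists for the intent, otherwise synthesize one from an answer template plus the person's features. A sketch with hypothetical helpers and data layouts (none of these names come from the disclosure):

    def get_or_generate_virtual_video(intent, video_library, answer_templates,
                                      real_image, extract_features, render_video):
        """video_library and answer_templates are dicts keyed by intent;
        extract_features and render_video are caller-supplied placeholders."""
        if intent in video_library:           # claim 13: a matching video exists
            return video_library[intent]
        template = answer_templates[intent]   # claim 13: fall back to a template
        features = extract_features(real_image)
        return render_video(features, template)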
14. The method according to any one of claims 1 to 11, wherein before the acquiring the displayed real image of the current frame and the action intention of the target person in the real image, the method further comprises:
when a real image including the target person is displayed, determining whether the real image satisfies a switching condition;
and if the real image satisfies the switching condition, performing the step of acquiring the displayed real image of the current frame and the action intention of the target person in the real image.
15. The method of claim 14, wherein the determining whether the real image satisfies a switching condition comprises:
determining whether a handover instruction is received;
and if a switching instruction is received, determining that the displayed real image including the target person satisfies the switching condition.
16. An avatar switching apparatus, comprising:
the first acquisition module is used for acquiring a displayed real image of a current frame and an action intention of a target person in the real image;
the second acquisition module is used for acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video according to the action intention;
a first switching module, configured to generate a transition image based on the real image and the virtual image and switch the real image to the transition image if the real image and the virtual image are not matched, where the transition image includes an interpolated image of the target person, and a similarity between the interpolated image and the real image is greater than a similarity between the real image and the virtual image;
and the second switching module is used for switching the transition image to the virtual video if the transition image matches the virtual image.
17. An electronic device, comprising:
a memory;
one or more processors coupled with the memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method of any one of claims 1 to 15.
18. A computer-readable storage medium having program code stored therein, the program code being invoked by a processor to perform the method of any one of claims 1 to 15.
CN202110069031.2A 2021-01-19 2021-01-19 Virtual image switching method, device, electronic equipment and storage medium Active CN112750186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110069031.2A CN112750186B (en) 2021-01-19 2021-01-19 Virtual image switching method, device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112750186A true CN112750186A (en) 2021-05-04
CN112750186B CN112750186B (en) 2024-02-23

Family

ID=75652487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110069031.2A Active CN112750186B (en) 2021-01-19 2021-01-19 Virtual image switching method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112750186B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018033143A1 (en) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 Video image processing method, apparatus and electronic device
CN110245638A (en) * 2019-06-20 2019-09-17 北京百度网讯科技有限公司 Video generation method and device
CN110942501A (en) * 2019-11-27 2020-03-31 深圳追一科技有限公司 Virtual image switching method and device, electronic equipment and storage medium
CN110969682A (en) * 2019-11-27 2020-04-07 深圳追一科技有限公司 Virtual image switching method and device, electronic equipment and storage medium
CN111696029A (en) * 2020-05-22 2020-09-22 平安普惠企业管理有限公司 Virtual image video generation method and device, computer equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901296A (en) * 2021-10-18 2022-01-07 深圳追一科技有限公司 Modeling method, device, equipment and storage medium of virtual digital object
WO2023078103A1 (en) * 2021-11-04 2023-05-11 中兴通讯股份有限公司 Multi-mode face driving method and apparatus, electronic device, and storage medium
WO2023082737A1 (en) * 2021-11-12 2023-05-19 腾讯科技(深圳)有限公司 Data processing method and apparatus, and device and readable storage medium
CN114117157A (en) * 2021-11-19 2022-03-01 招联消费金融有限公司 Session processing method, device, computer equipment and storage medium
CN114117157B (en) * 2021-11-19 2024-04-09 招联消费金融股份有限公司 Session processing method, apparatus, computer device and storage medium
CN114422647A (en) * 2021-12-24 2022-04-29 上海浦东发展银行股份有限公司 Digital person-based agent service method, apparatus, device, medium, and product
CN115665507A (en) * 2022-12-26 2023-01-31 海马云(天津)信息技术有限公司 Method, apparatus, medium, and device for generating video stream data including avatar
CN115665507B (en) * 2022-12-26 2023-03-21 海马云(天津)信息技术有限公司 Method, apparatus, medium, and device for generating video stream data including avatar

Also Published As

Publication number Publication date
CN112750186B (en) 2024-02-23

Similar Documents

Publication Publication Date Title
CN112750186B (en) Virtual image switching method, device, electronic equipment and storage medium
CN110969682B (en) Virtual image switching method and device, electronic equipment and storage medium
CN110390704B (en) Image processing method, image processing device, terminal equipment and storage medium
CN110942501B (en) Virtual image switching method and device, electronic equipment and storage medium
CN111432267B (en) Video adjusting method and device, electronic equipment and storage medium
CN112379812B (en) Simulation 3D digital human interaction method and device, electronic equipment and storage medium
CN111045639B (en) Voice input method, device, electronic equipment and storage medium
CN110519636B (en) Voice information playing method and device, computer equipment and storage medium
CN110688008A (en) Virtual image interaction method and device
CN110555507B (en) Interaction method and device for virtual robot, electronic equipment and storage medium
CN111147880A (en) Interaction method, device and system for live video, electronic equipment and storage medium
CN108920640B (en) Context obtaining method and device based on voice interaction
CN108198130B (en) Image processing method, image processing device, storage medium and electronic equipment
CN111898407B (en) Human-computer interaction operating system based on human face action recognition
CN107911643B (en) Method and device for showing scene special effect in video communication
CN110599359B (en) Social contact method, device, system, terminal equipment and storage medium
CN113067953A (en) Customer service method, system, device, server and storage medium
CN110794964A (en) Interaction method and device for virtual robot, electronic equipment and storage medium
CN110826441A (en) Interaction method, interaction device, terminal equipment and storage medium
CN112949689A (en) Image recognition method and device, electronic equipment and storage medium
CN113014857A (en) Control method and device for video conference display, electronic equipment and storage medium
CN114173188A (en) Video generation method, electronic device, storage medium, and digital human server
CN114567693A (en) Video generation method and device and electronic equipment
CN115390678B (en) Virtual human interaction method and device, electronic equipment and storage medium
US20220327961A1 (en) Realtime AI Sign Language Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant