CN111260756B - Method and device for transmitting information

Info

Publication number: CN111260756B
Application number: CN201811459739.3A
Authority: CN (China)
Prior art keywords: face image, image, sample, face, information
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN111260756A
Inventor: 朱祥祥
Original and current assignee: Beijing Baidu Netcom Science and Technology Co Ltd

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06T 11/60 - Editing figures and text; Combining figures or text
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 - Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a method and a device for sending information. One embodiment of the method comprises the following steps: receiving an original image containing a first face image and information to be processed composed of at least two still images; for each still image of the at least two still images, performing the following operations: in response to determining that the still image contains a face image, taking the face image contained in the still image as a second face image; processing the first face image based on the second face image; replacing the second face image in the still image with the processed first face image; and in response to determining that every still image containing a face image in the information to be processed has been replaced, sending the replaced information to be processed. This embodiment replaces the second face images contained in the information to be processed based on the first face image contained in the original image.

Description

Method and device for transmitting information
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for sending information.
Background
As internet technology continues to develop, information resources have become increasingly rich. At present, a user can obtain various pictures and videos through the internet. In practice, certain information displayed in pictures and videos can be replaced according to actual needs. Taking a picture of a person as an example, a picture of a specific part can be used to directly replace specific parts of the person in the picture (such as the head, the face, the body and the like) according to actual needs, and a good replacement effect can be obtained in a static image. However, because each part of a person in a moving picture or a video changes dynamically, directly replacing a specific part in a moving picture or a video with a fixed part picture often lacks flexibility.
Disclosure of Invention
The embodiment of the application provides a method and a device for sending information.
In a first aspect, an embodiment of the present application provides a method for transmitting information, including: receiving an original image containing a first face image and information to be processed consisting of at least two static images; for a still image of the at least two still images, performing the following operations: in response to determining that the still image contains a face image, taking the face image contained in the still image as a second face image; processing the first face image based on the second face image; replacing a second face image in the static image with the processed first face image; and in response to determining that the replacement of each static image containing the face image in the information to be processed is completed, sending the replaced information to be processed.
In some embodiments, the processing the first face image based on the second face image includes: respectively carrying out face key point detection on the first face image and the second face image to obtain key point information of the first face image and the second face image; and adjusting the key point information of the first face image according to the key point information of the second face image.
In some embodiments, the processing the first face image based on the second face image includes: respectively inputting the first face image and the second face image into a pre-established expression recognition model to obtain expression categories of the first face image and the second face image, wherein the expression recognition model is used for representing the correspondence between a face image and an expression category; and in response to determining that the expression categories of the first face image and the second face image match, taking the first face image as the processed first face image.
In some embodiments, the processing the first face image based on the second face image further includes: in response to determining that the expression categories of the first face image and the second face image do not match, inputting the first face image and the expression category of the second face image into a pre-established image generation model to obtain a generated face image, and taking the generated face image as the processed first face image, wherein the image generation model is used for representing the correspondence between a face image and an expression category on the one hand and a generated face image on the other.
In some embodiments, the expression recognition model is trained by: acquiring a first training sample set, wherein the first training sample comprises a face image and expression categories corresponding to the face image; and taking the face image of the first training sample in the first training sample set as input, taking the expression category corresponding to the input face image as expected output, and training to obtain the expression recognition model.
In some embodiments, the image generation model described above is trained by: acquiring a second training sample set, wherein the second training sample comprises a sample face image and a sample expression category, and a sample generation face image corresponding to the sample face image and the sample expression category, wherein the sample face image and the sample generation face image are face images of the same person, and the expression category of the sample generation face image is matched with the sample expression category; and taking a sample face image and a sample expression category of a second training sample in the second training sample set as input, taking a sample generation face image corresponding to the input sample face image and sample expression category as expected output, and training to obtain the image generation model.
In a second aspect, an embodiment of the present application provides an apparatus for transmitting information, including: a receiving unit configured to receive an original image including a first face image and information to be processed composed of at least two still images; an execution unit configured to execute a predetermined operation for a still image of the at least two still images, wherein the execution unit includes: a determination unit configured to, in response to determining that the face image is included in the still image, take the face image included in the still image as a second face image; a processing unit configured to process the first face image based on the second face image; a replacing unit configured to replace a second face image in the still image with the processed first face image; and a transmitting unit configured to transmit the replaced information to be processed in response to determining that replacement of each still image including the face image in the information to be processed is completed.
In some embodiments, the processing unit is further configured to: respectively carrying out face key point detection on the first face image and the second face image to obtain key point information of the first face image and the second face image; and adjusting the key point information of the first face image according to the key point information of the second face image.
In some embodiments, the processing unit is further configured to: respectively input the first face image and the second face image into a pre-established expression recognition model to obtain expression categories of the first face image and the second face image, wherein the expression recognition model is used for representing the correspondence between a face image and an expression category; and in response to determining that the expression categories of the first face image and the second face image match, take the first face image as the processed first face image.
In some embodiments, the processing unit is further configured to: in response to determining that the expression categories of the first face image and the second face image do not match, input the first face image and the expression category of the second face image into a pre-established image generation model to obtain a generated face image, and take the generated face image as the processed first face image, wherein the image generation model is used for representing the correspondence between a face image and an expression category on the one hand and a generated face image on the other.
In some embodiments, the expression recognition model is trained by: acquiring a first training sample set, wherein the first training sample comprises a face image and expression categories corresponding to the face image; and taking the face image of the first training sample in the first training sample set as input, taking the expression category corresponding to the input face image as expected output, and training to obtain the expression recognition model.
In some embodiments, the image generation model described above is trained by: acquiring a second training sample set, wherein the second training sample comprises a sample face image and a sample expression category, and a sample generation face image corresponding to the sample face image and the sample expression category, wherein the sample face image and the sample generation face image are face images of the same person, and the expression category of the sample generation face image is matched with the sample expression category; and taking a sample face image and a sample expression category of a second training sample in the second training sample set as input, taking a sample generation face image corresponding to the input sample face image and sample expression category as expected output, and training to obtain the image generation model.
In a third aspect, an embodiment of the present application provides an apparatus, including: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
The method and the device for sending information provided by the embodiment of the application first receive an original image containing a first face image and information to be processed composed of at least two still images, and then perform the following operations on each still image of the at least two still images: in response to determining that the still image contains a face image, taking the face image contained in the still image as a second face image; processing the first face image based on the second face image; and replacing the second face image in the still image with the processed first face image. Finally, in response to determining that every still image containing a face image in the information to be processed has been replaced, the replaced information to be processed is sent. In this way, the second face image contained in the information to be processed is replaced based on the first face image contained in the original image, and because the first face image is processed according to the second face image in each still image of the information to be processed during replacement, the replaced information to be processed can be vivid and natural.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method for transmitting information in accordance with the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for transmitting information according to the present application;
FIG. 4 is a flow chart of yet another embodiment of a method for transmitting information in accordance with the present application;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for transmitting information in accordance with the present application;
FIG. 6 is a schematic diagram of a computer system suitable for use with an apparatus implementing an embodiment of the application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which a method for transmitting information or an apparatus for transmitting information of an embodiment of the present application may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as an image processing class application, a video editing class application, a web browser application, a search class application, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and supporting image processing, including but not limited to smartphones, tablet computers, laptop and desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-listed electronic devices. They may be implemented as a plurality of software programs or software modules (e.g., to provide distributed services), or as a single software program or software module. The present application is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server providing support for information displayed on the terminal devices 101, 102, 103. The background server may perform processing such as analysis on the received information such as the image, and feed back the processing result (e.g., the processed information) to the terminal devices 101, 102, 103.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When server 105 is software, it may be implemented as a plurality of software or software modules (e.g., to provide distributed services), or as a single software or software module. The present application is not particularly limited herein.
It should be noted that, the method for sending information provided by the embodiment of the present application may be performed by the terminal devices 101, 102, 103, or may be performed by the server 105. Accordingly, the means for transmitting information may be provided in the terminal devices 101, 102, 103 or in the server 105. The application is not limited in this regard.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for transmitting information in accordance with the present application is shown. The method for transmitting information includes the steps of:
step 201, receiving an original image containing a first face image and information to be processed composed of at least two still images.
In the present embodiment, an execution subject of the method for transmitting information (e.g., the terminal devices 101, 102, 103 or the server 105 shown in fig. 1) can receive the original image and the information to be processed in various ways. For example, when the execution subject is a terminal device, the original image and the information to be processed input by the user may be received directly. For another example, when the execution subject is a server, the original image and the information to be processed may be received from the terminal device with which the user inputs information. Here, the above-described original image may contain a first face image; as an example, the original image may be a whole-body shot, a half-body shot, or the like of one person. The information to be processed may be composed of at least two still images; as an example, the information to be processed may be a moving picture such as a GIF (Graphics Interchange Format) image. A moving picture is a picture that produces a dynamic effect when a specific set of still images is switched at a specific frequency. As another example, the information to be processed may also be a piece of video.
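Purely as an illustrative aside (not part of the claimed method), the following Python sketch shows one way the information to be processed, when it is a GIF-style moving picture, could be decomposed into the still images it is composed of; the Pillow library and the file name are assumptions of this example.

```python
from PIL import Image, ImageSequence

def split_into_still_images(path):
    """Return the list of still images (RGB frames) that make up an animated image."""
    with Image.open(path) as animated:
        return [frame.convert("RGB").copy() for frame in ImageSequence.Iterator(animated)]

# Hypothetical usage with a placeholder file name.
still_images = split_into_still_images("to_be_processed.gif")
print(f"information to be processed contains {len(still_images)} still images")
```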
In practice, after the executing body receives the original image, face detection may be performed on the original image, so as to obtain the first face image. It should be noted that the face detection technology is a well-known technology widely studied and applied at present, and will not be described herein.
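As a hedged illustration of the face detection step mentioned above, the sketch below uses the OpenCV Haar-cascade detector to obtain the first face image from the original image; any face detector could be substituted, and the cascade file bundled with opencv-python is an assumption of this example.

```python
import cv2

def detect_first_face(original_image_bgr):
    """Return the first detected face region (the 'first face image'), or None."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(original_image_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return original_image_bgr[y:y + h, x:x + w]
```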
Step 202, a predetermined operation is performed on a still image of the at least two still images.
In the present embodiment, the above-described execution subject may execute a predetermined operation for each of at least two still images constituting the information to be processed. Wherein, the predetermined operation may include the steps of:
Step 2021, in response to determining that the still image includes a face image, takes the face image included in the still image as the second face image.
In this embodiment, the executing body may perform face detection on the still image, and determine whether the still image includes a face image according to a face detection result. And in response to determining that the still image contains a face image, taking the face image contained in the still image as a second face image.
Step 2022 processes the first face image based on the second face image.
In this embodiment, the execution body may perform various kinds of processing on the first face image based on the second face image. As an example, the execution body may adjust information such as the face angle, the eye state, the mouth state and the illumination in the first face image according to the corresponding information in the second face image, e.g., the face angle (frontal face, side face, back of the head, lowered head, etc.), the eye state (eyes open, eyes narrowed, eyes closed, etc.), the mouth state (mouth open, mouth closed, etc.) and the illumination. Taking the face angle as an example, assuming that the face angle in the second face image is a side face and the face angle in the first face image is a frontal face, the face pose of the first face image can be corrected based on affine transformation and similar geometric transformations, so that the face angle in the first face image is adjusted to a side face. Incidentally, face pose correction based on such transformations is a well-known technique widely studied and applied at present, and will not be described here.
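The following minimal sketch illustrates the affine-transformation-based pose correction described above, under the assumption that three corresponding landmarks (for example, the two eye centers and the nose tip) are already known for the first and second face images; it is a simplification, not the patented procedure.

```python
import cv2
import numpy as np

def align_first_to_second(first_face, first_pts, second_pts, out_size):
    """Warp first_face so its three reference landmarks move onto the second face's landmarks.

    first_pts / second_pts: lists of three (x, y) points; out_size: (width, height).
    """
    m = cv2.getAffineTransform(np.float32(first_pts), np.float32(second_pts))
    return cv2.warpAffine(first_face, m, out_size)
```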
In some optional implementations of this embodiment, the step 2022 may specifically include the following:
In step S1, the executing body may input the first face image and the second face image into a pre-established expression recognition model, respectively, to obtain expression categories of the first face image and the second face image.
In this implementation, the expression category may be used to characterize the category of the expression presented by the face in the face image. As an example, facial expressions may be classified into different categories in advance according to actual needs; for example, the emotions expressed by facial expressions may be classified into neutral, happy, surprised, afraid, angry, and the like. It can be understood that the finer the classification of facial expressions, the better the effect of the information to be processed obtained after the face images are replaced.
Here, the expression recognition model may be used to characterize the correspondence between a face image and an expression category. As an example, the expression recognition model described above may include a feature extraction section and a first correspondence table. The feature extraction section may be used to extract feature information of a face image. The first correspondence table may be a table formulated by a technician based on statistics of a large amount of feature information and expression categories, storing correspondences between a plurality of pieces of feature information and expression categories. Thus, for a certain face image, the expression recognition model may first extract feature information of the face image using the feature extraction section and take the obtained feature information as target feature information. The target feature information is then compared with the feature information in the first correspondence table, and if the target feature information is the same as or similar to a certain piece of feature information in the first correspondence table, the expression category corresponding to that piece of feature information in the first correspondence table is taken as the expression category of the face image.
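As a toy illustration of the feature-extraction-plus-first-correspondence-table idea (an assumed structure, not a definitive implementation), the sketch below compares a target feature vector with stored feature vectors by cosine similarity and returns the expression category of the most similar entry, or None if nothing is sufficiently similar.

```python
import numpy as np

def classify_expression(target_features, table_features, table_categories, threshold=0.8):
    """table_features: (n, d) array of stored features; table_categories: list of n labels."""
    target = target_features / np.linalg.norm(target_features)
    table = table_features / np.linalg.norm(table_features, axis=1, keepdims=True)
    similarities = table @ target                # cosine similarity to each stored entry
    best = int(np.argmax(similarities))
    return table_categories[best] if similarities[best] >= threshold else None
```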
In some alternative implementations, the expression recognition model may be trained by: first, a first training sample set is obtained, wherein the first training sample may include a face image and an expression category corresponding to the face image. Then, taking a face image of a first training sample in the first training sample set as input, taking an expression category corresponding to the input face image as expected output, and training to obtain an expression recognition model.
In this implementation, the execution subject of the training expression recognition model may be the same as or different from the above subject. As an example, the execution subject that trains the expression recognition model may first determine the first initial model and model parameters of the first initial model. Here, the first initial model may be used to characterize the correspondence between the face image and the expression class, and the first initial model may be various machine learning models such as a convolutional neural network, a deep neural network, and the like. Then, the face image in the first training sample set can be input into the first initial model to obtain the expression category of the face image, the expression category corresponding to the face image is used as the expected output of the first initial model, and the first initial model is trained by using a machine learning method. Specifically, the difference between the obtained expression category and the desired output may be calculated first using a preset loss function. Then, based on the calculated difference, the model parameters of the first initial model can be adjusted, and training is ended under the condition that the preset training ending condition is met, so that the expression recognition model is obtained. For example, the training end conditions preset herein may include, but are not limited to, at least one of: the training time exceeds a preset duration, the training times exceeds a preset number, the prediction accuracy of the first initial model is greater than a preset accuracy threshold, and the like.
Here, various implementations may be employed to adjust model parameters of the first initial model based on differences between the generated expression categories and the desired output. For example, a BP (Back Propagation) algorithm or an SGD (Stochastic Gradient Descent, random gradient descent) algorithm may be employed to adjust the model parameters of the first initial model.
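For illustration only, the sketch below shows a training loop of the kind described above, using a preset cross-entropy loss and SGD-based back propagation; the model architecture and the data loader are assumptions of this example, not requirements of the method.

```python
import torch
import torch.nn as nn

def train_expression_model(model, loader, max_epochs=10, lr=0.01):
    """Train an assumed classification model on (face image, expression label) batches."""
    loss_fn = nn.CrossEntropyLoss()                    # a preset loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(max_epochs):                    # a preset training-end condition
        for face_images, expression_labels in loader:  # first training samples
            logits = model(face_images)                # predicted expression categories
            loss = loss_fn(logits, expression_labels)  # difference from the desired output
            optimizer.zero_grad()
            loss.backward()                            # back propagation (BP)
            optimizer.step()                           # SGD parameter adjustment
    return model
```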
In step S2, in response to determining that the expression categories of the first face image and the second face image match, the first face image is taken as the processed first face image.
Here, the execution subject may determine whether the expression categories of the first face image and the second face image match. In response to determining that the expression categories of the first face image and the second face image match (e.g., are the same), the first face image may be treated as a processed first face image. That is, when the expression categories of the first face image and the second face image are matched, the first face image is directly used as the processed first face image without adjusting the face information in the first face image.
In some alternative implementations, step 2022 above may further include the following:
In step S3, in response to determining that the expression categories of the first face image and the second face image do not match, the first face image and the expression category of the second face image are input into a pre-established image generation model to obtain a generated face image, and the generated face image is taken as the processed first face image.
Here, the image generation model may be used to characterize the correspondence between input information, which may include a face image and an expression category, and a generated face image. As an example, the above-described image generation model may include a feature extraction section and a second correspondence table. The feature extraction section may be used to extract feature information from a face image. The second correspondence table may be a table formulated by a technician based on statistics of a large amount of input information and generated images, storing correspondences between a plurality of pieces of input information and generated images, wherein the input information includes feature information of a face image and an expression category. Thus, for a certain input face image and expression category, the image generation model may first extract feature information of the face image using the feature extraction section to obtain input information, and take the obtained input information as target input information. The target input information is then compared with the input information in the second correspondence table, and if the target input information is the same as or similar to a certain piece of input information in the second correspondence table, the generated image corresponding to that piece of input information in the second correspondence table is taken as the generated image for the input information.
Alternatively, the image generation model may be trained by: first, a second training sample set is obtained, wherein a second training sample may include a sample face image and a sample expression category, and a sample generated face image corresponding to the sample face image and the sample expression category. The sample face image and the sample generated face image are face images of the same person, and the expression category of the sample generated face image matches (e.g., is identical to) the sample expression category. Then, the sample face image and the sample expression category of a second training sample in the second training sample set are taken as input, the sample generated face image corresponding to the input sample face image and sample expression category is taken as the desired output, and the image generation model is obtained by training.
As an example, the execution subject of the training image generation model may first determine the second initial model and model parameters of the second initial model. Here, the second initial model may be used to characterize a correspondence of input information to the generated face image, wherein the input information may include a sample face image and a sample expression class. The second initial model may be a convolutional neural network, a deep neural network, or the like, various machine learning models. And then, inputting the sample face image and the sample expression category in the second training sample set into a second initial model to obtain a generated face image, taking the sample generated face image corresponding to the sample face image and the sample expression category as expected output of the second initial model, and training the second initial model by using a machine learning method. Specifically, the difference between the resulting generated face image and the desired output may be calculated first using a preset loss function. Then, model parameters of the second initial model can be adjusted based on the calculated difference, and training is ended under the condition that a preset training ending condition is met, so that an image generation model is obtained. For example, the training end conditions preset herein may include, but are not limited to, at least one of: the training time exceeds the preset duration, the training times exceeds the preset times, the generation accuracy of the second initial model is larger than a preset accuracy threshold value, and the like.
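As a hedged sketch of the training procedure just described, the loop below feeds sample face images and sample expression categories into an assumed generator and penalizes the difference from the sample generated face images with an assumed pixel-wise L1 loss; the generator signature, the optimizer and the loss are illustrative choices only.

```python
import torch
import torch.nn as nn

def train_image_generator(generator, loader, max_epochs=10, lr=1e-4):
    """Train an assumed generator(face, expression) on second training sample triples."""
    loss_fn = nn.L1Loss()                                   # assumed pixel-wise loss
    optimizer = torch.optim.Adam(generator.parameters(), lr=lr)
    for epoch in range(max_epochs):                         # a preset training-end condition
        for sample_face, sample_expression, sample_generated in loader:
            output = generator(sample_face, sample_expression)  # generated face image
            loss = loss_fn(output, sample_generated)             # difference from desired output
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return generator
```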
Step 2023 replaces the second face image in the still image with the processed first face image.
In this embodiment, the execution subject may replace the second face image in the still image with the first face image processed in step 2022. It will be appreciated that the size and similar attributes of the processed first face image need to be adjusted before the second face image in the still image is replaced with it. After replacement, seamless fusion, sharpening and other processing can be performed on the replaced image, so that the edge of the face image blends well with the background image.
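By way of illustration, the sketch below resizes the processed first face image to the second face region and blends it into the still image with OpenCV seamless cloning, one possible realization of the seamless fusion mentioned above; the bounding-box representation of the second face image is an assumption of this example.

```python
import cv2
import numpy as np

def replace_face(still_image, processed_first_face, second_face_box):
    """second_face_box = (x, y, w, h) of the second face image within the still image."""
    x, y, w, h = second_face_box
    resized = cv2.resize(processed_first_face, (w, h))
    mask = 255 * np.ones(resized.shape[:2], dtype=np.uint8)   # blend the whole face patch
    center = (x + w // 2, y + h // 2)
    return cv2.seamlessClone(resized, still_image, mask, center, cv2.NORMAL_CLONE)
```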
Step 203, in response to determining that replacement of each still image containing a face image in the information to be processed is completed, sending the replaced information to be processed.
In this embodiment, the execution body may determine whether replacement of each still image containing a face image in the information to be processed is completed. In response to determining that this replacement is completed, the above execution body may transmit the information to be processed after the replacement is completed. As an example, when the execution body is a terminal device, the information to be processed after the completion of the replacement may be sent to a display device for display. When the execution subject is a server, the replaced information to be processed can be sent to the terminal device through which the user sent the original image and the information to be processed.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for transmitting information according to the present embodiment. In the application scenario of fig. 3, the terminal device 301 first receives an original image containing a first face image and a moving picture composed of 5 still images, both sent by a user. Then, for each still image in the moving picture, the terminal device 301, in response to determining that a face image is contained in the still image, takes the face image contained in the still image as a second face image, processes the first face image based on the second face image, and replaces the second face image in the still image with the processed first face image. Finally, in response to determining that every still image containing a face image in the moving picture has been replaced, the replaced moving picture is sent to a display for display.
The method provided by the embodiment of the application replaces the second face image contained in the information to be processed based on the first face image contained in the original image, and because the first face image is processed according to the second face image in each still image of the information to be processed during replacement, the replaced information to be processed can be vivid and natural.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for transmitting information is shown. The flow 400 of the method for transmitting information comprises the steps of:
step 401, receiving an original image including a first face image and information to be processed composed of at least two still images.
In this embodiment, step 401 is similar to step 201 of the embodiment shown in fig. 2, and will not be described here again.
Step 402, a predetermined operation is performed on a still image of the at least two still images.
In the present embodiment, the above-described execution subject may execute a predetermined operation for each of at least two still images constituting the information to be processed. Wherein the predetermined operation includes the steps of:
In step 4021, in response to determining that the still image includes a face image, the face image included in the still image is taken as the second face image.
In this embodiment, step 4021 is similar to step 2021 of the embodiment shown in fig. 2, and is not described herein.
In step 4022, face key point detection is performed on the first face image and the second face image, so as to obtain key point information of the first face image and the second face image.
In this embodiment, the execution body may perform face key point detection on the first face image and the second face image, so as to obtain key point information of the face key points of the first face image and the second face image, for example, location information of each face key point. In practice, the face keypoints may be divided into interior keypoints and contour keypoints, and the interior keypoints may include those of the eyebrows, eyes, nose, mouth, etc. By detecting the key points of the human face, the position information of the eyebrows, eyes, nose, mouth and the like of the human face in the human face image can be positioned. It should be noted that, performing the face key point detection on the face image is a well-known technique widely studied and applied at present, and will not be described herein.
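As an illustrative sketch of the face key point detection step (not mandated by the method), the code below uses dlib's 68-point landmark predictor; the predictor model file is the standard dlib asset and is assumed to be available locally.

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def face_keypoints(image_rgb):
    """Return a (68, 2) array of key point positions for the first detected face, or None."""
    faces = detector(image_rgb)
    if not faces:
        return None
    shape = predictor(image_rgb, faces[0])
    return np.array([[p.x, p.y] for p in shape.parts()])
```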
In step 4023, the key point information of the first face image is adjusted according to the key point information of the second face image.
In this embodiment, the executing body may adjust the key point information of the first face image according to the key point information of the second face image. As an example, the execution subject may adjust the position information of the key points of the first face image according to the position information of the respective key points in the second face image. Taking a mouth as an example, the executing body may determine an opening and closing angle of the mouth of the face in the second face image according to position information of a plurality of key points related to the mouth in the second face image, and take the determined opening and closing angle as the target opening and closing angle. According to the target opening and closing angle, the executing body can adjust the position information of a plurality of key points related to the mouth in the first face image, so that the opening and closing angle of the mouth of the face in the adjusted first face image is the same as or similar to the target opening and closing angle.
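The following simplified sketch, reusing the 68-point landmark convention assumed above, adjusts the first face image's mouth key points so that its mouth opening matches that of the second face image; an opening ratio is used in place of an explicit opening and closing angle, which is an assumption of this example.

```python
import numpy as np

def mouth_opening_ratio(pts):
    width = np.linalg.norm(pts[54] - pts[48])   # mouth corner to mouth corner
    gap = np.linalg.norm(pts[66] - pts[62])     # inner upper lip to inner lower lip
    return gap / width

def adjust_mouth(first_pts, second_pts):
    """Shift the first face's lower-lip key points to reproduce the second face's opening."""
    target_gap = mouth_opening_ratio(second_pts) * np.linalg.norm(first_pts[54] - first_pts[48])
    current_gap = np.linalg.norm(first_pts[66] - first_pts[62])
    adjusted = first_pts.astype(float).copy()
    shift = [0.0, target_gap - current_gap]     # move lower lip down (or up) in image coordinates
    adjusted[55:60] += shift                    # outer lower-lip key points
    adjusted[65:68] += shift                    # inner lower-lip key points
    return adjusted
```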
Step 4024, replacing the second face image in the still image with the processed first face image.
In this embodiment, step 4024 is similar to step 2023 in the embodiment shown in fig. 2, and is not described herein.
Step 403, in response to determining that replacement of each still image containing a face image in the information to be processed is completed, sending the replaced information to be processed.
In this embodiment, step 403 is similar to step 203 in the embodiment shown in fig. 2, and will not be described here again.
As can be seen from fig. 4, compared with the corresponding embodiment of fig. 2, the procedure 400 of the method for transmitting information in this embodiment highlights the step of adjusting the key point information of the first face image according to the key point information of the second face image. In this way, the facial actions of the face in the first face image are made similar to those of the face in the second face image, so that the facial expressions in each still image of the replaced information to be processed are more natural and vivid.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for transmitting information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for transmitting information of the present embodiment includes: a receiving unit 501, an executing unit 502, and a transmitting unit 503. Wherein the receiving unit 501 is configured to receive an original image containing a first face image and information to be processed composed of at least two still images; the execution unit 502 is configured to perform a predetermined operation on a still image of the at least two still images, wherein the execution unit 502 includes: a determining unit 5021 configured to, in response to determining that a face image is included in the still image, take the face image included in the still image as a second face image; a processing unit 5022 configured to process the first face image based on the second face image; a replacing unit 5023 configured to replace a second face image in the still image with the processed first face image; the transmitting unit 503 is configured to transmit the replaced information to be processed in response to determining that replacement of each still image including the face image in the above information to be processed is completed.
In this embodiment, the specific processes of the receiving unit 501, the executing unit 502 and the transmitting unit 503 of the apparatus 500 for transmitting information and the technical effects thereof may refer to the descriptions related to the steps 201, 202 and 203 in the corresponding embodiment of fig. 2, and are not repeated here.
In some optional implementations of this embodiment, the processing unit 5022 is further configured to: respectively carrying out face key point detection on the first face image and the second face image to obtain key point information of the first face image and the second face image; and adjusting the key point information of the first face image according to the key point information of the second face image.
In some optional implementations of this embodiment, the processing unit 5022 is further configured to: respectively input the first face image and the second face image into a pre-established expression recognition model to obtain expression categories of the first face image and the second face image, wherein the expression recognition model is used for representing the correspondence between a face image and an expression category; and in response to determining that the expression categories of the first face image and the second face image match, take the first face image as the processed first face image.
In some optional implementations of this embodiment, the processing unit 5022 is further configured to: in response to determining that the expression categories of the first face image and the second face image do not match, input the first face image and the expression category of the second face image into a pre-established image generation model to obtain a generated face image, and take the generated face image as the processed first face image, wherein the image generation model is used for representing the correspondence between a face image and an expression category on the one hand and a generated face image on the other.
In some optional implementations of this embodiment, the expression recognition model is trained by: acquiring a first training sample set, wherein the first training sample comprises a face image and expression categories corresponding to the face image; and taking the face image of the first training sample in the first training sample set as input, taking the expression category corresponding to the input face image as expected output, and training to obtain the expression recognition model.
In some optional implementations of this embodiment, the image generation model is trained by: acquiring a second training sample set, wherein the second training sample comprises a sample face image and a sample expression category, and a sample generation face image corresponding to the sample face image and the sample expression category, wherein the sample face image and the sample generation face image are face images of the same person, and the expression category of the sample generation face image is matched with the sample expression category; and taking a sample face image and a sample expression category of a second training sample in the second training sample set as input, taking a sample generation face image corresponding to the input sample face image and sample expression category as expected output, and training to obtain the image generation model.
Referring now to FIG. 6, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing an embodiment of the present application. The apparatus shown in fig. 6 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 610 as needed, so that a computer program read therefrom can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 601.
The computer readable medium according to the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, for example, described as: a processor includes a receiving unit, an executing unit, and a transmitting unit. The names of these units do not constitute limitations on the unit itself in some cases, and for example, the receiving unit may also be described as "a unit that receives an original image containing a first face image and information to be processed composed of at least two still images".
As another aspect, the present application also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: receiving an original image containing a first face image and information to be processed consisting of at least two static images; for a still image of the at least two still images, performing the following operations: in response to determining that the still image contains a face image, taking the face image contained in the still image as a second face image; processing the first face image based on the second face image; replacing a second face image in the static image with the processed first face image; and in response to determining that the replacement of each static image containing the face image in the information to be processed is completed, sending the replaced information to be processed.
The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the application is not limited to technical solutions formed by the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept described above, for example, technical solutions in which the above features are replaced with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (12)

1. A method for transmitting information, comprising:
receiving an original image containing a first face image and information to be processed consisting of at least two static images;
for a still image of the at least two still images, performing the following operations: in response to determining that the still image contains a face image, taking the face image contained in the still image as a second face image; processing the first face image based on the second face image; replacing a second face image in the static image with the processed first face image;
in response to determining that replacement of each static image containing a face image in the information to be processed is completed, sending the replaced information to be processed; and
in response to determining that the expression categories of the first face image and the second face image do not match, processing the first face image based on the second face image by an image generation model, comprising:
extracting feature information of the first face through a feature extraction unit of the image generation model, and taking the feature information of the first face and the expression category of the second face as target input information; comparing the target input information with a plurality of pieces of preset input information in a second correspondence table; in response to the comparison result being the same or similar, taking the corresponding generated image in the second correspondence table as the processed first face image; wherein the second correspondence table comprises the preset input information and generated images corresponding to the preset input information, and the preset input information comprises preset face features and expression categories.
2. The method of claim 1, wherein the processing the first face image based on the second face image comprises:
performing face key point detection on the first face image and the second face image respectively to obtain key point information of the first face image and the second face image;
And adjusting the key point information of the first face image according to the key point information of the second face image.
3. The method of claim 1, wherein the processing the first face image based on the second face image comprises:
inputting the first face image and the second face image respectively into a pre-established expression recognition model to obtain the expression categories of the first face image and the second face image, wherein the expression recognition model is used for characterizing the correspondence between face images and expression categories;
and in response to determining that the expression categories of the first face image and the second face image match, taking the first face image as the processed first face image.
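The branching between matching and non-matching expression categories can be sketched as follows; expression_model and fallback_process are hypothetical callables standing in for the pre-established expression recognition model and for the image-generation path of claim 1.

def process_first_face(first_face, second_face, expression_model, fallback_process):
    first_category = expression_model(first_face)      # expression category of the first face image
    second_category = expression_model(second_face)    # expression category of the second face image
    if first_category == second_category:
        return first_face                               # categories match: use the first face as-is
    # Categories do not match: fall back to the image-generation path.
    return fallback_process(first_face, second_category)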
4. The method of claim 3, wherein the expression recognition model is trained by:
acquiring a first training sample set, wherein a first training sample comprises a face image and an expression category corresponding to the face image;
and taking a face image of a first training sample in the first training sample set as input, taking an expression category corresponding to the input face image as expected output, and training to obtain the expression recognition model.
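A minimal PyTorch-style training sketch for an expression recognition model, under the assumptions that the model is a classifier producing per-category logits and that the first training sample set yields (face image tensor, expression category index) pairs; the hyperparameters are illustrative.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_expression_model(model: nn.Module, first_training_samples, epochs=10, lr=1e-3):
    loader = DataLoader(first_training_samples, batch_size=32, shuffle=True)
    criterion = nn.CrossEntropyLoss()                  # expected output: the expression category
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for face_images, expression_labels in loader:  # face image as input
            optimizer.zero_grad()
            loss = criterion(model(face_images), expression_labels)
            loss.backward()
            optimizer.step()
    return model                                       # the trained expression recognition model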
5. The method of claim 1, wherein the image generation model is trained by:
acquiring a second training sample set, wherein a second training sample comprises a sample face image, a sample expression category, and a sample-generated face image corresponding to the sample face image and the sample expression category, wherein the sample face image and the sample-generated face image are face images of the same person, and the expression category of the sample-generated face image matches the sample expression category;
and taking a sample face image and a sample expression category of a second training sample in the second training sample set as input, taking the sample-generated face image corresponding to the input sample face image and sample expression category as expected output, and training to obtain the image generation model.
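A minimal PyTorch-style sketch of such training, assuming the generator is a module mapping an image batch plus an expression-category batch to an image batch, and that the second training sample set yields (sample face image, sample expression category, sample-generated face image) triples; the plain L1 reconstruction loss and hyperparameters are illustrative assumptions rather than anything prescribed here.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_image_generation_model(generator: nn.Module, second_training_samples,
                                 epochs=10, lr=2e-4):
    loader = DataLoader(second_training_samples, batch_size=16, shuffle=True)
    criterion = nn.L1Loss()                            # compare against the expected generated face
    optimizer = torch.optim.Adam(generator.parameters(), lr=lr)
    generator.train()
    for _ in range(epochs):
        for sample_face, sample_expression, expected_face in loader:
            optimizer.zero_grad()
            generated = generator(sample_face, sample_expression)   # sample face + expression as input
            loss = criterion(generated, expected_face)               # sample-generated face as expected output
            loss.backward()
            optimizer.step()
    return generator                                   # the trained image generation model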
6. An apparatus for transmitting information, comprising:
a receiving unit configured to receive an original image including a first face image and information to be processed composed of at least two still images;
an execution unit configured to perform a predetermined operation on a still image of the at least two still images, wherein the execution unit includes: a determination unit configured to, in response to determining that the face image is included in the still image, take the face image included in the still image as a second face image; a processing unit configured to process the first face image based on the second face image; a replacing unit configured to replace a second face image in the still image with the processed first face image;
a transmitting unit configured to transmit the replaced information to be processed in response to determining that replacement of each still image containing a face image in the information to be processed is completed;
the processing unit is further configured to: in response to determining that the expression categories of the first face image and the second face image do not match, process the first face image based on the second face image by means of an image generation model, comprising:
extracting feature information of the first face through a feature extraction unit of the image generation model, and taking the feature information of the first face and the expression category of the second face as target input information; comparing the target input information with a plurality of pieces of preset input information in a second correspondence table; in response to the comparison result being identical or similar, taking the corresponding generated image in the second correspondence table as the processed first face image; wherein the second correspondence table comprises preset input information and generated images corresponding to the preset input information, and the preset input information comprises preset face features and expression categories.
7. The apparatus of claim 6, wherein the processing unit is further configured to:
performing face key point detection on the first face image and the second face image respectively to obtain key point information of the first face image and the second face image;
and adjusting the key point information of the first face image according to the key point information of the second face image.
8. The apparatus of claim 6, wherein the processing unit is further configured to:
inputting the first face image and the second face image respectively into a pre-established expression recognition model to obtain the expression categories of the first face image and the second face image, wherein the expression recognition model is used for characterizing the correspondence between face images and expression categories;
and in response to determining that the expression categories of the first face image and the second face image match, taking the first face image as the processed first face image.
9. The apparatus of claim 8, wherein the expression recognition model is trained by:
acquiring a first training sample set, wherein a first training sample comprises a face image and an expression category corresponding to the face image;
and taking a face image of a first training sample in the first training sample set as input, taking an expression category corresponding to the input face image as expected output, and training to obtain the expression recognition model.
10. The apparatus of claim 6, wherein the image generation model is trained by:
acquiring a second training sample set, wherein a second training sample comprises a sample face image, a sample expression category, and a sample-generated face image corresponding to the sample face image and the sample expression category, wherein the sample face image and the sample-generated face image are face images of the same person, and the expression category of the sample-generated face image matches the sample expression category;
and taking a sample face image and a sample expression category of a second training sample in the second training sample set as input, taking the sample-generated face image corresponding to the input sample face image and sample expression category as expected output, and training to obtain the image generation model.
11. An apparatus, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-5.
CN201811459739.3A 2018-11-30 2018-11-30 Method and device for transmitting information Active CN111260756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811459739.3A CN111260756B (en) 2018-11-30 2018-11-30 Method and device for transmitting information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811459739.3A CN111260756B (en) 2018-11-30 2018-11-30 Method and device for transmitting information

Publications (2)

Publication Number Publication Date
CN111260756A CN111260756A (en) 2020-06-09
CN111260756B (en) 2023-09-26

Family

ID=70950192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811459739.3A Active CN111260756B (en) 2018-11-30 2018-11-30 Method and device for transmitting information

Country Status (1)

Country Link
CN (1) CN111260756B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053315A (en) * 2020-09-14 2020-12-08 北京百度网讯科技有限公司 Method and apparatus for processing character image data
CN113486694A (en) * 2020-10-26 2021-10-08 青岛海信电子产业控股股份有限公司 Face image processing method and terminal equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014085796A (en) * 2012-10-23 2014-05-12 Sony Corp Information processing device and program
CN106599817A (en) * 2016-12-07 2017-04-26 腾讯科技(深圳)有限公司 Face replacement method and device
CN107316020A (en) * 2017-06-26 2017-11-03 司马大大(北京)智能***有限公司 Face replacement method, device and electronic equipment
CN108174237A (en) * 2017-12-28 2018-06-15 北京奇虎科技有限公司 Image combining method and device
CN108229276A (en) * 2017-03-31 2018-06-29 北京市商汤科技开发有限公司 Neural metwork training and image processing method, device and electronic equipment
CN108259788A (en) * 2018-01-29 2018-07-06 努比亚技术有限公司 Video editing method, terminal and computer readable storage medium
CN108509941A (en) * 2018-04-20 2018-09-07 北京京东金融科技控股有限公司 Emotional information generation method and device
CN108550176A (en) * 2018-04-19 2018-09-18 咪咕动漫有限公司 Image processing method, equipment and storage medium
CN108647560A (en) * 2018-03-22 2018-10-12 中山大学 A kind of face transfer method of the holding expression information based on CNN
CN108875633A (en) * 2018-06-19 2018-11-23 北京旷视科技有限公司 Expression detection and expression driving method, device and system and storage medium


Also Published As

Publication number Publication date
CN111260756A (en) 2020-06-09

Similar Documents

Publication Publication Date Title
CN107578017B (en) Method and apparatus for generating image
CN107633218B (en) Method and apparatus for generating image
CN108830235B (en) Method and apparatus for generating information
WO2020000879A1 (en) Image recognition method and apparatus
CN108509915B (en) Method and device for generating face recognition model
CN111476871B (en) Method and device for generating video
US11436863B2 (en) Method and apparatus for outputting data
CN110298906B (en) Method and device for generating information
CN109993150B (en) Method and device for identifying age
CN107609506B (en) Method and apparatus for generating image
CN110162670B (en) Method and device for generating expression package
CN109034069B (en) Method and apparatus for generating information
CN109981787B (en) Method and device for displaying information
CN110298319B (en) Image synthesis method and device
CN109189544B (en) Method and device for generating dial plate
CN109784304B (en) Method and apparatus for labeling dental images
CN110059623B (en) Method and apparatus for generating information
CN110046571B (en) Method and device for identifying age
CN108399401B (en) Method and device for detecting face image
CN111260756B (en) Method and device for transmitting information
US11232560B2 (en) Method and apparatus for processing fundus image
CN110008926B (en) Method and device for identifying age
CN106530377B (en) Method and apparatus for manipulating three-dimensional animated characters
CN109241930B (en) Method and apparatus for processing eyebrow image
CN108921138B (en) Method and apparatus for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant