CN114904268A - Avatar adjustment method and apparatus, electronic device and storage medium - Google Patents

Avatar adjustment method and apparatus, electronic device and storage medium

Info

Publication number
CN114904268A
CN114904268A (application CN202210619173.6A)
Authority
CN
China
Prior art keywords
parameter adjusting data
voice
avatar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210619173.6A
Other languages
Chinese (zh)
Inventor
徐帅
刘勇成
胡志鹏
袁思思
程龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202210619173.6A
Publication of CN114904268A
Legal status: Pending

Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/40: Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment
    • A63F 13/42: Processing input control signals by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
    • A63F 13/424: Processing input control signals involving acoustic input signals, e.g. by using the results of pitch or rhythm extraction or voice recognition
    • A63F 13/50: Controlling the output signals based on the game progress
    • A63F 13/52: Controlling the output signals involving aspects of the displayed game scene
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • A63F 2300/00: Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F 2300/30: Features characterized by output arrangements for receiving control signals generated by the game device
    • A63F 2300/308: Details of the user interface

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Acoustics & Sound (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present application relates to the field of artificial intelligence technologies, and in particular, to an avatar adjustment method and apparatus, an electronic device, and a storage medium. In the method and apparatus, a first voice instruction containing a negative intention toward the current avatar and/or an avatar adjustment intention is monitored, target parameter adjusting data corresponding to the first voice instruction is determined, the current avatar is then adjusted based on the target parameter adjusting data, and the adjusted target avatar is displayed on the graphical user interface. The adjustment of the avatar is thus completed through voice interaction, which reduces the time cost spent by the player in the avatar adjustment process.

Description

Avatar adjustment method and apparatus, electronic device and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an avatar adjustment method and apparatus, an electronic device, and a storage medium.
Background
In online games, a player can customize the avatar of a virtual character by adjusting character model parameters, an adjustment commonly called "face pinching". Face pinching is a self-service, personalized data operation on the appearance of the virtual character, such as adjusting the size of the eyes or the height of the eyebrows.
Some face-pinching systems provided in games require the player to manually adjust the parameters corresponding to each part of the character model on a graphical user interface. Since it is difficult for the player to grasp the specific meaning of each parameter, the player tends to adjust the parameters repeatedly in order to achieve the desired effect. The player therefore spends a large amount of time in the avatar adjustment process.
Disclosure of Invention
In view of this, the embodiments of the present application provide at least an avatar adjustment method, an avatar adjustment apparatus, an electronic device and a storage medium, which can reduce the time cost spent by a player in the avatar adjustment process.
The application mainly comprises the following aspects:
in a first aspect, an embodiment of the present application provides an avatar adjustment method, where a current avatar is displayed through a graphical user interface of a terminal device. The adjustment method includes:
monitoring a first voice instruction, where the first voice instruction includes a negative intention toward the current avatar and/or an avatar adjustment intention;
determining target parameter adjusting data corresponding to the first voice instruction, where the target parameter adjusting data is used to change the current avatar; and
adjusting the current avatar based on the target parameter adjusting data, and displaying the adjusted target avatar on the graphical user interface.
In a second aspect, an embodiment of the present application further provides an avatar adjustment apparatus, where a current avatar is displayed through a graphical user interface of a terminal device. The adjustment apparatus includes:
a monitoring module, configured to monitor a first voice instruction, where the first voice instruction includes a negative intention toward the current avatar and/or an avatar adjustment intention;
a determining module, configured to determine target parameter adjusting data corresponding to the first voice instruction, where the target parameter adjusting data is used to change the current avatar; and
an adjusting module, configured to adjust the current avatar based on the target parameter adjusting data, and display the adjusted target avatar on the graphical user interface.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a memory and a bus, where the memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the avatar adjustment method according to the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the avatar adjustment method according to the first aspect.
In the embodiments of the present application, a first voice instruction of the current user containing a negative intention toward the current avatar and/or an avatar adjustment intention is acquired, target parameter adjusting data corresponding to the first voice instruction is determined, the current avatar is then adjusted based on the target parameter adjusting data, and the adjusted target avatar is displayed on the graphical user interface. In the related art, a player must manually and repeatedly adjust the parameters corresponding to each part of the character model, and therefore spends a large amount of time in the avatar adjustment process; by comparison, completing the avatar adjustment through voice interaction reduces the time cost spent by the player.
Furthermore, by using the first voice instruction as sample voice information and the target parameter adjusting data corresponding to the first voice instruction as a parameter adjusting data label to update the network parameters of the machine learning model, the accuracy with which the machine learning model outputs the target parameter adjusting data corresponding to a first voice instruction can be improved, which in turn greatly reduces the deviation between the finally presented target avatar and the avatar effect expected by the player.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flowchart illustrating an avatar adjustment method according to an embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a graphical user interface provided by an embodiment of the present application;
fig. 3 is a flowchart illustrating another avatar adjustment method according to an embodiment of the present application;
FIG. 4 is a flow chart illustrating a method for adjusting an avatar according to an embodiment of the present application;
FIG. 5 is a functional block diagram of an avatar adjustment apparatus according to an embodiment of the present application;
fig. 6 is a second functional block diagram of an avatar adjustment apparatus according to an embodiment of the present application;
fig. 7 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and that steps without logical context may be performed in reverse order or concurrently. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
To enable those skilled in the art to use the present disclosure, the following embodiments are presented in conjunction with the specific application scenario of "face pinching". It will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and application scenarios without departing from the spirit and scope of the present disclosure.
The method, apparatus, electronic device, or computer-readable storage medium described in the embodiments of the present application may be applied to any scenario requiring face pinching. The embodiments of the present application do not limit the specific application scenario, and any scheme using the avatar adjustment method, apparatus, electronic device, and storage medium provided in the embodiments of the present application falls within the scope of the present application.
It should be noted that, before the present application was proposed, some face-pinching systems provided in games required the player to manually adjust the parameters corresponding to each part of the character model on a graphical user interface. Since it is difficult for the player to grasp the specific meaning of each parameter, the player repeatedly adjusts the parameters in order to achieve the desired effect, and a large amount of the player's time is spent in the avatar adjustment process.
In view of the above problems, in the embodiments of the present application, a first voice instruction including a negative intention toward the current avatar and/or an avatar adjustment intention is monitored, target parameter adjusting data corresponding to the first voice instruction is determined, the current avatar is adjusted based on the target parameter adjusting data, and the adjusted target avatar is displayed on the graphical user interface. The adjustment of the avatar is thus completed through voice interaction, which reduces the time cost spent by the player in the avatar adjustment process.
In the present application, "user" and "player" may be interchanged, and both representations have the same meaning and are both service users using terminal devices.
For the convenience of understanding of the present application, the technical solutions provided in the present application will be described in detail below with reference to specific embodiments.
The avatar adjustment method in one embodiment of the present application may be executed on a local terminal device or on a server. When the method runs on the server, it can be implemented and executed based on a cloud interaction system, which includes the server and a client device.
In an optional embodiment, various cloud applications may run under the cloud interaction system, for example cloud games. Taking a cloud game as an example, a cloud game is a game mode based on cloud computing. In the cloud game operation mode, the entity that runs the game program is separated from the entity that presents the game picture: the storage and execution of the game are completed on the cloud game server, while the client device receives and sends data and presents the game picture. For example, the client device may be a display device with a data transmission function close to the user side, such as a mobile terminal, a television, a computer or a palm computer, while the device performing the information processing is the cloud game server in the cloud. When a game is played, the player operates the client device to send an operation instruction to the cloud game server; the cloud game server runs the game according to the operation instruction, encodes and compresses data such as the game picture, and returns the data to the client device through the network; finally, the client device decodes the data and outputs the game picture.
In an optional implementation manner, taking a game as an example, the local terminal device stores the game program and presents the game picture. The local terminal device interacts with the player through a graphical user interface; that is, the game program is conventionally downloaded, installed and run on an electronic device. The local terminal device may provide the graphical user interface to the player in various manners; for example, the interface may be rendered on a display screen of the terminal, or provided to the player through holographic projection. For example, the local terminal device may include a display screen for presenting a graphical user interface including a game picture, and a processor for running the game, generating the graphical user interface, and controlling the display of the graphical user interface on the display screen.
In a possible implementation manner, an embodiment of the present application provides an avatar adjustment method in which a graphical user interface is provided through a terminal device, where the terminal device may be the aforementioned local terminal device, or the aforementioned client device in the cloud interaction system.
Fig. 1 is a flowchart of an avatar adjustment method according to an embodiment of the present disclosure. As shown in fig. 1, the method for adjusting an avatar provided in the embodiment of the present application displays a current avatar through a graphical user interface of a terminal device, and includes the following steps:
s101: the first voice command is monitored.
In a specific implementation, the present application adjusts the avatar through voice interaction. Specifically, when the user is dissatisfied with the current avatar and wants to adjust it, the user can issue a voice input instruction, and the current avatar is adjusted through the first voice instruction carried in the voice input instruction.
Here, the avatar (virtual character) may refer to a synthesized character image. In terms of structure, the avatar may be the image of a three-dimensional model or a planar image. The avatar may be formed by imitating a human figure, by imitating an animal figure, or based on images from comics and animation. The avatar is generally a character image, which may be a person, an animal, and so on.
The current avatar may be an initialized template avatar, or the avatar at the current time obtained by the user editing the template avatar.
It should be noted that the first voice instruction includes evaluation information expressing a negative intention toward the current avatar and/or an avatar adjustment intention. The evaluation information can be understood as an evaluation of the avatar effect: negative intentions include, for example, "face too large" or "mouth too small", and avatar adjustment intentions include, for example, "adjust the face to a melon-seed shape" or "adjust the expression to a happy expression".
Here, the user may issue a voice input instruction by operating the terminal device. Specifically, a graphical user interface is provided by the terminal device, and displays a voice control and the current avatar; the user issues the voice input instruction, which carries the first voice instruction, by triggering the voice input function of the voice control. Of course, if the user wants to manually adjust the value of a parameter of any part of the character model, the user may also close the input function of the voice control and use the manual adjustment function. That is, in the scheme provided by the present application, the user may choose to manually adjust the value corresponding to each part parameter of the character model, or send voice information to adjust the parameters of any part.
It should be noted that, if the execution subject of the avatar adjustment method provided by the present application is a terminal device, the terminal device may directly obtain the first voice instruction issued by the user; if the execution subject is a server, the server may receive the first voice instruction sent by the terminal device.
Here, taking the terminal device as the execution subject as an example, the first voice instruction by which the user adjusts the current avatar is obtained as follows: in response to a voice input operation on the voice control in the graphical user interface, the first voice instruction is monitored.
It should be noted that, in the face-pinching systems provided in the related art, the player needs to manually adjust the parameters corresponding to each part of the character model on a graphical user interface. Since it is difficult for the player to grasp the specific meaning of each parameter, the player repeatedly adjusts the parameters one by one in order to achieve the desired effect, and can only adjust one parameter at a time, so a large amount of time is required to generate the final target avatar. In view of this situation, the inventors of the present application found through research that, although a player does not have an accurate conceptual understanding of the parameters of the parts of a character model and cannot anticipate the avatar effect corresponding to a specific parameter value, the player can judge the face-pinching effect, i.e., the avatar effect, with evaluations such as "face is too fat", "mouth is too big" and "smile is too exaggerated". The present application therefore provides a method for adjusting an avatar by using a first voice instruction that contains an avatar effect evaluation, i.e., a negative intention toward the current avatar and/or an avatar adjustment intention. The adjustment of the avatar can be accomplished through a small amount of voice interaction, without manually and repeatedly adjusting parameters, without knowing the specific meaning of each parameter, and with multiple parameters adjusted simultaneously at one time. In this way, the time cost spent by the player in the avatar adjustment process can be greatly reduced.
Here, the evaluation of the avatar effect may be an evaluation of the face shape, the expression, and the like, where the face shape may be the whole face or a facial part. The first voice instruction includes at least one of the following types of evaluation information: a face evaluation instruction and an expression evaluation instruction. Face evaluation instructions include, for example, "face too fat", "face too thin", "face too sharp", "mouth too small", "eyes too large" and "nose too low"; expression evaluation instructions include, for example, "smile too exaggerated", "expression too serious" and "no smile".
S102: and determining target parameter adjusting data corresponding to the first voice instruction.
In a specific implementation, after the first voice instruction issued by the user is acquired, the target parameter adjusting data matching the first voice instruction can be determined based on the evaluation information in the first voice instruction, i.e., the avatar effect evaluation containing a negative intention toward the current avatar and/or an avatar adjustment intention, where the target parameter adjusting data is used to change the current avatar. It should be noted that there is an association between the first voice instruction and the target parameter adjusting data: the target parameter adjusting data may be data corresponding to one or more parameters of a certain part, or data corresponding to multiple parameters of multiple parts. Specifically, if the first voice instruction contains evaluation information (a negative intention and/or an avatar adjustment intention) for a certain part of the character model, the target parameter adjusting data is a target value or an adjustment amplitude of at least one parameter corresponding to that part. For example, for the first voice instruction "face too large", the corresponding target parameter adjusting data may include "concavity of face +2, inclination of face -3, forehead height +6". If the first voice instruction contains evaluation information for the expression of the character model, the target parameter adjusting data is a target value or an adjustment amplitude of at least one parameter corresponding to the expression, and the current avatar is adjusted using the target parameter adjusting data.
The adjustable parameters of each part include, for example:
  • forehead: up/down, front/back, angle, width, length, fullness;
  • glabella: up/down, front/back, angle, width, length, fullness;
  • cheekbones: left/right, up/down, angle, fullness;
  • lower jaw: jaw corners left/right, up/down, front/back, jaw-corner angle, inclination, eversion, thickness and fullness; jaw left/right, up/down, front/back, angle, inclination, eversion, thickness and fullness;
  • chin: left/right, up/down, front/back, angle of both sides, inclination of both sides, eversion of both sides, thickness of both sides, fullness of both sides;
  • eyebrows: left/right, up/down, front/back, angle, height, length, width, fullness;
  • eyes: iris, pupil, left/right, up/down, front/back, angle, height, eye distance;
  • nose: bridge, tip and wing parameters;
  • lips: upper-lip and lower-lip parameters.
The target parameter adjusting data may be any data used to adjust the current avatar, i.e., data or a combination of data corresponding to one or more parameters of at least one part. The association between the first voice instruction and the target parameter adjusting data may be preset: for example, a mapping relationship table is prepared in advance, and after the first voice instruction is obtained, the target parameter adjusting data corresponding to it is queried directly from the mapping relationship table; alternatively, a machine learning model may be trained in advance, and the target parameter adjusting data corresponding to the first voice instruction is obtained by inputting the first voice instruction into the machine learning model.
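For illustration, the mapping-table variant can be sketched in a few lines of Python. This is a minimal sketch only: the table contents, the parameter names, and the `transcribe` helper are hypothetical stand-ins, not part of the patent.

```python
# Minimal sketch of the mapping-table variant described above.
# Table contents and parameter names are hypothetical examples.

# Each evaluation phrase maps to one or more parameter adjustment amplitudes.
ADJUSTMENT_TABLE = {
    "face too large": {"face_concavity": +2, "face_inclination": -3, "forehead_height": +6},
    "mouth too small": {"mouth_width": +4, "lip_thickness": +1},
    "smile too exaggerated": {"mouth_corner_lift": -5},
}

def transcribe(first_voice_instruction) -> str:
    """Placeholder speech-to-text front end; a real system would call an ASR model."""
    return str(first_voice_instruction)

def lookup_target_adjustments(first_voice_instruction) -> dict:
    """Query the mapping table for the parameter adjustments matching the instruction."""
    text = transcribe(first_voice_instruction)
    return ADJUSTMENT_TABLE.get(text, {})

print(lookup_target_adjustments("face too large"))
# {'face_concavity': 2, 'face_inclination': -3, 'forehead_height': 6}
```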
It should be noted that, in the face-pinching systems provided in the related art, since it is difficult for the player to grasp the specific meaning of each parameter, it is even more difficult to select a suitable combination of parameters to adjust from among the parameters of the character model, let alone determine their adjustment values. It is therefore difficult for the player to pinch the desired avatar; that is, there tends to be a large deviation between the finally presented target avatar and the avatar effect the player expects. In view of this situation, the present application determines the combination of parameters to adjust, and their values, directly from voice information carrying an avatar effect evaluation (a negative intention toward the current avatar and/or an avatar adjustment intention), which greatly reduces the deviation between the finally presented target avatar and the avatar effect the player expects.
Here, multiple sets of candidate parameter adjusting data may be provided for the user to select the target parameter adjusting data from, improving the user's freedom of choice. Specifically, step S102 includes:
step 1021: determine at least one set of first candidate parameter adjusting data corresponding to the first voice instruction, and display each set of first candidate parameter adjusting data and/or an avatar effect image corresponding to each set on the graphical user interface.
In a specific implementation, at least one set of first candidate parameter adjusting data matching the first voice instruction is determined from the first voice instruction. Each set of first candidate parameter adjusting data may be displayed on the graphical user interface provided by the terminal device, or the avatar effect image corresponding to each set may be displayed, or both may be displayed simultaneously. The first candidate parameter adjusting data and the avatar effect images may be displayed on the graphical user interface in a pop-up window. The parameters included in different sets of first candidate parameter adjusting data may be the same or different, and the values of the same parameter in different sets may be the same or different, as long as the different sets of first candidate parameter adjusting data differ from one another.
Fig. 2 is a schematic diagram of a graphical user interface provided in an embodiment of the present application, in which a voice control, three sets of first candidate parameter adjusting data, and the avatar effect image corresponding to each set are displayed. In fig. 2, each set of first candidate parameter adjusting data includes different parameters with different values, and the avatar effect images also differ. After the user selects one of the avatar effect images, it is displayed on the graphical user interface as the target avatar, and the unselected avatar effect images are no longer displayed.
Step 1022: in response to a selection operation on any set of first candidate parameter adjusting data or any avatar effect image on the graphical user interface, determine the target parameter adjusting data corresponding to the first voice instruction.
In a specific implementation, the user may select any set of first candidate parameter adjusting data, or any avatar effect image, on the graphical user interface to determine the target parameter adjusting data used to adjust the current avatar.
Here, the user may further adjust any one or more sets of first candidate parameter adjusting data; the avatar effect image corresponding to the adjusted candidate parameter adjusting data may be displayed on the graphical interface, and after viewing the adjusted avatar effect images, the user selects one adjusted set of first candidate parameter adjusting data as the target parameter adjusting data.
Step 1023: if none of the first candidate parameter adjusting data or avatar effect images in the graphical user interface is selected, add all the first candidate parameter adjusting data to a user personalized rejection list.
In a specific implementation, if the user selects neither any set of first candidate parameter adjusting data nor any avatar effect image on the graphical user interface, the user is dissatisfied with all the first candidate parameter adjusting data currently provided. In this case, other, second parameter adjusting data can be provided to the user as the target parameter adjusting data, and each unselected set of first candidate parameter adjusting data is added to the user personalized rejection list.
Step 1024: monitor a second voice instruction, and determine second candidate parameter adjusting data corresponding to the second voice instruction through a machine learning model, where the matching probability of the second candidate parameter adjusting data is greater than or equal to a preset matching probability threshold.
The second candidate parameter adjusting data is selected by the machine learning model after excluding the first candidate parameter adjusting data in the user personalized rejection list, as shown in the sketch below. It should be noted that, in the present application, the user obtains the target parameter adjusting data corresponding to the avatar effect evaluation by inputting a first voice instruction carrying a negative intention and/or an avatar adjustment intention, so the user can adjust the current avatar without understanding the specific meaning of each part parameter of the character model and without adjusting parameters manually. That is, the present application converts the complex face-pinching parameter tuning process into a simple evaluation of the current avatar's effect, so that the adjustment follows the user's own wishes and expectations and the finally presented target avatar is as close as possible to the avatar effect the user expects. With this scheme, multiple parameters can be adjusted simultaneously and the tuning process is greatly accelerated, which reduces the time cost spent by the user in the avatar adjustment process and greatly reduces the deviation between the finally formed target avatar and the avatar effect the user expects.
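The following minimal sketch illustrates the candidate-selection flow of steps 1021 to 1024. The threshold value, data shapes, and function names are assumptions for illustration, and `model_candidates` stands in for the machine learning model described later.

```python
# Sketch of steps 1021-1024: offer candidate sets, honor the user's choice,
# and exclude rejected sets from later model suggestions.

PROB_THRESHOLD = 0.7  # hypothetical preset matching probability threshold

rejection_list: list[dict] = []  # user personalized rejection list

def second_candidates(voice_instruction, model_candidates):
    """Return candidate sets above the threshold, excluding rejected sets."""
    candidates = model_candidates(voice_instruction)  # [(params, probability), ...]
    return [
        params for params, prob in candidates
        if prob >= PROB_THRESHOLD and params not in rejection_list
    ]

def on_user_declined_all(first_candidates):
    """Step 1023: nothing selected, so add every offered set to the rejection list."""
    rejection_list.extend(first_candidates)
```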
S103: and adjusting the current virtual image based on the target parameter adjusting data, and displaying the adjusted target virtual image on the graphical user interface.
In a specific implementation, after the target parameter adjusting data for adjusting the current avatar is determined, i.e., after the parameter adjusting data corresponding to the avatar effect evaluation given by the user for the current avatar is obtained, the target parameter adjusting data can be used directly to adjust the current avatar, and the adjusted target avatar is displayed on the graphical user interface. Specifically, if the target parameter adjusting data is a target value of at least one parameter of a part of the character model, the current value of the corresponding parameter is set to the target value; if the target parameter adjusting data is an adjustment amplitude of at least one parameter of a part of the character model, the adjustment amplitude is added to the current value of the corresponding parameter to obtain the target value, and the current value is then set to the target value, thereby adjusting the current avatar. A minimal sketch of both cases follows.
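The sketch below illustrates the two cases just described (target values vs. adjustment amplitudes); the parameter names and values are hypothetical.

```python
# Sketch of S103: apply target parameter adjusting data to the current avatar.
current_avatar = {"face_concavity": 5.0, "face_inclination": 2.0, "forehead_height": 1.0}

def apply_adjustments(avatar: dict, target_data: dict, as_amplitude: bool = True) -> dict:
    adjusted = dict(avatar)
    for param, value in target_data.items():
        if as_amplitude:
            # Adjustment amplitude: add to the parameter's current value.
            adjusted[param] = adjusted.get(param, 0.0) + value
        else:
            # Target value: overwrite the current value directly.
            adjusted[param] = value
    return adjusted

# e.g. "face too large" -> concavity +2, inclination -3, forehead height +6
target = {"face_concavity": 2, "face_inclination": -3, "forehead_height": 6}
print(apply_adjustments(current_avatar, target))
# {'face_concavity': 7.0, 'face_inclination': -1.0, 'forehead_height': 7.0}
```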
It should be noted that, in the process of generating the final target avatar, the avatar may be adjusted at least once, and each time the user may adjust it using the parameter adjusting data determined from an issued voice instruction. If the avatar is adjusted multiple times and the user issues the same voice instruction each time, this indicates that the user feels the adjustment amplitude is insufficient; the same or different amplitude adjustment values for the corresponding part parameters, or different target values, can then be provided each time, so that the final target avatar reaches the user's expected image. In addition, as long as the user has not finished adjusting the virtual character, voice instructions can continue to be issued until the user is satisfied with the adjusted avatar.
In the embodiments of the present application, a first voice instruction in which the user evaluates the effect of the current avatar is obtained, target parameter adjusting data corresponding to the first voice instruction is determined, the current avatar is then adjusted based on the target parameter adjusting data, and the adjusted target avatar is displayed on the graphical user interface. The adjustment of the avatar is thus completed through voice interaction, which reduces the time cost spent by the player in the avatar adjustment process.
Fig. 3 is a flowchart of another avatar adjustment method according to an embodiment of the present disclosure.
As shown in fig. 3, the method for adjusting an avatar provided in the embodiment of the present application includes the following steps:
s301: the first voice command is monitored.
S302: and inputting the first voice command into a trained machine learning model, and determining the target parameter adjusting data corresponding to the first voice command through the machine learning model.
In specific implementation, a machine learning model may be trained in advance, and the first voice command is input into the machine learning model, so as to obtain the target parameter tuning data matched with the first voice command.
Here, the machine learning model may be a deep learning model including an embedding layer, a convolutional layer, at least two attention layers, a fully connected layer, and an output layer. The working principle of the machine learning model is explained below; that is, step S302, in which the first voice instruction is input into the trained machine learning model and the target parameter adjusting data corresponding to the first voice instruction is determined through the machine learning model, includes the following steps:
step 3021: and inputting the first voice instruction into the embedded layer to obtain voice characteristic information.
In a specific implementation, the first network layer of the machine learning model is an embedding layer (Embedding). The first voice instruction is input into the embedding layer for voice feature extraction, obtaining the voice feature information of the first voice instruction. The embedding layer essentially extracts feature data from the voice information.
Step 3022: and inputting the voice characteristic information into the convolutional layer to obtain semantic characteristic information.
In a specific implementation, the second network layer of the machine learning model is a convolutional layer. The voice feature information output by the embedding layer is used as the input of the convolutional layer for semantic feature extraction, obtaining the semantic feature information of the first voice instruction. Specifically, the convolutional layer extracts semantic features by capturing information between adjacent audio frames, such as whether a similar volume change pattern exists between "29 milliseconds to 31 milliseconds" and "121 milliseconds to 125 milliseconds".
Step 3023: and sequentially inputting the semantic feature information into the at least two attention layers to obtain at least two attention feature information.
In a specific implementation, the third network layer of the machine learning model consists of attention layers; the machine learning model may have at least two attention layers that attend to different information. Taking two attention layers as an example, the first attention layer focuses on time-domain information and the second attention layer focuses on frequency-domain information. The semantic feature information output by the convolutional layer is input to the first attention layer to obtain first attention features in the time domain, and to the second attention layer to obtain second attention features in the frequency domain. In this way, attention feature information of at least two aspects is obtained.
Step 3024: and inputting the at least two attention characteristic information into the full connection layer together to obtain fusion characteristic information.
In a specific implementation, the fourth network layer of the machine learning model is a fully connected layer. The at least two pieces of attention feature information output by the attention layers are used together as the input of the fully connected layer for feature fusion, obtaining the fusion feature information of the first voice instruction. The role of the fully connected layer is to establish connections between different feature dimensions, i.e., to let the machine learning model focus on specific cross dimensions.
Step 3025: and inputting the fusion characteristic information into the output layer to obtain the target parameter adjusting data corresponding to the first voice command.
In a specific implementation, the fifth network layer of the machine learning model is the output layer. The fusion feature information output by the fully connected layer is used as the input of the output layer for output probability calculation, obtaining the target parameter adjusting data corresponding to the first voice instruction.
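The five-layer structure of steps 3021 to 3025 could be realized, for example, as the following PyTorch sketch. The dimensions, the quantized-frame input encoding, and the use of nn.MultiheadAttention for the time-domain and frequency-domain attention layers are assumptions for illustration; the patent does not fix these details.

```python
import torch
import torch.nn as nn

class TuningModel(nn.Module):
    """Minimal sketch of the embedding/conv/attention/fusion/output pipeline."""
    def __init__(self, vocab=1024, dim=128, n_candidates=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)               # step 3021: voice features
        self.conv = nn.Conv1d(dim, dim, kernel_size=3,      # step 3022: semantic features
                              padding=1)                    # captures adjacent-frame patterns
        self.attn_time = nn.MultiheadAttention(dim, 4, batch_first=True)  # time-domain focus
        self.attn_freq = nn.MultiheadAttention(dim, 4, batch_first=True)  # frequency-domain focus
        self.fuse = nn.Linear(2 * dim, dim)                 # step 3024: feature fusion
        self.out = nn.Linear(dim, n_candidates)             # step 3025: matching probabilities

    def forward(self, frame_ids):                           # (batch, seq) quantized audio frames
        x = self.embed(frame_ids)                           # (batch, seq, dim)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)    # (batch, seq, dim)
        a1, _ = self.attn_time(x, x, x)                     # first attention features
        a2, _ = self.attn_freq(x, x, x)                     # second attention features
        fused = self.fuse(torch.cat([a1.mean(1), a2.mean(1)], dim=-1))
        return torch.sigmoid(self.out(fused))               # one probability per candidate set
```

A per-candidate sigmoid is assumed here because the example probabilities in step a1 below (0.999, 0.023, 0.977, 0.721) are independent matching degrees rather than a distribution summing to one.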
Here, the working principle of the output layer is described below; that is, step 3025, in which the fusion feature information is input into the output layer to obtain the target parameter adjusting data corresponding to the first voice instruction, includes the following steps:
step a 1: and determining multiple groups of candidate parameter adjusting data and the matching probability corresponding to each group of candidate parameter adjusting data through the machine learning model.
In a specific implementation, the output layer includes an activation function, which is used to calculate the matching probability corresponding to each set of candidate parameter adjusting data. The matching probability characterizes the degree of matching between that set of candidate parameter adjusting data and the first voice instruction, for example, [(candidate parameter adjusting data 1, 0.999), (candidate parameter adjusting data 2, 0.023), (candidate parameter adjusting data 3, 0.977), (candidate parameter adjusting data 4, 0.721)].
Step a 2: and taking the candidate parameter adjusting data of which the matching probability is greater than or equal to a preset matching probability threshold value in the plurality of groups of candidate parameter adjusting data as the target parameter adjusting data.
In a specific implementation, from the multiple sets of candidate parameter adjusting data, the candidate parameter adjusting data whose matching probabilities rank within a preset top number of places may be selected as the target parameter adjusting data, or the candidate parameter adjusting data whose matching probability is greater than a preset probability value may be selected as the target parameter adjusting data.
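Both selection strategies of step a2 (top-n ranking or a preset probability threshold) can be sketched as follows; the threshold value and data layout are illustrative assumptions.

```python
# Sketch of steps a1-a2: keep candidate sets whose matching probability meets
# the preset threshold, or alternatively the top-n by probability.
def select_targets(scored, threshold=0.7, top_n=None):
    """`scored` is [(candidate_params, probability), ...] from the output layer."""
    if top_n is not None:
        return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]
    return [(c, p) for c, p in scored if p >= threshold]

scored = [("candidate 1", 0.999), ("candidate 2", 0.023),
          ("candidate 3", 0.977), ("candidate 4", 0.721)]
print(select_targets(scored))  # candidates 1, 3 and 4 pass the 0.7 threshold
```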
Here, the machine learning model may be deployed on a server or on a local terminal device. The machine learning model can be understood as a black box: the target parameter adjusting data for adjusting the current avatar is obtained by inputting the first voice instruction into it.
It should be noted that the training process of the machine learning model is described below in order to clarify how the model learns to output the target parameter adjusting data. The machine learning model mentioned in step S302 is trained according to the following steps:
step b 1: and acquiring a plurality of sample voice messages and a parameter adjusting data label corresponding to each sample voice message.
In a specific implementation, a large number of sample voice messages are obtained, each being evaluation information containing a negative intention toward an avatar and/or an avatar adjustment intention. Specifically, the sample voice messages may be voice evaluations of avatars (which can also be understood as face-pinching effects) given by users during game testing.
And the parameter adjusting data label represents the real parameter adjusting data corresponding to the sample voice information.
In addition, for each sample voice message, a professional may be asked to configure a set of parameter-adjusting combination data, which is used as the parameter adjusting data label corresponding to that sample voice message. Alternatively, a preset parameter-adjusting combination template may be prepared in advance; at least one set of parameter-adjusting combination data is preliminarily matched to each sample voice message through the preset template, and a professional then adjusts the combination data to obtain the parameter adjusting data label corresponding to each sample voice message. Specifically, for each sample voice message, the corresponding parameter adjusting data label is determined according to the following steps:
converting the sample voice information into sample text information; performing word segmentation on the sample text information to obtain a plurality of keywords; inquiring parameter-adjusting combined data which has a mapping relation with each keyword from a preset parameter-adjusting combined template library; and responding to a change instruction aiming at the parameter adjusting combined data to obtain a parameter adjusting data label corresponding to the sample voice information.
In a specific implementation, when matching a parameter adjusting data label to each sample voice message, the sample voice message is converted into sample text information, which is then segmented into a number of keywords. Each keyword is used to query the parameter-adjusting combination data associated with it in a preset parameter-adjusting combination template library, which stores a number of keywords and the parameter-adjusting combination data associated with each. The association between keywords and parameter-adjusting combination data can be determined from the voice information input by users during game testing and the parameter-adjusting combination data they selected. Each set of combination data is then fine-tuned to obtain the parameter adjusting data label corresponding to each sample voice message. On the premise of ensuring accuracy, this greatly improves the efficiency of determining the parameter adjusting data label for each sample voice message. A minimal sketch of this flow follows.
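In the sketch below, the template library contents and the whitespace tokenizer are placeholders (a real system would use a proper Chinese word segmenter), and the expert correction is modeled as a simple dictionary update.

```python
# Sketch of the label-construction flow: transcribe, segment into keywords,
# query the preset template library, then apply the expert's corrections.

TEMPLATE_LIBRARY = {  # keyword -> parameter-adjusting combination data
    "face": {"face_concavity": +2},
    "fat": {"cheek_fullness": -4},
}

def segment(text: str) -> list[str]:
    """Placeholder word segmentation; a real system would use a Chinese tokenizer."""
    return text.split()

def build_label(sample_text: str, expert_changes: dict) -> dict:
    label: dict = {}
    for keyword in segment(sample_text):
        label.update(TEMPLATE_LIBRARY.get(keyword, {}))  # preliminary template match
    label.update(expert_changes)                         # professional fine-tuning
    return label

print(build_label("face too fat", {"cheek_fullness": -2}))
# {'face_concavity': 2, 'cheek_fullness': -2}
```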
Step b 2: and training the initial learning model based on the plurality of sample voice information and the parameter adjusting data labels corresponding to the sample voice information to obtain the machine learning model.
In specific implementation, after a sufficient number of sample voice information and the parameter-adjusting data labels corresponding to each sample voice information are acquired, the initial learning model can be trained through the sample voice information and the parameter-adjusting data labels corresponding to each sample voice information, so that the machine learning model is obtained.
Here, the process in step b2 of training the initial learning model based on the sample voice messages and their corresponding parameter adjusting data labels to obtain the machine learning model is further described below, to clarify the training principle. Specifically, the process includes the following steps:
step b 21: and inputting each sample voice information into the initial learning model to obtain the prediction parameter adjusting data corresponding to the sample voice information.
Step b 22: determine the cross entropy loss between the prediction parameter adjusting data and the parameter adjusting data label corresponding to each sample voice message, and generate the trained machine learning model when the cross entropy loss satisfies the model training cut-off condition.
In a specific implementation, the training cut-off condition may be that the cross entropy loss is less than or equal to a preset threshold, where the preset threshold may be set according to the precision actually required of the machine learning model. A sketch of this training loop follows.
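The training loop of steps b21 and b22 might look like the following sketch, reusing the TuningModel sketch above. Binary cross entropy is assumed here because the output layer above produces an independent matching probability per candidate set; the optimizer and cut-off threshold are likewise assumptions.

```python
import torch

model = TuningModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.BCELoss()
LOSS_CUTOFF = 0.05  # hypothetical model-training cut-off condition

def train(loader):  # yields (frame_ids, label_vector) batches
    while True:
        for frame_ids, labels in loader:
            predicted = model(frame_ids)          # step b21: prediction parameter adjusting data
            loss = loss_fn(predicted, labels)     # step b22: cross entropy vs. the data labels
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if loss.item() <= LOSS_CUTOFF:        # training cut-off condition met
                return model
```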
S303: and adjusting the current virtual image based on the target parameter adjusting data, and displaying the adjusted target virtual image on the graphical user interface.
S304: and taking the first voice instruction as sample voice information, taking the target parameter adjusting data as a parameter adjusting data label, and updating the network parameters in the machine learning model based on the first voice instruction and the target parameter adjusting data.
It should be noted that, in the user's face-pinching process, i.e., in the process of generating the final target avatar, voice information may be input multiple times, and the parameter adjusting data obtained each time is used to adjust the intermediate avatar. The target parameter adjusting data selected by the user each time from the multiple candidates can be used as a parameter adjusting data label, and the first voice instruction corresponding to it as sample voice information, to continue training the machine learning model, i.e., to update the network parameters in the machine learning model. This can greatly improve the accuracy of the parameter adjusting data output by the machine learning model; a sketch of such an update step follows.
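Such an online update can be sketched as one further gradient step on the confirmed pair, reusing the objects from the training sketch above; the exact update scheme is an assumption.

```python
# Sketch of S304: each confirmed selection becomes a new training pair, and the
# network parameters are updated with one further gradient step.
def update_with_selection(frame_ids, selected_label_vector):
    predicted = model(frame_ids.unsqueeze(0))                      # add batch dimension
    loss = loss_fn(predicted, selected_label_vector.unsqueeze(0))  # selected data as label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```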
In addition, after the current avatar is adjusted based on the target parameter adjusting data in step S303 and the adjusted target avatar is displayed on the graphical user interface, personalized records about the user may also be kept; that is, parameter adjusting data is pushed to the current user according to the user's preferences when selecting target parameter adjusting data each time. Specifically, this includes the following steps:
step c 1: and counting the times of respectively selecting the target parameter adjusting data.
Step c 2: and according to the times of respectively selecting each target parameter adjusting data, distributing weight to each target parameter adjusting data.
Step c 3: and pushing parameter adjusting data to the current user based on the weight corresponding to each target parameter adjusting data and the machine learning model.
In a specific implementation, the content of the voice information may also be taken into account when pushing parameter adjusting data to the user. For example, if the user prefers to input voice information 1 and accordingly selects parameter adjusting data 2, the weight of parameter adjusting data 2 may be increased; when the machine learning model outputs multiple sets of candidate parameter adjusting data and the matching probability of each, the increased weight is applied so as to push the final target parameter adjusting data to the user, as in the sketch below.
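One possible form of this preference weighting, sketched below, counts past selections and multiplicatively boosts the model's matching probabilities; the weighting formula is an assumption for illustration.

```python
# Sketch of steps c1-c3: count how often each target set was selected, derive
# weights from the counts, and rescale the model's matching probabilities
# before pushing parameter adjusting data to the current user.
from collections import Counter

selection_counts = Counter()  # step c1: selections per candidate-set id

def record_selection(set_id: str):
    selection_counts[set_id] += 1

def push_for_user(scored):
    """`scored` is [(set_id, matching_probability), ...] from the model."""
    total = sum(selection_counts.values()) or 1
    reweighted = [
        # steps c2-c3: boost sets the user has preferred in the past
        (set_id, prob * (1 + selection_counts[set_id] / total))
        for set_id, prob in scored
    ]
    return max(reweighted, key=lambda item: item[1])

record_selection("candidate 2")
print(push_for_user([("candidate 1", 0.6), ("candidate 2", 0.55)]))
# ('candidate 2', 1.1) -- past preference outweighs the raw probability
```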
The descriptions of S301 and S303 may refer to the descriptions of S101 and S103, and the same technical effect can be achieved, which is not described in detail herein.
Fig. 4 is a flowchart illustrating an avatar adjustment method according to an embodiment of the present application, including the following steps:
s401: and constructing a training set.
Specifically, test players give a large amount of voice information evaluating face-pinching effects, which is used as sample voice information; a professional matches each sample voice message with a set of parameter-adjusting combination data, which serves as its parameter adjusting data label; and the sample voice messages together with their parameter adjusting data labels form the training set.
S402: an initial learning model is trained.
Specifically, an initial learning model is trained by using a plurality of sample voice messages and parameter-adjusting data labels corresponding to the sample voice messages, and a machine learning model is obtained through training.
S403: The machine learning model is deployed.
Specifically, the machine learning model is deployed on the server side.
S404: the player pinches the face.
S405: the client device collects voice information.
S406: Candidate parameter adjusting data are determined.
Specifically, the machine learning model deployed on the server side returns the top-n ranked candidate parameter adjusting data for the first voice instruction, where n is greater than or equal to 1; a sketch of this server-side selection follows the flow below.
S407: A plurality of candidate avatar effect graphs are obtained.
Specifically, the avatar effect graph corresponding to each candidate parameter adjusting data is displayed on the client device.
S408: The target avatar effect graph is generated.
Specifically, the target player selects one of the n avatar effect graphs or cancels the selection; if the selection is cancelled, the avatar effect graph before adjustment remains displayed.
S409: the face pinching process is ended.
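The server-side candidate selection of S406 could be sketched as follows, again assuming a PyTorch classifier over parameter adjusting combinations whose softmax outputs are read as matching probabilities; the function and argument names are illustrative:

```python
import torch

def top_n_candidates(model, voice_tokens, n=3):
    """S406: score all parameter adjusting combinations for one voice
    instruction and return the top-n with their matching probabilities."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(voice_tokens), dim=-1).squeeze(0)
    scores, indices = torch.topk(probs, k=n)  # n >= 1, top-n ranked candidates
    return [(int(i), float(s)) for i, s in zip(indices, scores)]
```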
Based on the same application concept, an embodiment of the present application further provides an avatar adjustment apparatus corresponding to the avatar adjustment method of the above embodiments. Since the principle by which the apparatus solves the problem is similar to that of the avatar adjustment method, its implementation may refer to the implementation of the method, and repeated details are omitted.
As shown in Figs. 5 and 6, Fig. 5 is a first functional block diagram of an avatar adjustment apparatus 500 according to an embodiment of the present application, and Fig. 6 is a second functional block diagram of the same apparatus. As shown in Fig. 5, the avatar adjustment apparatus 500 includes:
a monitoring module 510, configured to monitor a first voice instruction; the first voice instruction includes a negative intention and/or an intention to adjust the image of the current avatar;
a determining module 520, configured to determine target parameter adjusting data corresponding to the first voice instruction; the target parameter adjusting data is used for changing the current avatar;
an adjusting module 530, configured to adjust the current avatar based on the target parameter adjusting data, and display the adjusted target avatar on the graphical user interface.
In a possible implementation manner, as shown in Fig. 5, the determining module 520 is configured to determine the target parameter adjusting data corresponding to the first voice instruction according to the following steps:
and inputting the first voice instruction into a trained machine learning model, and determining the target parameter adjusting data corresponding to the first voice instruction through the machine learning model.
In one possible embodiment, as shown in Fig. 5, the machine learning model includes an embedding layer, a convolutional layer, at least two attention layers, a fully-connected layer, and an output layer; the determining module 520 is specifically configured to:
inputting the first voice instruction into the embedding layer to obtain voice feature information;
inputting the voice feature information into the convolutional layer to obtain semantic feature information;
inputting the semantic feature information into the at least two attention layers in sequence to obtain at least two attention feature information;
inputting the at least two attention feature information together into the fully-connected layer to obtain fusion feature information;
and inputting the fusion feature information into the output layer to obtain the target parameter adjusting data corresponding to the first voice instruction.
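A minimal PyTorch sketch of the described layer sequence is given below. The embodiment fixes only the layer types and their order; all sizes, the tokenization of the voice instruction, the number of attention heads, and the mean-pooling used before fusion are assumptions made here for illustration:

```python
import torch
import torch.nn as nn

class ParamAdjustModel(nn.Module):
    """Embedding -> convolution -> two attention layers in sequence ->
    fully-connected fusion of both attention outputs -> output layer."""

    def __init__(self, vocab_size=8000, embed_dim=128, num_combinations=500):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)   # voice feature information
        self.conv = nn.Conv1d(embed_dim, embed_dim,
                              kernel_size=3, padding=1)        # semantic feature information
        self.attn1 = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.attn2 = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)        # fully-connected fusion layer
        self.out = nn.Linear(embed_dim, num_combinations)      # one score per combination

    def forward(self, token_ids):                              # (batch, seq_len) token ids
        x = self.embedding(token_ids)                          # (batch, seq, embed)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)       # convolve along the sequence
        a1, _ = self.attn1(x, x, x)                            # first attention feature information
        a2, _ = self.attn2(a1, a1, a1)                         # second, fed sequentially
        pooled = torch.cat([a1.mean(dim=1), a2.mean(dim=1)], dim=-1)  # both enter the FC layer
        fused = torch.relu(self.fuse(pooled))                  # fusion feature information
        return self.out(fused)                                 # logits; softmax gives matching probability
```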
In a possible implementation, as shown in Fig. 6, the determining module 520 includes a determining unit 521; the determining unit 521 is configured to:
determining, through the machine learning model, a plurality of groups of candidate parameter adjusting data and the matching probability corresponding to each group of candidate parameter adjusting data;
and taking, as the target parameter adjusting data, the candidate parameter adjusting data among the plurality of groups whose matching probability is greater than or equal to a preset matching probability threshold.
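A sketch of this thresholding, assuming the candidate groups arrive as (combination id, matching probability) pairs; the threshold value is illustrative:

```python
def filter_by_threshold(candidates, threshold=0.2):
    """Keep the candidate groups whose matching probability is greater than
    or equal to the preset matching probability threshold; the survivors are
    taken as the target parameter adjusting data."""
    return [(combo_id, p) for combo_id, p in candidates if p >= threshold]
```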
In a possible embodiment, as shown in Fig. 6, the avatar adjustment apparatus 500 further includes a training module 540; the training module 540 is configured to train the machine learning model according to the following steps:
acquiring a plurality of sample voice messages and the parameter adjusting data label corresponding to each sample voice message; the parameter adjusting data label represents the real parameter adjusting data corresponding to the sample voice information;
inputting each sample voice message into an initial learning model to obtain prediction parameter adjusting data corresponding to that sample voice message;
and determining the cross entropy loss between the prediction parameter adjusting data and the parameter adjusting data label for each sample voice message, and obtaining the trained machine learning model when the cross entropy loss meets the model training cut-off condition.
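A sketch of this training procedure, assuming tokenized sample voice messages and integer labels over parameter adjusting combinations; the batch size, learning rate, and loss cut-off value are illustrative assumptions:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def train_initial_model(model, token_ids, labels,
                        epochs=20, lr=1e-3, loss_cutoff=0.05):
    """Train the initial learning model on (sample voice, parameter adjusting
    data label) pairs until the cross entropy loss meets the cut-off."""
    loader = DataLoader(TensorDataset(token_ids, labels), batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        total, batches = 0.0, 0
        for x, y in loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)  # prediction vs. parameter adjusting data label
            loss.backward()
            optimizer.step()
            total += loss.item()
            batches += 1
        if total / batches <= loss_cutoff:       # model training cut-off condition
            break
    return model                                 # the trained machine learning model
```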
In a possible implementation manner, as shown in Fig. 6, the training module 540 is further configured to determine, for each sample voice message, the corresponding parameter adjusting data label according to the following steps:
converting the sample voice information into sample text information;
performing word segmentation on the sample text information to obtain a plurality of keywords;
querying, from a preset parameter adjusting combination template library, the parameter adjusting combination data that has a mapping relation with each keyword;
and responding to a change instruction for the parameter adjusting combination data to obtain the parameter adjusting data label corresponding to the sample voice information.
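A sketch of this label construction, with the speech recognizer, word segmenter, template library, and manual review step all injected as assumptions (the embodiment does not name concrete tools for them):

```python
def build_label(sample_audio, asr, segmenter, template_library, review=None):
    """Produce the parameter adjusting data label for one sample voice message:
    speech -> text -> keywords -> template lookup -> optional change instruction."""
    text = asr(sample_audio)           # convert the sample voice to sample text
    keywords = segmenter(text)         # word segmentation into keywords
    combo = {}
    for kw in keywords:                # merge the mapped parameter adjusting combinations
        combo.update(template_library.get(kw, {}))
    if review is not None:             # respond to a change instruction, if any
        combo = review(combo)
    return combo                       # the parameter adjusting data label
```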
In a possible embodiment, as shown in fig. 6, the avatar adjustment apparatus 500 further includes an update module 550; the update module 550 is configured to:
and taking the first voice instruction as sample voice information, taking the target parameter adjusting data as a parameter adjusting data label, and updating the network parameters in the machine learning model based on the first voice instruction and the target parameter adjusting data.
In one possible implementation, as shown in Fig. 6, the monitoring module 510 is configured to obtain the first voice instruction according to the following steps:
and responding to the voice input operation aiming at the voice control in the graphical user interface, and monitoring the first voice instruction.
In a possible implementation, as shown in Fig. 6, the determining module 520 is further configured to determine the target parameter adjusting data corresponding to the first voice instruction according to the following steps:
determining at least one group of first candidate parameter adjusting data corresponding to the first voice instruction, and displaying each group of first candidate parameter adjusting data and/or an avatar effect graph corresponding to each group of first candidate parameter adjusting data on the graphical user interface;
and responding to the selection operation aiming at any group of the first candidate parameter adjusting data or any avatar effect graph on the graphical user interface, and determining the target parameter adjusting data corresponding to the first voice instruction.
In a possible embodiment, as shown in Fig. 6, the avatar adjustment apparatus 500 further includes an adding module 560; the adding module 560 is configured to:
and if all the first candidate parameter adjusting data in the graphical user interface are not selected or all the avatar effect graphs are not selected, adding all the first candidate parameter adjusting data to a user personalized rejection list.
In one possible implementation, as shown in Fig. 6, the adding module 560 is further configured to:
monitoring a second voice instruction, and determining second candidate parameter adjusting data corresponding to the second voice instruction through a machine learning model; the matching probability of the second candidate parameter adjusting data is greater than or equal to a preset matching probability threshold;
and the second candidate parameter adjusting data is selected by the machine learning model after the first candidate parameter adjusting data in the user personalized rejection list has been excluded, as the sketch below illustrates.
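A sketch of this second-round inference, assuming the same PyTorch classifier as above and a rejection list holding combination indices; the threshold and n are illustrative:

```python
import torch

def candidates_excluding_rejected(model, voice_tokens, rejection_list,
                                  threshold=0.2, n=3):
    """Score the second voice instruction, drop combinations already in the
    user personalized rejection list, then apply the probability threshold."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(voice_tokens), dim=-1).squeeze(0)
    ranked = sorted(((i, float(p)) for i, p in enumerate(probs)
                     if i not in rejection_list),
                    key=lambda c: c[1], reverse=True)
    return [(i, p) for i, p in ranked[:n] if p >= threshold]
```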
In a possible embodiment, as shown in Fig. 6, the avatar adjustment apparatus 500 further includes a push module 570; the push module 570 is configured to:
counting the number of times each target parameter adjusting data is selected;
assigning a weight to each target parameter adjusting data according to its selection count;
and pushing parameter adjusting data to the current user based on the weight corresponding to each target parameter adjusting data and the machine learning model.
In the embodiment of the present application, the monitoring module 510 obtains a first voice instruction that includes a negative intention and/or an intention to adjust the image of the current avatar, the determining module 520 determines the target parameter adjusting data corresponding to the first voice instruction, and the adjusting module 530 adjusts the current avatar based on the target parameter adjusting data and displays the adjusted target avatar on the graphical user interface. The adjustment of the avatar is thus completed through voice interaction, which reduces the time the player spends in the avatar adjustment process.
Based on the same application concept, referring to Fig. 7, a schematic structural diagram of an electronic device 700 provided in an embodiment of the present application is shown. The electronic device 700 includes: a processor 710, a memory 720, and a bus 730. The memory 720 stores machine-readable instructions executable by the processor 710; when the electronic device 700 runs, the processor 710 communicates with the memory 720 through the bus 730, and the machine-readable instructions, when executed by the processor 710, perform the steps of the avatar adjustment method of any of the above embodiments.
In particular, the machine readable instructions, when executed by the processor 710, may perform the following:
monitoring a first voice instruction; the first voice instruction includes a negative intention and/or an intention to adjust the image of the current avatar;
determining target parameter adjusting data corresponding to the first voice instruction; the target parameter adjusting data is used for changing the current avatar;
and adjusting the current avatar based on the target parameter adjusting data, and displaying the adjusted target avatar on the graphical user interface.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
and inputting the first voice instruction into a trained machine learning model, and determining the target parameter adjusting data corresponding to the first voice instruction through the machine learning model.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
inputting the first voice instruction into the embedding layer to obtain voice feature information;
inputting the voice feature information into the convolutional layer to obtain semantic feature information;
inputting the semantic feature information into the at least two attention layers in sequence to obtain at least two attention feature information;
inputting the at least two attention feature information together into the fully-connected layer to obtain fusion feature information;
and inputting the fusion feature information into the output layer to obtain the target parameter adjusting data corresponding to the first voice instruction.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
determining a plurality of groups of candidate parameter adjusting data and the matching probability corresponding to each group of candidate parameter adjusting data through the machine learning model;
and taking the candidate parameter adjusting data of which the matching probability is greater than or equal to a preset matching probability threshold value in the plurality of groups of candidate parameter adjusting data as the target parameter adjusting data.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
acquiring a plurality of sample voice messages and a parameter adjusting data label corresponding to each sample voice message;
and training an initial learning model based on the plurality of sample voice information and the parameter adjusting data labels corresponding to the sample voice information to obtain the machine learning model.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
converting the sample voice information into sample text information;
performing word segmentation on the sample text information to obtain a plurality of keywords;
querying, from a preset parameter adjusting combination template library, the parameter adjusting combination data that has a mapping relation with each keyword;
and responding to a change instruction for the parameter adjusting combination data to obtain the parameter adjusting data label corresponding to the sample voice information.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
inputting each sample voice message into the initial learning model to obtain prediction parameter adjusting data corresponding to that sample voice message;
and determining the cross entropy loss between the prediction parameter adjusting data and the parameter adjusting data label for each sample voice message, and obtaining the trained machine learning model when the cross entropy loss meets the model training cut-off condition.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
and taking the first voice instruction as sample voice information, taking the target parameter adjusting data as a parameter adjusting data label, and updating network parameters in the machine learning model based on the first voice instruction and the target parameter adjusting data.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
and responding to the voice input operation aiming at the voice control in the graphical user interface, and monitoring the first voice instruction.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
determining at least one group of first candidate parameter adjusting data corresponding to the first voice instruction, and displaying each group of first candidate parameter adjusting data and/or an avatar effect graph corresponding to each group of first candidate parameter adjusting data on the graphical user interface;
and responding to the selection operation aiming at any group of the first candidate parameter adjusting data or any avatar effect graph on the graphical user interface, and determining the target parameter adjusting data corresponding to the first voice instruction.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
and if all the first candidate parameter adjusting data in the graphical user interface are not selected or all the avatar effect graphs are not selected, adding all the first candidate parameter adjusting data to a user personalized rejection list.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
monitoring a second voice instruction, and determining second candidate parameter adjusting data corresponding to the second voice instruction through a machine learning model; the matching probability of the second candidate parameter adjusting data is greater than or equal to a preset matching probability threshold;
and the second candidate parameter adjusting data is selected by the machine learning model after the first candidate parameter adjusting data in the user personalized rejection list is excluded.
Further, the machine readable instructions when executed by the processor 710 may further perform the following:
counting the number of times each target parameter adjusting data is selected;
assigning a weight to each target parameter adjusting data according to its selection count;
and pushing parameter adjusting data to the current user based on the weight corresponding to each target parameter adjusting data and the machine learning model.
In the embodiment of the present application, a first voice instruction in which the user evaluates the image effect of the current avatar is obtained, the target parameter adjusting data corresponding to the first voice instruction is determined, the current avatar is adjusted based on the target parameter adjusting data, and the adjusted target avatar is displayed on the graphical user interface. The adjustment of the avatar is thus completed through voice interaction, which reduces the time the player spends in the avatar adjustment process.
Based on the same application concept, embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for adjusting an avatar provided in each embodiment are performed.
Specifically, the storage medium may be a general storage medium, such as a removable disk or a hard disk; when the computer program on the storage medium is executed, the above avatar adjustment method can be performed, completing the avatar adjustment through voice interaction and thereby reducing the time cost the player spends in the avatar adjustment process.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical division, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through communication interfaces, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. An avatar adjustment method, wherein a current avatar is displayed through a graphical user interface of a terminal device, the adjustment method comprising:
monitoring a first voice instruction; the first voice instruction includes a negative intention and/or an intention to adjust the image of the current avatar;
determining target parameter adjusting data corresponding to the first voice instruction; the target parameter adjusting data is used for changing the current avatar;
and adjusting the current avatar based on the target parameter adjusting data, and displaying the adjusted target avatar on the graphical user interface.
2. The adjustment method according to claim 1, wherein the target parameter adjusting data corresponding to the first voice instruction is determined according to the following steps:
and inputting the first voice instruction into a trained machine learning model, and determining the target parameter adjusting data corresponding to the first voice instruction through the machine learning model.
3. The adjustment method according to claim 2, wherein the machine learning model comprises an embedding layer, a convolutional layer, at least two attention layers, a fully-connected layer, and an output layer; the inputting the first voice instruction into a trained machine learning model and determining the target parameter adjusting data corresponding to the first voice instruction through the machine learning model comprises:
inputting the first voice instruction into the embedding layer to obtain voice feature information;
inputting the voice feature information into the convolutional layer to obtain semantic feature information;
inputting the semantic feature information into the at least two attention layers in sequence to obtain at least two attention feature information;
inputting the at least two attention feature information together into the fully-connected layer to obtain fusion feature information;
and inputting the fusion feature information into the output layer to obtain the target parameter adjusting data corresponding to the first voice instruction.
4. The adjustment method according to claim 2, wherein the determining, by the machine learning model, the target parameter adjusting data corresponding to the first voice instruction comprises:
determining a plurality of groups of candidate parameter adjusting data and the matching probability corresponding to each group of candidate parameter adjusting data through the machine learning model;
and taking the candidate parameter adjusting data of which the matching probability is greater than or equal to a preset matching probability threshold value in the plurality of groups of candidate parameter adjusting data as the target parameter adjusting data.
5. The adjustment method according to claim 2, wherein the machine learning model is trained according to the following steps:
acquiring a plurality of sample voice messages and the parameter adjusting data label corresponding to each sample voice message; the parameter adjusting data label represents the real parameter adjusting data corresponding to the sample voice information;
inputting each sample voice message into an initial learning model to obtain prediction parameter adjusting data corresponding to that sample voice message;
and determining the cross entropy loss between the prediction parameter adjusting data and the parameter adjusting data label for each sample voice message, and obtaining the trained machine learning model when the cross entropy loss meets the model training cut-off condition.
6. The adjustment method according to claim 5, wherein, for each sample voice message, the parameter adjusting data label corresponding to the sample voice message is determined according to the following steps:
converting the sample voice information into sample text information;
performing word segmentation on the sample text information to obtain a plurality of keywords;
querying, from a preset parameter adjusting combination template library, the parameter adjusting combination data that has a mapping relation with each keyword;
and responding to a change instruction for the parameter adjusting combination data to obtain the parameter adjusting data label corresponding to the sample voice information.
7. The adjustment method according to claim 1, wherein the first voice instruction comprises at least one of the following evaluation instructions:
a face evaluation instruction and an expression evaluation instruction.
8. The adjustment method according to claim 5, wherein after displaying the adjusted target avatar on the graphical user interface, the adjustment method further comprises:
and taking the first voice instruction as sample voice information, taking the target parameter adjusting data as a parameter adjusting data label, and updating the network parameters in the machine learning model based on the first voice instruction and the target parameter adjusting data.
9. The adjustment method according to claim 1, wherein the monitoring a first voice instruction comprises:
and responding to the voice input operation aiming at the voice control in the graphical user interface, and monitoring the first voice instruction.
10. The adjustment method according to claim 1, wherein the target parameter adjusting data corresponding to the first voice instruction is determined according to the following steps:
determining at least one group of first candidate parameter adjusting data corresponding to the first voice instruction, and displaying each group of first candidate parameter adjusting data and/or an avatar effect graph corresponding to each group of first candidate parameter adjusting data on the graphical user interface;
and responding to the selection operation aiming at any group of the first candidate parameter adjusting data or any avatar effect graph on the graphical user interface, and determining the target parameter adjusting data corresponding to the first voice instruction.
11. The adjustment method according to claim 10, wherein after displaying each group of the first candidate parameter adjusting data and/or the avatar effect graph corresponding to each group of the first candidate parameter adjusting data on the graphical user interface, the adjustment method further comprises:
and if all the first candidate parameter adjusting data in the graphical user interface are not selected or all the avatar effect graphs are not selected, adding all the first candidate parameter adjusting data to a user personalized rejection list.
12. The adjustment method according to claim 11, wherein after adding all of the first candidate parameter adjusting data to a user personalized rejection list, the adjustment method further comprises:
monitoring a second voice instruction, and determining second candidate parameter adjusting data corresponding to the second voice instruction through a machine learning model; the matching probability of the second candidate parameter adjusting data is greater than or equal to a preset matching probability threshold;
and the second candidate parameter adjusting data is selected by the machine learning model after the first candidate parameter adjusting data in the user personalized rejection list is excluded.
13. The adjustment method according to claim 2, wherein after displaying the adjusted target avatar on the graphical user interface, the adjustment method further comprises:
counting the number of times each target parameter adjusting data is selected;
assigning a weight to each target parameter adjusting data according to its selection count;
and pushing parameter adjusting data to the current user based on the weight corresponding to each target parameter adjusting data and the machine learning model.
14. An avatar adjustment apparatus, wherein a current avatar is displayed through a graphical user interface of a terminal device, the apparatus comprising:
a monitoring module, used for monitoring a first voice instruction; the first voice instruction includes a negative intention and/or an intention to adjust the image of the current avatar;
a determining module, used for determining target parameter adjusting data corresponding to the first voice instruction; the target parameter adjusting data is used for changing the current avatar;
and an adjusting module, used for adjusting the current avatar based on the target parameter adjusting data and displaying the adjusted target avatar on the graphical user interface.
15. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operated, the machine-readable instructions being executable by the processor to perform the steps of the avatar adjustment method according to any one of claims 1 to 13.
16. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, performs the steps of the avatar adjustment method according to any one of claims 1 to 13.
CN202210619173.6A 2022-06-01 2022-06-01 Virtual image adjusting method and device, electronic equipment and storage medium Pending CN114904268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210619173.6A CN114904268A (en) 2022-06-01 2022-06-01 Virtual image adjusting method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210619173.6A CN114904268A (en) 2022-06-01 2022-06-01 Virtual image adjusting method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114904268A true CN114904268A (en) 2022-08-16

Family

ID=82771193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210619173.6A Pending CN114904268A (en) 2022-06-01 2022-06-01 Virtual image adjusting method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114904268A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188731A (en) * 2022-09-06 2023-05-30 Alipay (Hangzhou) Information Technology Co., Ltd. Virtual image adjusting method and device of virtual world
CN115392216A (en) * 2022-10-27 2022-11-25 iFLYTEK Co., Ltd. Virtual image generation method and device, electronic equipment and storage medium
CN115392216B (en) * 2022-10-27 2023-03-14 iFLYTEK Co., Ltd. Virtual image generation method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination