CN111324207A - Drawing display method and device and electronic equipment - Google Patents

Drawing display method and device and electronic equipment

Info

Publication number
CN111324207A
CN111324207A
Authority
CN
China
Prior art keywords
determining
vector
drawing element
information
audio information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010131473.0A
Other languages
Chinese (zh)
Inventor
薛瑞彬
梁天新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd filed Critical BOE Technology Group Co Ltd
Priority to CN202010131473.0A priority Critical patent/CN111324207A/en
Publication of CN111324207A publication Critical patent/CN111324207A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 - Eye tracking input arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G06F3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • Child & Adolescent Psychology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a drawing display method and device and electronic equipment, and relates to the technical field of display. The method comprises the following steps: acquiring audio information; converting the audio information into text information; determining a drawing element type according to the text information; determining an emotion level parameter according to the audio information and the text information; determining element attribute information according to the emotion level parameter; and displaying, at a target position of a screen, a drawing element that conforms to the drawing element type and has the element attribute information. In the invention, the user's emotion level can be determined by combining the audio information, which conveys tone and intonation, with the text information, which directly expresses emotion. Element attribute information matching the current emotion is determined from this emotion level, and a drawing element of the determined type with that attribute information is displayed at the target position of the screen. Because the drawing element embodies the user's true feelings, the resulting drawing matches the user's current mood and expresses those feelings.

Description

Drawing display method and device and electronic equipment
Technical Field
The present invention relates to the field of display technologies, and in particular, to a method and an apparatus for displaying a painting, and an electronic device.
Background
As living standards improve, people place ever higher demands on their mental life, and more and more interactive intelligent products and software have appeared, such as intelligent children's teaching products and intelligent calligraphy training software, which improve the interactive experience and meet people's mental needs.
In daily life, many people also want an interactive drawing experience, but most people's drawing skill is limited; they often cannot express their true feelings through drawing, and so cannot obtain a drawing that matches their current mood.
Disclosure of Invention
The invention provides a drawing display method and device and electronic equipment, intended to meet people's need for an interactive experience of expressing emotion through drawing.
In order to solve the above problems, the present invention discloses a painting display method, comprising:
acquiring audio information;
converting the audio information into text information;
determining the type of the drawing element according to the text information;
determining an emotion level parameter according to the audio information and the text information;
determining element attribute information according to the emotion level parameters;
and displaying the drawing element which is in line with the drawing element type and has the element attribute information at a target position of a screen.
Optionally, the determining an emotion level parameter according to the audio information and the text information includes:
determining a morpheme feature vector of the text information, determining a first waveform feature vector of the audio information, and determining a second waveform feature vector of the audio information;
splicing the first waveform characteristic vector, the second waveform characteristic vector and the morpheme characteristic vector to obtain a first vector;
performing self-attention mechanism processing on the first vector to obtain a second vector;
and classifying the second vector to obtain an emotion grade parameter corresponding to the second vector.
Optionally, the determining the morpheme feature vector of the text information includes:
inputting the text information as an input parameter into a first embedding layer of a first preset model, and outputting a first intermediate vector;
inputting the first intermediate vector into a first coding layer of the first preset model, and outputting a second intermediate vector;
and inputting the second intermediate vector into a first pooling layer of the first preset model, and outputting the morpheme feature vector.
Optionally, the determining a first waveform feature vector of the audio information includes:
inputting the audio information as an input parameter into a second embedded layer of a second preset model, and outputting a third intermediate vector;
inputting the third intermediate vector into a second coding layer of the second preset model, and outputting a fourth intermediate vector;
inputting the fourth intermediate vector into a second pooling layer of the second preset model, and outputting the first waveform feature vector;
the determining a second waveform feature vector of the audio information comprises:
and inputting the audio information as an input parameter into a third preset model, and outputting the second waveform characteristic vector.
Optionally, the determining a type of a drawing element according to the text information includes:
identifying keywords in the text information;
determining semantic information corresponding to the keywords;
and inquiring from a preset drawing element database to obtain the drawing element type corresponding to the semantic information.
Optionally, before the displaying, at the target position of the screen, of the drawing element that conforms to the drawing element type and has the element attribute information, the method further includes:
when a determination instruction for a screen target position is received, determining a gaze point of the eyes on the screen, and determining the gaze point as the screen target position; or,
and determining the screen target position corresponding to the drawing element type according to the preset corresponding relation between the drawing element type and the screen position.
Optionally, after the drawing element that conforms to the drawing element type and has the element attribute information is displayed at the target position of the screen, the method further includes:
generating a movement instruction for the drawing element in response to a gaze time of an eye at the drawing element being greater than a first time threshold;
within a preset time threshold range, in response to the gaze time of the eyes' gaze point after moving on the screen being greater than a second time threshold, determining that gaze point as the moved screen target position of the drawing element;
and displaying the drawing element at the moved screen target position.
In order to solve the above problem, the present invention also discloses a painting display device, comprising:
the acquisition module is used for acquiring audio information;
the conversion module is used for converting the audio information into text information;
the first determining module is used for determining the type of the drawing element according to the text information;
the second determining module is used for determining the emotion grade parameter according to the audio information and the text information;
the third determining module is used for determining element attribute information according to the emotion level parameters;
and the first display module is used for displaying the drawing elements which accord with the drawing element types and have the element attribute information at the target position of the screen.
Optionally, the second determining module includes:
the first determining submodule is used for determining a morpheme characteristic vector of the text information, determining a first waveform characteristic vector of the audio information and determining a second waveform characteristic vector of the audio information;
the splicing submodule is used for splicing the first waveform characteristic vector, the second waveform characteristic vector and the morpheme characteristic vector to obtain a first vector;
the processing submodule is used for carrying out self-attention mechanism processing on the first vector to obtain a second vector;
and the classification submodule is used for classifying the second vector to obtain the emotion grade parameter corresponding to the second vector.
Optionally, the first determining sub-module includes:
the first output unit is used for inputting the text information as an input parameter into a first embedded layer of a first preset model and outputting a first intermediate vector;
a second output unit, configured to input the first intermediate vector into a first coding layer of the first preset model, and output a second intermediate vector;
and the third output unit is used for inputting the second intermediate vector into the first pooling layer of the first preset model and outputting the morpheme characteristic vector.
Optionally, the first determining sub-module includes:
the fourth output unit is used for inputting the audio information as an input parameter into a second embedded layer of a second preset model and outputting a third intermediate vector;
a fifth output unit, configured to input the third intermediate vector into the second coding layer of the second preset model, and output a fourth intermediate vector;
a sixth output unit, configured to input the fourth intermediate vector into a second pooling layer of the second preset model, and output the first waveform feature vector;
the first determination submodule includes:
and the seventh output unit is used for inputting the audio information as an input parameter into a third preset model and outputting the second waveform characteristic vector.
Optionally, the first determining module includes:
the recognition sub-module is used for recognizing the keywords in the text information;
the second determining submodule is used for determining semantic information corresponding to the keyword;
and the query submodule is used for querying a preset drawing element database to obtain the drawing element type corresponding to the semantic information.
Optionally, the apparatus further comprises:
the fourth determining module is used for determining a gaze point of the eyes on the screen when a determination instruction for a screen target position is received, and for determining the gaze point as the screen target position; or,
and the fifth determining module is used for determining the screen target position corresponding to the drawing element type according to the preset corresponding relation between the drawing element type and the screen position.
Optionally, the apparatus further comprises:
a generating module, configured to generate a movement instruction of the drawing element in response to a gaze time of an eye at the drawing element being greater than a first time threshold;
a sixth determining module, configured to determine, within a preset time threshold range and in response to the gaze time of the eyes' gaze point after moving on the screen being greater than a second time threshold, that gaze point as the moved screen target position of the drawing element;
and the second display module is used for displaying the drawing element at the moved screen target position.
In order to solve the above problem, the present invention also discloses an electronic device, which includes a processor, a memory, and a computer program stored on the memory and operable on the processor, wherein the computer program, when executed by the processor, implements the steps of the drawing display method as described above.
In order to solve the above problem, the present invention also discloses a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the drawing display method as described above.
Compared with the prior art, the invention has the following advantages:
In the embodiment of the invention, the electronic equipment can acquire audio information, convert it into text information, determine a drawing element type according to the text information, determine an emotion level parameter according to the audio information and the text information, determine element attribute information according to the emotion level parameter, and then display, at the target position of the screen, a drawing element that conforms to the drawing element type and has the element attribute information. The electronic equipment thus combines the audio information, which conveys tone and intonation, with the text information, which directly expresses emotion, to determine the user's emotion level, and from that level determines drawing element attribute information matching the current emotion. Because the displayed drawing element embodies the user's true feelings, the resulting drawing matches the user's current mood and expresses those feelings.
Drawings
FIG. 1 is a flow chart of a pictorial display method according to a first embodiment of the present invention;
FIG. 2 is a block diagram of an intelligent drawing system according to a first embodiment of the invention;
FIG. 3 is a schematic diagram illustrating a model architecture for determining an emotion level parameter according to a first embodiment of the present invention;
FIG. 4 is a flowchart illustrating an eye tracking positioning method according to a first embodiment of the present invention;
FIG. 5 is a flow chart of a first embodiment of the present invention;
fig. 6 is a block diagram showing a configuration of a painting display apparatus according to a second embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example one
Referring to fig. 1, a flowchart illustrating steps of a painting display method according to a first embodiment of the present invention is shown, where the method includes the following steps:
step 101: audio information is obtained.
In an embodiment of the present invention, optionally, referring to fig. 2, an intelligent painting system may be provided to implement the painting display method. The system takes a painting screen as its mobile-end supporting device and mainly includes the following functional modules: a speech emotion recognition module 10, a human eye tracking module 20, a semantic understanding module 30, a cloud data processing module 40 and an interactive display module 50.
The speech emotion recognition module can recognize the user's speech, convert it into text, and recognize the user's current emotion from the speech, for example whether the user is happy, sad or angry at that moment. The semantic understanding module understands the user's language: according to the user's speech text, it derives the user's meaning through keyword recognition. The eye tracking module tracks, in real time, the gaze point of the line of sight on the picture screen. The cloud data processing module stores the users' speech data, trains the semantic model according to how well semantics are understood, performs targeted processing for frequent users and periodically synchronizes to the application server, and stores users' personalized painting elements, making it convenient to build personal painting libraries in the future; it also stores users' preferred painting element categories so that the backend can enrich the system's painting element library from a large amount of user data. The interactive display module realizes human-machine interaction during drawing: it displays the drawing elements input by the user and finally integrates the whole finished drawing according to reasonable logic.
In the embodiment of the invention, when a user wants to draw on the picture screen of the electronic equipment through voice, the user can express the drawing elements and the related description which the user wants to draw through voice, wherein the drawing elements can be sky, sea, trees, buildings and the like. For example, the user may express "sky blue". Correspondingly, the electronic equipment can acquire the voice of the user through a voice acquisition device such as a microphone and the like and generate an audio file, namely, the audio information of the user is obtained.
Step 102: and converting the audio information into text information.
In the embodiment of the present invention, the electronic device may convert the audio information into the text information by using an audio-to-text conversion algorithm, so as to convert the content that the user wants to express into the data in the text format.
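By way of a hedged illustration only, this step could be implemented with an off-the-shelf recognizer. The sketch below uses the open-source SpeechRecognition package with its Google Web Speech backend; the backend choice, the Mandarin language code and the file name are assumptions of the sketch, not requirements of the method.

```python
# Illustrative sketch of step 102, assuming the SpeechRecognition
# package; any speech-to-text engine could take its place.
import speech_recognition as sr

def audio_to_text(wav_path):
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)   # read the whole audio file
    # Recognize Mandarin speech; this backend needs network access.
    return recognizer.recognize_google(audio, language="zh-CN")

text_information = audio_to_text("input.wav")  # "input.wav" is a placeholder
```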
Step 103: and determining the type of the drawing element according to the text information.
In this embodiment of the present invention, optionally, the electronic device may implement this step by the following method, specifically including: identifying keywords in the text information; determining semantic information corresponding to the keywords; and inquiring from a preset drawing element database to obtain the drawing element type corresponding to the semantic information.
The keywords may represent the core content in the text message, for example, if the user expresses "blue sky", the keywords may be "blue" and "sky". In practical application, in order to enable the result of the keyword recognition to be more beneficial to drawing, the extraction of relevant drawing information such as color, size, scenery and the like can be emphasized when the keyword recognition model is trained, so that more information irrelevant to drawing can be filtered when the keyword is recognized. Furthermore, the electronic device may determine semantic information corresponding to the keyword or the keyword combination through a semantic model, for example, "blue" and "sky" may represent a sky scene, and may be used as a painting element. The electronic device may then query a preset drawing element database to find the drawing element type of the sky scene.
In a specific application, a painting element type may be a combination of painting elements, where the combination includes several painting sub-elements with different attributes. For example, the sky painting element type may include painting elements such as a blue sky, a sunset-red sky and a dark sky, and each painting element may be further subdivided; the blue-sky element, for instance, may be divided into sub-elements such as a light-blue clear sky, a sky-blue sky with clouds and a deep-blue sky with clouds. This is not particularly limited in the embodiment of the present invention.
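As a hedged sketch of this step, keyword extraction plus a database lookup could look like the following; jieba stands in for the keyword recognition model, and the tiny in-memory dictionary stands in for the preset drawing element database (all keys and type names are invented):

```python
# Sketch of step 103 under stated assumptions: jieba plays the role of
# the keyword recognition model, ELEMENT_DB the role of the preset
# drawing element database. Keys and element type names are invented.
import jieba.analyse

ELEMENT_DB = {
    frozenset({"天空", "蓝"}): "sky/blue",          # "sky", "blue" -> blue-sky group
    frozenset({"天空", "晚霞"}): "sky/sunset_red",  # "sky", "sunset glow"
    frozenset({"大海"}): "sea/default",             # "sea"
}

def drawing_element_type(text):
    # Top keywords act as a crude proxy for semantic understanding;
    # a trained semantic model would normally sit between these steps.
    keywords = set(jieba.analyse.extract_tags(text, topK=5))
    for keys, element_type in ELEMENT_DB.items():
        if keys <= keywords:
            return element_type
    return None  # not found: prompt the user to re-input or customize
```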
Step 104: and determining an emotion grade parameter according to the audio information and the text information.
In the embodiment of the present invention, the electronic device may determine, through the description of the user language (which may be embodied by text information) and the tone and intonation of the user description (which may be embodied by a waveform of audio information), an emotion that is given to the drawing element by the user at this time, or an emotion that the user wants to express through the drawing element.
For example, if the user uses a word like "beautiful" to describe a painting element, it indicates that the user is describing something that makes him happy, so the user's mood should be happy; if the user instead says a word like "ugly", the mood may be sad. Sometimes, however, the user's emotion is complex, possibly a mixture of several emotions, and cannot be judged accurately from the language alone; different degrees of emphasis in tone and intonation reveal its intensity, that is, the user's emotion level. The emotion level parameter can therefore be determined by combining the user's audio information and text information, so that element attribute information matching the user's emotion can be found according to this parameter and a drawing element with that attribute information can be drawn, accurately expressing the user's mood.
Optionally, the step may be specifically implemented by the following steps: determining a morpheme feature vector of the text information, determining a first waveform feature vector of the audio information, and determining a second waveform feature vector of the audio information; splicing the first waveform characteristic vector, the second waveform characteristic vector and the morpheme characteristic vector to obtain a first vector; performing self-attention mechanism processing on the first vector to obtain a second vector; and classifying the second vector to obtain an emotion grade parameter corresponding to the second vector.
The first vector is obtained by splicing the first waveform feature vector, the second waveform feature vector and the morpheme feature vector, and the second vector is obtained by processing the first vector with a self-attention mechanism. The self-attention mechanism, common in natural language processing, attends to the correlations among the parts of a vector, which makes the classification result more accurate. The morpheme feature vector is extracted from the text information corresponding to the user's speech and represents the user's most direct emotional description of a drawing element; the first and second waveform feature vectors are extracted from the user's audio information and represent the emotion conveyed by tone, intonation and the like. The user's emotion level can thus be analyzed from these three dimensions.
Fig. 3 shows a schematic diagram of a model architecture for determining an emotion level parameter, and referring to fig. 3, the model architecture may include a first preset model 01, a second preset model 02, a third preset model 03, a splice layer 04, a self-attention layer 05, and a full link layer 06. The first preset model can be used for extracting morpheme feature vectors from text information, the second preset model can be used for extracting first waveform feature vectors from audio information, and the third preset model can be used for extracting second waveform feature vectors from audio information.
The first preset model 01 further specifically includes a first embedding layer 011, a first coding layer 012, and a first pooling layer 013, and accordingly, the step of determining the morpheme feature vector of the text information may be specifically implemented in the following manner, including: inputting the text information as an input parameter into a first embedding layer of a first preset model, and outputting a first intermediate vector; inputting the first intermediate vector into a first coding layer of the first preset model, and outputting a second intermediate vector; and inputting the second intermediate vector into a first pooling layer of the first preset model, and outputting the morpheme feature vector.
The text information can be used as the input parameter of the first preset model and fed into its first embedding layer, which converts the text into a vector representation for subsequent computation. The first embedding layer outputs a first intermediate vector to the first coding layer of the model; the first coding layer encodes it and outputs a second intermediate vector. The first pooling layer then reduces the dimensionality of the second intermediate vector, cutting the amount of computation, and outputs the pooled morpheme feature vector.
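A minimal PyTorch sketch of this text branch is given below. The layer order (embedding layer, coding layer, pooling layer) follows the description above, with a GRU for the coding layer as noted further below; every size is an assumed hyperparameter rather than a value fixed by the embodiment.

```python
# Sketch of the first preset model: embedding -> GRU coding -> pooling.
# vocab_size, emb_dim and hidden are illustrative assumptions.
import torch
import torch.nn as nn

class TextBranch(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)        # first embedding layer
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)  # first coding layer
        self.pool = nn.AdaptiveAvgPool1d(1)                       # first pooling layer

    def forward(self, token_ids):                     # (batch, seq_len) int64 ids
        x = self.embedding(token_ids)                 # first intermediate vector
        x, _ = self.encoder(x)                        # second intermediate vector
        x = self.pool(x.transpose(1, 2)).squeeze(-1)  # dimensionality reduction
        return x                                      # morpheme feature vector (batch, hidden)
```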
The second preset model 02 further specifically includes a second embedding layer 021, a second coding layer 022, and a second pooling layer 023, and accordingly, the step of determining the first waveform feature vector of the audio information may be specifically implemented in the following manner, including: inputting the audio information as an input parameter into a second embedded layer of a second preset model, and outputting a third intermediate vector; inputting the third intermediate vector into a second coding layer of the second preset model, and outputting a fourth intermediate vector; and inputting the fourth intermediate vector into a second pooling layer of the second preset model, and outputting the first waveform feature vector.
Similarly, the audio information can be used as the input parameter of the second preset model and fed into its second embedding layer, which converts the audio into a vector representation for subsequent computation. Before being fed into the second embedding layer, the audio information may be preprocessed, that is, segmented into several audio slices of a preset duration, and then input to the second embedding layer in the form of these slices. The second embedding layer outputs a third intermediate vector to the second coding layer of the model; the second coding layer encodes it and outputs a fourth intermediate vector. The second pooling layer then reduces the dimensionality of the fourth intermediate vector, cutting the amount of computation, and outputs the pooled first waveform feature vector.
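The audio branch can be sketched in the same style, with a helper that models the slicing preprocessing; the slice length and feature sizes are again assumptions:

```python
# Sketch of the second preset model, in the same style as the text
# branch above. slice_len / slice_dim are assumed values.
import torch
import torch.nn as nn

class AudioBranch(nn.Module):
    def __init__(self, slice_dim=1024, emb_dim=128, hidden=128):
        super().__init__()
        self.embedding = nn.Linear(slice_dim, emb_dim)            # second embedding layer
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)  # second coding layer
        self.pool = nn.AdaptiveAvgPool1d(1)                       # second pooling layer

    def forward(self, slices):                           # (batch, n_slices, slice_dim)
        x = self.embedding(slices)                       # third intermediate vector
        x, _ = self.encoder(x)                           # fourth intermediate vector
        return self.pool(x.transpose(1, 2)).squeeze(-1)  # first waveform feature vector

def slice_audio(waveform, slice_len=1024):
    # Cut a 1-D waveform into equal slices (preset duration), dropping the tail.
    n = waveform.numel() // slice_len
    return waveform[: n * slice_len].reshape(1, n, slice_len)
```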
Optionally, the first coding layer and the second coding layer may both be GRU (Gated Recurrent Unit) coding layers.
The step of determining the second waveform feature vector of the audio information may be specifically implemented by: and inputting the audio information as an input parameter into a third preset model, and outputting the second waveform characteristic vector.
Wherein, optionally, the third preset model comprises a VGG-100 model. The audio information can be used as an input parameter of the VGG-100 model and input into the VGG-100 model, so that the VGG-100 model can output the second waveform feature vector, and the VGG-100 model can refer to the related art, which is not described herein again in the embodiments of the present invention.
After the morpheme feature vector, the first waveform feature vector and the second waveform feature vector are obtained, they can be spliced by the splice layer in a preset order to obtain the first vector. Self-Attention processing is then performed on the first vector by the self-attention layer to obtain the second vector; the self-attention mechanism attends to the pairwise correlations among the three feature vectors, making the classification result more accurate. Finally, the second vector is classified by the full link layer, which outputs the corresponding emotion level parameter. Optionally, the full link layer may be a fully connected layer based on the softmax function.
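Putting the pieces together, one possible reading of the splice layer, self-attention layer and full link layer is sketched below. Treating the three feature vectors as a three-token sequence, and assuming all three share a common dimension (the VGG-100 output would need a projection to match), are interpretation choices of this sketch, not statements about the patented model:

```python
# Sketch of splice + self-attention + softmax full link layer, matching
# the branch sketches above. n_levels and num_heads are assumptions.
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    def __init__(self, hidden=128, n_levels=5):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.fc = nn.Linear(3 * hidden, n_levels)                # full link layer

    def forward(self, morpheme_vec, wave_vec1, wave_vec2):       # each (batch, hidden)
        seq = torch.stack([morpheme_vec, wave_vec1, wave_vec2], dim=1)  # first vector
        attended, _ = self.attn(seq, seq, seq)                   # self-attention -> second vector
        logits = self.fc(attended.flatten(1))                    # classification
        return torch.softmax(logits, dim=-1)                     # emotion level distribution
```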
Step 105: and determining element attribute information according to the emotion level parameters.
In the embodiment of the invention, the electronic equipment can determine the element attribute information of the drawing element according to the output emotion level parameter.
Optionally, the element attribute information includes at least one of a color, a thickness, a length, a width, an area, and an outline of the drawing element.
For example, suppose the user expresses "blue sky" and emotion level recognition finds that the user's emotion level parameter corresponds to a particularly happy emotion. The depth of blue and the size of the sky can reflect this: the sky may be shown in a bright sky blue and occupy roughly 1/3 of the screen. Semantic understanding of the user's words yields the meaning "a very blue sky", a sky-blue element is found in the preset drawing element database, and the element is displayed at a suitable position in the interactive display system, completing the placement of one drawing element.
For another example, the user expresses "thick snow", and through emotion level recognition, if the user expresses "thick snow" in different tones, the user may be presented with snow of different thicknesses.
As another example, if the user expresses "extensive prairies", a "prairie" drawing element may be displayed at the corresponding location in the interactive display system, and if the user particularly emphasizes "extensive", a prairie element with a larger area may be presented.
For another example, the user may express "the careworn old man" while emphasizing "careworn"; old men showing different degrees of vicissitude can then be presented, such as an old man with a fairly long beard and somewhat ragged clothes, one with a very long beard and very ragged clothes, or one with a very long beard who has to hold his rice bowl in both hands.
In summary, through spoken descriptions carrying emotion and tone, the user can adjust the color, thickness, length and area of drawing elements, as well as the appearance or shape characteristics of people and objects. Emotion level recognition enriches the drawing elements a user can input, improves the experience, and encourages the user to express emotion through interactive drawing.
It should be noted that, in a specific application, the drawing element database, the emotion level parameters, and the like may all be displayed so that the user can confirm or correct the drawing, and the final drawing can accurately express the true feeling and reality of the user.
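As an illustration of step 105 as described above, the mapping from emotion level parameter to element attribute information can be as simple as a lookup table; the number of levels and the concrete colors and area fractions below are invented for the sketch:

```python
# Sketch of step 105: a plain lookup from emotion level parameter to
# element attribute information. Levels, colors and area fractions are
# invented; the patent lists color, thickness, length, width, area and
# outline as possible attributes.
ATTRS_BY_LEVEL = {
    5: {"color": "#1E90FF", "area_fraction": 1 / 3},  # very happy: vivid blue, big sky
    3: {"color": "#87CEEB", "area_fraction": 1 / 4},  # neutral
    1: {"color": "#4F5B66", "area_fraction": 1 / 5},  # sad: grey-blue, smaller sky
}

def element_attributes(emotion_level):
    # Fall back to the nearest defined level when there is no exact entry.
    nearest = min(ATTRS_BY_LEVEL, key=lambda lvl: abs(lvl - emotion_level))
    return ATTRS_BY_LEVEL[nearest]
```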
Step 106: and displaying the drawing element which is in line with the drawing element type and has the element attribute information at a target position of a screen.
In this embodiment of the present invention, before this step, the electronic device may further perform the following steps, including: when a determination instruction of a screen target position is received, determining a sight gaze point of eyes on a screen; determining the sight gaze point as a screen target position; or determining the screen target position corresponding to the drawing element type according to the preset corresponding relation between the drawing element type and the screen position.
The electronic equipment can determine the display position of the painting element which is in line with the type of the painting element and has the element attribute information, namely the placement position of the painting element in the picture.
In an optional implementation, the user can place drawing elements at the gaze point of the eyes on the screen, enabling user-defined composition and making drawing more engaging.
Specifically, the electronic equipment can assist the user in positioning drawing elements through eye tracking. When the user speaks a phrase such as "blue-blue sky", the sky drawing element appears and the user is prompted to place it, which the user does with the line of sight. Referring to fig. 4, a camera first detects the user's face; if a face is detected, facial key points are extracted to locate the positions of the eyes, and eye feature points are extracted. The user's head pose is detected at the same time, because different head poses require different reference frames for locating the gaze point, that is, gaze tracking models corresponding to different reference frames. Using the detected eye feature points, the user's current gaze point is obtained from the gaze tracking model corresponding to the current head pose. It should be noted that, to distinguish the gaze point used for positioning from a point the user looks at subconsciously, the user may be prompted to keep gazing for a certain period of time when positioning the display position. Once the gaze point is obtained, the drawing element to be placed can be put at the gazed position while the prompt is shown, that is, the center point of the drawing element is placed on the user's gaze point.
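The dwell-based confirmation just described can be sketched as follows; the gaze sample format, dwell time and jitter radius are assumptions of the sketch rather than values fixed by the embodiment:

```python
# Dwell-based placement sketch for step 106. gaze_stream is assumed to
# yield (timestamp_seconds, x, y) samples from the eye tracking module.
import math

DWELL_SECONDS = 1.0     # assumed: gaze must rest this long to count as intentional
DWELL_RADIUS_PX = 40    # assumed: tolerance for natural gaze jitter

def wait_for_fixation(gaze_stream, dwell=DWELL_SECONDS):
    anchor = None                          # (t, x, y) where the current dwell began
    for t, x, y in gaze_stream:
        if anchor is None or math.hypot(x - anchor[1], y - anchor[2]) > DWELL_RADIUS_PX:
            anchor = (t, x, y)             # gaze jumped: restart the dwell timer
        elif t - anchor[0] >= dwell:
            return (anchor[1], anchor[2])  # confirmed fixation point
    return None

def place_element(element, gaze_stream):
    point = wait_for_fixation(gaze_stream)
    if point is not None:
        element.center = point             # element centre lands on the gaze point
```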
In another alternative implementation, the display position of the drawing element may be determined according to the characteristics of the drawing element itself, for example, the "sky" drawing element may be preset to be displayed at the top of the drawing, and the "sea" drawing element may be preset to be displayed at the bottom of the drawing, that is, the electronic device may default to a display position according to the characteristics of the drawing element itself. Of course, when the default display position does not meet the user requirement, the user can also adjust the display position according to the self requirement.
Essentially, the drawing process provided by the embodiment of the invention follows the idea of "you speak, I draw".
Optionally, after this step, the electronic device may further perform the following steps, including: generating a movement instruction for the drawing element in response to the gaze time of the eyes at the drawing element being greater than a first time threshold; within a preset time threshold range, in response to the gaze time of the eyes' new gaze point on the screen being greater than a second time threshold, determining that gaze point as the moved screen target position of the drawing element; and displaying the drawing element at the moved screen target position.
Specifically, if the user wants to change the display position of a drawing element, the user can gaze at it for a long time (a gaze time greater than the first time threshold) while the element is in the editing state, and the element then enters a movable state. In addition, only one drawing element on the whole picture can be moved at a time, so the other drawing elements are not affected.
Alternatively, the user may enter the position-editing state of a given drawing element by voice, that is, by saying "change the position of the XX drawing element".
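Continuing the placement sketch above, the move gesture with its two time thresholds might look like the following; every timing constant and the contains() helper are assumptions:

```python
# Two-threshold move gesture, reusing wait_for_fixation from the
# placement sketch above. All constants are assumed values.
import time

FIRST_TIME_THRESHOLD = 1.5    # dwell on the element -> it enters the movable state
SECOND_TIME_THRESHOLD = 1.0   # dwell elsewhere -> that point becomes the new target
MOVE_WINDOW_SECONDS = 8.0     # preset time range for choosing the new position

def try_move_element(element, gaze_stream, contains):
    # contains(element, point) -> whether a gaze point falls on the element;
    # assumed to be provided by the interactive display module.
    point = wait_for_fixation(gaze_stream, dwell=FIRST_TIME_THRESHOLD)
    if point is None or not contains(element, point):
        return                             # the long gaze was not on this element
    element.movable = True                 # movement instruction generated
    deadline = time.monotonic() + MOVE_WINDOW_SECONDS
    while time.monotonic() < deadline:
        target = wait_for_fixation(gaze_stream, dwell=SECOND_TIME_THRESHOLD)
        if target is not None and not contains(element, target):
            element.center = target        # moved screen target position
            break
    element.movable = False                # leave the movable state
```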
After the user has placed all the drawing elements, the electronic equipment can also perform a fused layout of them, so that all the elements in the drawing are arranged reasonably and conform to natural logic, such as the sky above and the earth below, the shape of a river, or the posture of a person. The elements can be integrated sensibly through their attributes, and the boundaries between elements are blended in accordance with natural logic as well, giving the drawing a richer natural effect.
In addition, if the desired drawing element cannot be found, the electronic equipment can prompt the user, by voice or the like, that no such element exists and ask the user to re-input or customize one. After the user customizes a drawing element, the electronic equipment can store it under the corresponding drawing element type in the preset drawing element database, thereby updating the database. Optionally, in practical applications, the preset drawing element database may be stored in the cloud data processing module.
For the main flow of the painting display, reference can be made to fig. 5; it is not described again here.
The embodiment of the invention enables drawing through voice control and positioning of drawing elements with the eyes, making drawing more engaging for users. In addition, by analyzing the waveform and morpheme feature vectors of the user's speech across three dimensions, the user's emotion level can be obtained, so that the user can express true feelings through the drawing, obtain a drawing that matches the current mood, and satisfy the need to express emotion through drawing.
In the embodiment of the invention, the electronic equipment can acquire audio information, convert it into text information, determine a drawing element type according to the text information, determine an emotion level parameter according to the audio information and the text information, determine element attribute information according to the emotion level parameter, and then display, at the target position of the screen, a drawing element that conforms to the drawing element type and has the element attribute information. The electronic equipment combines the audio information, which conveys tone and intonation, with the text information, which directly expresses emotion, to determine the user's emotion level, and from that level determines drawing element attribute information matching the current emotion. Because the displayed drawing element embodies the user's true feelings, the resulting drawing matches the user's current mood and expresses those feelings.
Example two
Referring to fig. 6, a block diagram of a painting display apparatus according to a second embodiment of the present invention is shown, wherein the apparatus 600 includes:
an obtaining module 601, configured to obtain audio information;
a conversion module 602, configured to convert the audio information into text information;
a first determining module 603, configured to determine a type of the drawing element according to the text information;
a second determining module 604, configured to determine an emotion level parameter according to the audio information and the text information;
a third determining module 605, configured to determine element attribute information according to the emotion level parameter;
and a first display module 606, configured to display the drawing element that meets the drawing element type and has the element attribute information at a screen target position.
Optionally, the second determining module includes:
the first determining submodule is used for determining a morpheme characteristic vector of the text information, determining a first waveform characteristic vector of the audio information and determining a second waveform characteristic vector of the audio information;
the splicing submodule is used for splicing the first waveform characteristic vector, the second waveform characteristic vector and the morpheme characteristic vector to obtain a first vector;
the processing submodule is used for carrying out self-attention mechanism processing on the first vector to obtain a second vector;
and the classification submodule is used for classifying the second vector to obtain the emotion grade parameter corresponding to the second vector.
Optionally, the first determining sub-module includes:
the first output unit is used for inputting the text information as an input parameter into a first embedded layer of a first preset model and outputting a first intermediate vector;
a second output unit, configured to input the first intermediate vector into a first coding layer of the first preset model, and output a second intermediate vector;
and the third output unit is used for inputting the second intermediate vector into the first pooling layer of the first preset model and outputting the morpheme characteristic vector.
Optionally, the first determining sub-module includes:
the fourth output unit is used for inputting the audio information as an input parameter into a second embedded layer of a second preset model and outputting a third intermediate vector;
a fifth output unit, configured to input the third intermediate vector into the second coding layer of the second preset model, and output a fourth intermediate vector;
a sixth output unit, configured to input the fourth intermediate vector into a second pooling layer of the second preset model, and output the first waveform feature vector;
the first determination submodule includes:
and the seventh output unit is used for inputting the audio information as an input parameter into a third preset model and outputting the second waveform characteristic vector.
Optionally, the first determining module includes:
the recognition sub-module is used for recognizing the keywords in the text information;
the second determining submodule is used for determining semantic information corresponding to the keyword;
and the query submodule is used for querying a preset drawing element database to obtain the drawing element type corresponding to the semantic information.
Optionally, the apparatus further comprises:
the fourth determining module is used for determining a gaze point of the eyes on the screen when a determination instruction for a screen target position is received, and for determining the gaze point as the screen target position; or,
and the fifth determining module is used for determining the screen target position corresponding to the drawing element type according to the preset corresponding relation between the drawing element type and the screen position.
Optionally, the apparatus further comprises:
a generating module, configured to generate a movement instruction of the drawing element in response to a gaze time of an eye at the drawing element being greater than a first time threshold;
a sixth determining module, configured to determine, within a preset time threshold range and in response to the gaze time of the eyes' gaze point after moving on the screen being greater than a second time threshold, that gaze point as the moved screen target position of the drawing element;
and the second display module is used for displaying the drawing element at the moved screen target position.
In the embodiment of the invention, the electronic device can acquire audio information through the acquisition module, convert it into text information through the conversion module, determine the drawing element type according to the text information through the first determining module, determine the emotion level parameter according to the audio information and the text information through the second determining module, determine the element attribute information according to the emotion level parameter through the third determining module, and then display, at the target position of the screen through the first display module, the drawing element that conforms to the drawing element type and has the element attribute information. The audio information conveying tone and intonation is combined with the text information directly expressing emotion to determine the user's emotion level, from which attribute information matching the current emotion is determined, so that the displayed drawing element embodies the user's true feelings and the drawing matches the user's current mood and expresses those feelings.
The embodiment of the invention also discloses an electronic device, which comprises a processor, a memory and a computer program stored on the memory and capable of running on the processor, wherein when the computer program is executed by the processor, the steps of the drawing display method are realized.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when being executed by a processor, the computer program realizes the steps of the painting display method.
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The drawing display method, the drawing display device and the electronic equipment provided by the invention are described in detail, specific examples are applied in the description to explain the principle and the implementation mode of the invention, and the description of the examples is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A method of pictorial display, comprising:
acquiring audio information;
converting the audio information into text information;
determining the type of the drawing element according to the text information;
determining an emotion level parameter according to the audio information and the text information;
determining element attribute information according to the emotion level parameters;
and displaying the drawing element which is in line with the drawing element type and has the element attribute information at a target position of a screen.
2. The method of claim 1, wherein determining an emotion level parameter from the audio information and the text information comprises:
determining a morpheme feature vector of the text information, determining a first waveform feature vector of the audio information, and determining a second waveform feature vector of the audio information;
splicing the first waveform characteristic vector, the second waveform characteristic vector and the morpheme characteristic vector to obtain a first vector;
performing self-attention mechanism processing on the first vector to obtain a second vector;
and classifying the second vector to obtain an emotion grade parameter corresponding to the second vector.
3. The method of claim 2, wherein the determining the morpheme feature vector of the text message comprises:
inputting the text information as an input parameter into a first embedding layer of a first preset model, and outputting a first intermediate vector;
inputting the first intermediate vector into a first coding layer of the first preset model, and outputting a second intermediate vector;
and inputting the second intermediate vector into a first pooling layer of the first preset model, and outputting the morpheme feature vector.
4. The method of claim 2, wherein determining the first waveform feature vector of the audio information comprises:
inputting the audio information as an input parameter into a second embedded layer of a second preset model, and outputting a third intermediate vector;
inputting the third intermediate vector into a second coding layer of the second preset model, and outputting a fourth intermediate vector;
inputting the fourth intermediate vector into a second pooling layer of the second preset model, and outputting the first waveform feature vector;
the determining a second waveform feature vector of the audio information comprises:
and inputting the audio information as an input parameter into a third preset model, and outputting the second waveform characteristic vector.
5. The method according to any one of claims 1-4, wherein said determining a drawing element type based on said textual information comprises:
identifying keywords in the text information;
determining semantic information corresponding to the keywords;
and inquiring from a preset drawing element database to obtain the drawing element type corresponding to the semantic information.
6. The method according to any one of claims 1-4, wherein before the displaying, at the screen target position, of the drawing element conforming to the drawing element type and having the element attribute information, the method further comprises:
when a determination instruction for a screen target position is received, determining a gaze point of the eyes on the screen, and determining the gaze point as the screen target position; or,
and determining the screen target position corresponding to the drawing element type according to the preset corresponding relation between the drawing element type and the screen position.
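Claim 6 gives two interchangeable ways of fixing the target position, sketched below; gaze_point() is a hypothetical stand-in for whatever eye-tracking API the device exposes, and the preset table is an example.

    PRESET_POSITIONS = {"sun": (100, 80), "tree": (400, 500)}   # type -> position

    def gaze_point() -> tuple:
        return (320, 240)                  # placeholder eye-tracker reading

    def target_position(element_type: str, instruction_received: bool) -> tuple:
        if instruction_received:           # a target-position instruction arrived
            return gaze_point()            # the gaze point becomes the target
        return PRESET_POSITIONS.get(element_type, (0, 0))   # preset correspondence

    print(target_position("sun", instruction_received=False))   # -> (100, 80)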
7. The method of claim 6, wherein after the displaying of the drawing element that conforms to the drawing element type and has the element attribute information at the screen target position, the method further comprises:
in response to a gaze time of the eyes on the drawing element being greater than a first time threshold, generating a movement instruction for the drawing element;
within a preset time range, in response to the gaze time at a new gaze point, reached after the eyes move across the screen, being greater than a second time threshold, determining the new gaze point as the moved screen target position of the drawing element;
and displaying the drawing element at the moved screen target position.
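Claim 7 is dwell-time logic: gazing at an element past a first threshold "picks it up", and gazing at a new point past a second threshold, within a preset window, drops it there. A sketch with thresholds of our own choosing:

    FIRST_THRESHOLD = 1.0    # s of gaze on the element -> movement instruction
    SECOND_THRESHOLD = 0.8   # s of gaze on the new point -> confirm the move
    WINDOW = 5.0             # preset time range for completing the move

    def try_move(element: dict, dwell_on_element: float, gaze_samples):
        # gaze_samples: (seconds_since_pickup, (x, y), dwell_at_point) readings.
        if dwell_on_element <= FIRST_THRESHOLD:
            return element                 # no movement instruction generated
        for t, point, dwell in gaze_samples:
            if t > WINDOW:
                break                      # window expired; the element stays put
            if dwell > SECOND_THRESHOLD:
                element["position"] = point    # moved screen target position
                break
        return element

    print(try_move({"position": (100, 80)}, 1.4, [(0.5, (300, 200), 0.9)]))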
8. A drawing display device, comprising:
the acquisition module is used for acquiring audio information;
the conversion module is used for converting the audio information into text information;
the first determining module is used for determining the type of the drawing element according to the text information;
the second determining module is used for determining the emotion grade parameter according to the audio information and the text information;
the third determining module is used for determining element attribute information according to the emotion level parameters;
and the first display module is used for displaying, at the screen target position, the drawing element that conforms to the drawing element type and has the element attribute information.
9. An electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the drawing display method according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the drawing display method according to any one of claims 1 to 7.
CN202010131473.0A 2020-02-28 2020-02-28 Drawing display method and device and electronic equipment Pending CN111324207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010131473.0A CN111324207A (en) 2020-02-28 2020-02-28 Drawing display method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111324207A true CN111324207A (en) 2020-06-23

Family

ID=71169017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010131473.0A Pending CN111324207A (en) 2020-02-28 2020-02-28 Drawing display method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111324207A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170052946A1 (en) * 2014-06-06 2017-02-23 Siyu Gu Semantic understanding based emoji input method and device
CN108305642A (en) * 2017-06-30 2018-07-20 腾讯科技(深圳)有限公司 The determination method and apparatus of emotion information
CN110019893A (en) * 2018-08-28 2019-07-16 京东方科技集团股份有限公司 A kind of paintings acquisition methods and device
CN110379430A (en) * 2019-07-26 2019-10-25 腾讯科技(深圳)有限公司 Voice-based cartoon display method, device, computer equipment and storage medium
CN110675859A (en) * 2019-09-05 2020-01-10 华南理工大学 Multi-emotion recognition method, system, medium, and apparatus combining speech and text

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113793398A (en) * 2020-07-24 2021-12-14 北京京东尚科信息技术有限公司 Drawing method and device based on voice interaction, storage medium and electronic equipment
CN114115528A (en) * 2021-11-02 2022-03-01 深圳市雷鸟网络传媒有限公司 Virtual object control method and device, computer equipment and storage medium
CN114115528B (en) * 2021-11-02 2024-01-19 深圳市雷鸟网络传媒有限公司 Virtual object control method, device, computer equipment and storage medium
CN114489897A (en) * 2022-01-21 2022-05-13 北京字跳网络技术有限公司 Object processing method, device, terminal equipment and medium
CN114489897B (en) * 2022-01-21 2023-08-08 北京字跳网络技术有限公司 Object processing method, device, terminal equipment and medium

Similar Documents

Publication Publication Date Title
US20210081056A1 (en) Vpa with integrated object recognition and facial expression recognition
CN111459290B (en) Interactive intention determining method and device, computer equipment and storage medium
US20230316643A1 (en) Virtual role-based multimodal interaction method, apparatus and system, storage medium, and terminal
CN110211563B (en) Chinese speech synthesis method, device and storage medium for scenes and emotion
CN111226224B (en) Method for translating voice signals and electronic equipment
CN110301117B (en) Method and apparatus for providing response in session
KR20210095208A (en) Video caption creation method, device and apparatus, and storage medium
US11347801B2 (en) Multi-modal interaction between users, automated assistants, and other computing services
CN111324207A (en) Drawing display method and device and electronic equipment
CN114401438B (en) Video generation method and device for virtual digital person, storage medium and terminal
US20120276504A1 (en) Talking Teacher Visualization for Language Learning
WO2017112813A1 (en) Multi-lingual virtual personal assistant
CN107403011B (en) Virtual reality environment language learning implementation method and automatic recording control method
CN115082602B (en) Method for generating digital person, training method, training device, training equipment and training medium for model
CN107480766B (en) Method and system for content generation for multi-modal virtual robots
CN113835522A (en) Sign language video generation, translation and customer service method, device and readable medium
US9525841B2 (en) Imaging device for associating image data with shooting condition information
CN115329779A (en) Multi-person conversation emotion recognition method
US11176943B2 (en) Voice recognition device, voice recognition method, and computer program product
US20210407504A1 (en) Generation and operation of artificial intelligence based conversation systems
CN116737883A (en) Man-machine interaction method, device, equipment and storage medium
KR20210037857A (en) Realistic AI-based voice assistant system using relationship setting
JP2020077272A (en) Conversation system and conversation program
CN115461709A (en) Hierarchical context-specific actions from ambient speech
Hempel Usability of speech dialog systems: listening to the target audience

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination