CN111445545A - Text-to-map method, device, storage medium and electronic equipment - Google Patents

Text-to-map method, device, storage medium and electronic equipment

Info

Publication number
CN111445545A
Authority
CN
China
Prior art keywords
text
scene
target object
canvas
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010124986.9A
Other languages
Chinese (zh)
Other versions
CN111445545B (en)
Inventor
谢文珍
黄恺
冯富森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dami Future Technology Co ltd
Original Assignee
Beijing Dami Future Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dami Future Technology Co ltd filed Critical Beijing Dami Future Technology Co ltd
Priority to CN202010124986.9A priority Critical patent/CN111445545B/en
Publication of CN111445545A publication Critical patent/CN111445545A/en
Application granted granted Critical
Publication of CN111445545B publication Critical patent/CN111445545B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 - 2D [Two Dimensional] image generation
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the application disclose a text-to-map method, apparatus, storage medium and electronic device, wherein the method comprises the following steps: obtaining text features of a target text, drawing a scene canvas corresponding to the text features, determining a target object to be drawn based on the text features and the scene canvas, determining attribute features of the target object according to the target object, the text features and the scene canvas, drawing the target object on the scene canvas, adjusting the target object based on the attribute features, and generating a map corresponding to the target text. By adopting the method and the apparatus, the degree to which the map matches the scene actually described by the text can be improved, and the accuracy of map generation can be improved.

Description

Text-to-map method, device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a text-to-map method and apparatus, a storage medium, and an electronic device.
Background
Text-to-map conversion is an application of text-to-image generation technology that converts text into vivid pictures. For example, in online education, teaching text (such as text written by students or text from language conversations) can be converted into pictures, which stimulates students' interest in language learning and improves the online teaching effect.
At present, in the text-to-map process, the text is usually encoded, map objects are retrieved from a map library according to the encoded keywords (such as nouns in the text), and the map objects are combined on a canvas to generate a map; a map generated from keywords alone is often difficult to match to the scene actually described by the text.
Disclosure of Invention
The embodiments of the application provide a text-to-map method and apparatus, a storage medium and an electronic device, which can improve the degree to which the map matches the scene actually described by the text and improve the accuracy of map generation. The technical solution is as follows:
In a first aspect, an embodiment of the present application provides a text-to-map method, where the method includes:
acquiring text characteristics of a target text, and drawing a scene canvas corresponding to the text characteristics;
determining a target object to be drawn based on the text features and the scene canvas;
determining attribute characteristics of the target object according to the target object, the text characteristics and the scene canvas;
and drawing the target object on the scene canvas, adjusting the target object based on the attribute characteristics, and generating a map corresponding to the target text.
In a second aspect, an embodiment of the present application provides a text-to-map apparatus, where the apparatus includes:
the scene canvas drawing module is used for acquiring text characteristics of a target text and drawing a scene canvas corresponding to the text characteristics;
the target object determination module is used for determining a target object to be drawn based on the text features and the scene canvas;
the attribute characteristic determining module is used for determining the attribute characteristic of the target object according to the target object, the text characteristic and the scene canvas;
and the map generating module is used for drawing the target object on the scene canvas, adjusting the target object based on the attribute characteristics and generating a map corresponding to the target text.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides an electronic device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The beneficial effects brought by the technical scheme provided by some embodiments of the application at least comprise:
in one or more embodiments of the application, a terminal obtains text features of a target text, draws a scene canvas corresponding to the text features, determines a target object to be drawn based on the text features and the scene canvas, determines attribute features of the target object according to the target object, the text features and the scene canvas, draws the target object on the scene canvas, adjusts the target object based on the attribute features, and generates a map corresponding to the target text. The scene canvas, the objects to be drawn and the attribute features of those objects are determined step by step from the text features of the target text, and the image is adjusted according to the attribute features of each object (position, action, posture and the like) as the object is drawn on the scene canvas. This avoids the problem that a map generated only from keywords is difficult to match to the scene actually described by the text, and a map with an accurate scene, clear objects and clear attributes can be generated that is closer to the semantics expressed by the actual text. The degree to which the map matches the scene actually described by the text can therefore be improved, and the accuracy of map generation is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a text-to-map method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an attribute determination model related to a text-to-map method provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of another attribute determination model related to a text-to-map method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram illustrating another text-to-map method according to an embodiment of the present application;
Figs. 5-10 are schematic diagrams illustrating map conversion examples related to a text-to-map method provided in an embodiment of the present application;
fig. 11 is a schematic structural diagram of a text-to-map apparatus according to an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of a scene canvas rendering module according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a target object determination module according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of an attribute feature determination module provided in an embodiment of the present application;
fig. 15 is a schematic structural diagram of a map generation module according to an embodiment of the present application;
FIG. 16 is a schematic structural diagram of another text-to-map apparatus provided in the embodiments of the present application;
fig. 17 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the present application, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In the description of the present application, it is noted that, unless explicitly stated or limited otherwise, "including" and "having" and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art. Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The present application will be described in detail with reference to specific examples.
In one embodiment, as shown in fig. 1, a text-to-map method is proposed, which may be implemented by means of a computer program and may run on a text-to-map apparatus based on the von Neumann architecture. The computer program may be integrated into an application or may run as a separate utility application.
Specifically, the text-to-map method comprises the following steps:
step 101: and acquiring text characteristics of the target text, and drawing a scene canvas corresponding to the text characteristics.
The target text refers to text containing text content; for example, the target text may be text collected from the user's writing in an online education scene. The text content contained in the target text can generally be understood as a written representation of a language, usually a sentence or a combination of sentences having a complete, systematic meaning. Taking English as an example, the target text may be at least one word, at least one sentence or at least one paragraph; the target text is an actual use of the language and generally refers to specific language text in a specific implementation.
The text features refer to text attributes specific to unstructured data expressed by characters, and take a written composition as an example, the text features comprise text elements such as author creation intentions, data, theme descriptions, underlying feature meanings and the like. The semantic meaning of the object to be described and various characteristics of the semantic meaning of the object to be described in the language environment can be expressed through the text characteristics. The text content contained in the target text is taken as an English short text example, and the text characteristics can be composed letters, word sequence, word emotion information, mutual information and the like.
The composing letters are the letters that a word is composed of, together with their sequential relation.
The word order is the sequential order of the words that make up a sentence (and express its meaning).
The emotional information of a word is the emotional meaning of the word expressed in the sentence, and the emotional meaning can be understood as whether the word is positive or negative, high or low, happy or sad, and the like.
Mutual information refers to a statistically independent relationship between a word or word and a category, and is often used to measure the mutual relationship between two objects.
Specifically, after a terminal collects a target text input by a user, a text feature obtaining model is used for obtaining text features of the target text.
Optionally, the text feature acquisition model may use a text feature information acquisition method based on a contextual framework, that is, the feature elements (sentences, words, symbols, and the like) of the text content are determined first, and then linguistic and semantic analysis is merged into a statistical algorithm to extract and process the text content contained in the target text, so as to obtain the text features of the target text. It may also be a text feature acquisition method based on an ontology, that is, an Ontology model takes the text content as input and outputs the text features of the target text. It may also be a concept feature extraction method based on HowNet, that is, on the basis of the Vector Space Model (VSM), linguistic-semantic analysis is performed on the text content, the semantic information of the vocabulary is acquired using the HowNet database, words with the same semantics are mapped to the same subject concept, the clustered words are then obtained through clustering and serve as the feature items of the VSM text vector, and the model operation is then performed, and so on. It should be noted that there are many ways to obtain the text features of the target text, which may be a combination of one or more of the above, and this is not limited herein.
Specifically, after the terminal obtains the text features of the target text, the scene canvas corresponding to the text features is drawn. The scene canvas can be understood as a background diagram of a map, or an initial scene diagram of an element to be added, and the like.
In a feasible implementation manner, after the terminal obtains the text features of the target text, a key feature extraction method based on the text context can be used: key elements of the text features (key sentences, key words, key symbols and the like) are extracted, semantic analysis is merged into a statistical algorithm to extract the key elements contained in the text features, the scene topics corresponding to the key elements (such as beach, mountain or river scene topics) are determined, and the corresponding scene canvas is then matched in a preset map index library according to the scene topic.
Step 102: and determining a target object to be drawn based on the text features and the scene canvas.
The target object can be understood as any drawable graphic that can be drawn or inserted and subsequently changed or refined. Drawable graphics include shapes, curves, lines, and the like; in the embodiments of the present application, the target object to be drawn may be a character (e.g., a cartoon character), an animal, a plant, a vehicle, a building, and the like. For example, a teenager, a sponge, a duck or a wolf may be the target object to be drawn.
Specifically, after acquiring text features of a target text and drawing a corresponding scene canvas according to the text features, the terminal inputs the text features and the scene canvas into an object determination model based on an attention mechanism, so as to output a target object to be drawn. And predicting a target object to be drawn at each time step of drawing the scene canvas by taking the text characteristics and the scene canvas as the input of a neural network model-object determination model, wherein for example, a first time step t1 predicts that a young person is to be drawn, and a second time step t2 predicts that a duck is to be drawn.
In embodiments of the present application, the attention mechanism may include at least two aspects: it is decided which part of the input needs to be taken care of and limited information processing resources are allocated to the important part. The attention mechanism is introduced to the scene canvas and the text features, so that more key image parts in the scene canvas, such as target objects to be drawn with higher priority in the current scene canvas, can be highlighted. For example, in a specific application scenario, the target text may include a description object, attribute information of the object, scene information, and the like. After the attention mechanism is introduced into the object determination model, the object part contained in the text feature can be highlighted, and the non-object part (attribute information such as color, emotion, action, position and the like) in the feature map is weakened, so that the highlighted part can be focused in the subsequent processing. To determine the target object to be rendered next.
In this embodiment, the object determination model is a neural network model, which is a system model that is configured by densely interconnecting simple nonlinear simulation processing elements of each of a plurality of nodes and that is a model that simulates biological neurons. The neural network model is formed by connecting the input of at least one node with the output of each node, similar to the synaptic connections of real neurons. Each neuron expresses a specific output function, the excitation function, and the connection between each two neurons contains a connection strength, i.e. a weight value acting on the signal passing through the connection. In this embodiment, a large number of scene canvases and their corresponding text features are input to a neural network model based on an attention mechanism for training, so as to obtain a trained object determination model, where the object determination model has the capabilities of extracting key information features, summarizing semantic knowledge and learning and memorizing in the target object determination process, and the information or knowledge learned by the neural network model is usually stored in a connection matrix between each unit node.
Optionally, the Neural Network model may be implemented based on a combination of one or more of a Convolutional Neural Network (CNN) model, a Deep Neural Network (DNN) model, a Recurrent Neural Network (RNN) model, an embedding model, a Gradient Boosting Decision Tree (GBDT) model, a Logistic Regression (LR) model, and the like.
Specifically, when a terminal acquires a large amount of sample data containing text features and a scene canvas, the sample data is labeled, the labeling can be understood as labeling key information (an object to be drawn) corresponding to the sample data, the text features and the scene canvas are input to an initial object determination model for training, the object determination model is trained based on the labeled sample data, and a trained object determination model can be obtained.
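By way of illustration only, the following is a minimal sketch of how such an attention-based object determination model might look, assuming a PyTorch implementation; the layer sizes, the single dot-product attention step and the object vocabulary size are assumptions and not the patent's exact architecture.

```python
# Minimal sketch (assumed architecture): attend over text features conditioned
# on the scene canvas state, then predict the likelihood of the next object.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectDeterminationModel(nn.Module):
    def __init__(self, text_dim=512, scene_dim=512, num_objects=100):
        super().__init__()
        self.query = nn.Linear(scene_dim, text_dim)               # scene state -> attention query
        self.classifier = nn.Linear(text_dim + scene_dim, num_objects)

    def forward(self, text_feats, scene_feat):
        # text_feats: (batch, num_words, text_dim); scene_feat: (batch, scene_dim)
        q = self.query(scene_feat).unsqueeze(1)                   # (batch, 1, text_dim)
        scores = torch.bmm(q, text_feats.transpose(1, 2))         # (batch, 1, num_words)
        attn = F.softmax(scores, dim=-1)                          # attention over words
        context = torch.bmm(attn, text_feats).squeeze(1)          # highlighted object part
        logits = self.classifier(torch.cat([context, scene_feat], dim=-1))
        return F.softmax(logits, dim=-1)                          # likelihood of each object

# Example: predict the object to draw at the current time step.
model = ObjectDeterminationModel()
text_feats = torch.randn(1, 12, 512)    # encoded target text (12 words, assumed)
scene_feat = torch.randn(1, 512)        # pooled scene canvas state (assumed)
object_probs = model(text_feats, scene_feat)
next_object = object_probs.argmax(dim=-1)
```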
Step 103: and determining the attribute characteristics of the target object according to the target object, the text characteristics and the scene canvas.
The attribute features refer to characteristics or attributes describing the target object, including but not limited to facial features, clothing features, emotional features, behavioral features, and the like. For example, words such as "fierce", "beautiful", "gentle and virtuous", "gray" or "dashing" can be used to describe the characteristics of the target object. Also, for example, text such as running, fighting, grimacing, resting, playing or accompanying may be used to characterize the behavior of the target object.
Specifically, after determining a target object to be drawn, the terminal inputs the target object, the text feature and the scene canvas into an attribute determination model based on an attention mechanism, so as to output an attribute feature of the target object. And predicting the attribute characteristics of the target object corresponding to each time step of drawing the scene canvas by taking the target object, the text characteristics and the scene canvas as the input of a neural network model-attribute determination model, for example, predicting the expression, behavior and other characteristics of a juvenile to be drawn at a first time step t1, and predicting the face, behavior and other characteristics of a duck to be drawn at a second time step t 2.
In the embodiments of the present application, the attention mechanism may include at least two aspects: deciding which part of the input needs attention, and allocating limited information processing resources to the important part. Introducing the attention mechanism to the target object, the text features and the scene canvas can highlight the more critical image parts in the scene canvas, such as the attribute features of higher-priority target objects in the current scene canvas. For example, in a specific application scenario, the target text may include the described object, attribute information of the object, scene information, and the like. After the attention mechanism is introduced into the attribute determination model, the attribute parts contained in the text features (attribute information such as color, emotion, action and position) can be highlighted and the object parts in the feature map (objects such as people, animals and plants) weakened, so that the highlighted parts can be focused on in subsequent processing, in order to determine the attribute features corresponding to the target object to be drawn next.
The attribute determination model based on the attention mechanism can, based on the input text features, attend to the context semantics of the target text to encode the related content of the target object to be drawn (namely the attribute information), determine the attribute information of the target object in the scene canvas, such as its position information, and finally output the attribute features of the target object after encoding by the attribute determination model.
Optionally, the attribute determination model based on the attention mechanism may be a decoder model in a seq2seq framework. As shown in fig. 2, fig. 2 is a schematic structural diagram of an attribute determination model. In fig. 2, the target object x1, the text feature x2 and the scene canvas x3 constitute the input x, and h1, h2 and so on are the neural network computing units of the decoder. The attribute determination model has the capabilities of key-information feature extraction, semantic knowledge summarization, and learning and memory in the attribute determination process, and the information or knowledge learned by the neural network model is usually stored in the connection matrix between the neural network computing units. It should be noted that, in the model shown in fig. 2, the attribute y output at the previous moment is used as the input at the current moment, and the "input x" participates in the operation only as the initial state; later operations are not related to the "input x". For example, the output attribute y1 corresponding to the neural network computing unit h1 at the previous moment participates in the calculation as the input of the neural network computing unit h2 at the current moment. The model thereby produces the output y (i.e., the attribute features) containing attribute y1, attribute y2 and so on.
In a possible implementation, the structure of the attribute determination model may be the decoder model structure shown in fig. 3. In the model shown in fig. 3, the attribute y output at the previous moment is used as the input at the current moment, and the "input x" participates as a steady state in the operation of every neural network computing unit, i.e., the later operations are all related to the "input x". For example, the output attribute y1 corresponding to the neural network computing unit h1 at the previous moment participates in the calculation as the input of the neural network computing unit h2 at the current moment, while the "input x" also participates as a steady state in the calculation of the network computing unit h2, so that the later calculation remains related to the "input x". The model thereby produces the output y (i.e., the attribute features) containing attribute y1, attribute y2 and so on.
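The difference between the two decoder structures can be sketched as follows; this is an illustrative PyTorch sketch assuming GRU cells and arbitrary dimensions, not the exact model of figs. 2 and 3.

```python
# Minimal sketch of the two decoder variants: in the fig. 2 style, the input x
# only initializes the hidden state; in the fig. 3 style, x is also fed to
# every step as a steady-state input. Sizes and cell types are assumptions.
import torch
import torch.nn as nn

hidden_dim, steps = 256, 3
x = torch.randn(1, hidden_dim)                 # "input x": object + text + scene encoding
to_attr = nn.Linear(hidden_dim, hidden_dim)    # maps a hidden state to an attribute vector

# Variant of fig. 2: x only sets the initial state; each step consumes the
# previous output attribute y.
cell_a = nn.GRUCell(hidden_dim, hidden_dim)
h = x.clone()                                  # initial state h0 = x
y = torch.zeros(1, hidden_dim)                 # y0 (start token)
attrs_a = []
for _ in range(steps):
    h = cell_a(y, h)                           # previous output is the current input
    y = to_attr(h)
    attrs_a.append(y)

# Variant of fig. 3: x additionally participates in every step's computation.
cell_b = nn.GRUCell(2 * hidden_dim, hidden_dim)
h = torch.zeros(1, hidden_dim)
y = torch.zeros(1, hidden_dim)
attrs_b = []
for _ in range(steps):
    h = cell_b(torch.cat([y, x], dim=-1), h)   # previous output y and steady-state x
    y = to_attr(h)
    attrs_b.append(y)
```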
In the embodiment of the application, after the initial attribute determination model is created, a large amount of sample data including a scene canvas, a target object and the text features is acquired and input to an attention-based neural network model for training, so that the trained attribute determination model can be obtained. Here, the training of the attribute determination model may use a dynamic time warping (DTW) based training method, a vector quantization (VQ) based training method, a hidden Markov model (HMM) based training method, or the like.
Step 104: and drawing the target object on the scene canvas, adjusting the target object based on the attribute characteristics, and generating a chartlet corresponding to the target text.
The map can be understood as the image, matching the corresponding text semantics, into which the terminal finally converts the target text when executing the text-to-map method of the embodiments of the present application.
In one possible implementation, the terminal may invoke an object generation program to initially generate a feature map of the target object on the scene canvas. The feature map is an image that is preliminarily generated based on the object feature vector. In this embodiment, the feature map may generally be a low-resolution image, such as an image with a resolution of 32 × 32 or 64 × 64. An object corresponds to an object feature vector at the terminal, and by acquiring the object feature vector corresponding to the target object, the terminal can generate the feature map from the object feature vector. The image of the target object's feature map on the scene canvas can then be adjusted synchronously or asynchronously based on the attribute features. It should be noted that the object of image adjustment further includes adjusting the image parts associated with the target object, for example, adjusting the image background (plants, animals, etc.) associated with the target object. In a specific application scene, for example, using a bird data set (CUB), the input text to be processed is "a gray bird with a white breast; the gray bird is in a bad mood". The terminal generates the feature map of the bird on the scene canvas according to the bird feature vector, and can synchronously or asynchronously adjust the image of the target object's feature map on the scene canvas according to the attribute features corresponding to the bird object: the appearance features of the bird are adjusted to gray fur with a white chest area, the emotional features of the bird are adjusted to a low mood, and the image parts related to the bird, namely the scene environment, are characterized as a rainy day (for example, adding dark clouds and raindrops and correspondingly adjusting the scene brightness and contrast), and so on. After the target object is adjusted according to the attribute features, the adjusted scene canvas is obtained, which is the map corresponding to the target text.
In a possible implementation manner, the terminal stores a map index library, the map index library stores at least a plurality of map elements (maps corresponding to objects), the terminal may obtain a map object corresponding to the target object in the map index library, add the map object to a scene canvas, and perform image adjustment on a feature map of the target object on the scene canvas based on the attribute features synchronously or asynchronously, where the image adjustment object further includes adjusting an image portion associated with the target object, for example, adjusting an image background (plants, animals, and the like) associated with the target object.
In the embodiment of the application, a terminal acquires text features of a target text, draws a scene canvas corresponding to the text features, determines a target object to be drawn based on the text features and the scene canvas, determines attribute features of the target object according to the target object, the text features and the scene canvas, draws the target object on the scene canvas, adjusts the target object based on the attribute features, and generates a map corresponding to the target text. The scene canvas, the objects to be drawn and the attribute features of those objects are determined step by step from the text features of the target text, and the image is adjusted according to the attribute features of each object (position, action, posture and the like) as the object is drawn on the scene canvas. This avoids the problem that a map generated only from keywords is difficult to match to the scene actually described by the text, and a map with an accurate scene, clear objects and clear attributes can be generated that is closer to the semantics expressed by the actual text. The degree to which the map matches the scene actually described by the text can therefore be improved, and the accuracy of map generation is improved.
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating a text-to-map method according to another embodiment of the present disclosure. Specifically, the method comprises the following steps:
step 201: and inputting the target text into a text encoder, and outputting text characteristics corresponding to the target text.
The text encoder outputs an encoded feature representation for the target text input to it, that is, the text features corresponding to the target text; in practical applications the text features are generally represented in the form of encoding vectors.
Specifically, the text encoder may compress the target text by using a deep neural network to obtain the encoding vector corresponding to each moment. The specific way is to use a long short-term memory network model in the deep neural network, input each text element of the target text (word, sentence, symbol and the like) into the network in turn, and obtain the hidden-layer representation h_i corresponding to each moment (namely each time step).
Specifically, a long short-term memory network (LSTM) in the deep neural network is used. At the initial moment, a text element of the target text (such as a word or character) is input to the LSTM and compressed into a vector, and the compressed vector is passed to the next moment. At the next moment, the recurrent neural network takes the compressed vector from the previous moment and the next text element of the original text as input, compresses them into a new vector, and passes it on. The encoding vector obtained at each moment after all the text has been compressed is the feature information needed for decoding (including attribute decoding, object decoding, scene decoding and the like). In this embodiment the number of moments is the number of words in the sentence, and the hidden-layer vector corresponding to each moment is the vector into which the LSTM compresses the words. The text features corresponding to the target text can be obtained through this encoding.
In one specific implementation scenario, the text encoder may be constructed based on the GRU network, an evolution of the LSTM, i.e., using a single-layer bidirectional recurrent network (BiGRU) with gated recurrent units (GRU). It takes the linear embedding of each text element (word) as input, and the hidden dimension for each direction may be a fixed dimension, e.g., 256.
For example, the encoding process of the text encoder can be characterized by the following formula. For a given target text, we compute for each text element (word):

h_i = BiGRU(x_i)

where BiGRU denotes a neural network with bidirectional GRU units, x_i is the word-embedding vector corresponding to the i-th word, and h_i is the context hidden vector (expressed in the form of a feature vector) corresponding to the encoded text element. The vector pairs (x_i, h_i) formed by the context hidden vectors and the word-embedding vectors are the output of the text encoder, i.e., the text features.
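As an illustration of the encoding described above, a minimal sketch of such a BiGRU text encoder is given below, assuming a PyTorch implementation with 256 hidden units per direction; the vocabulary size, embedding dimension and example word ids are assumptions.

```python
# Minimal sketch of the BiGRU text encoder: the text features are the pairs
# (x_i, h_i) of word embeddings and context hidden vectors.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 300, 256       # assumed sizes
embedding = nn.Embedding(vocab_size, embed_dim)           # linear embedding of each word
bigru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

word_ids = torch.tensor([[4, 17, 256, 9]])                # a 4-word target text (illustrative ids)
x = embedding(word_ids)                                   # x_i: word-embedding vectors
h, _ = bigru(x)                                           # h_i: context hidden vectors, (1, 4, 512)

text_features = torch.cat([x, h], dim=-1)                 # the pairs (x_i, h_i)
```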
Step 202: and extracting a scene theme corresponding to the text feature, indexing a scene chartlet corresponding to the scene theme in a preset chartlet index library, and determining the scene chartlet as a scene canvas.
The map index library is a pre-established image library containing a large number of map materials, the image library contains scene maps corresponding to a plurality of scene topics, and in the embodiment of the application, the terminal can search the scene maps corresponding to the scene topics in the map index library.
Specifically, after the terminal obtains the text features of the target text, the scene theme corresponding to the text features is extracted, the scene chartlet corresponding to the scene theme is indexed in a preset chartlet index library, and the scene chartlet is determined as the scene canvas. The scene canvas can be understood as a background diagram of a map, or an initial scene diagram of an element to be added, and the like.
In a feasible implementation manner, after the terminal obtains the text features of the target text, a key feature extraction method based on a text context may be used, that is, key elements (key sentences, key words, key symbols, and the like) of the text features are extracted, semantic analysis is then merged into a statistical algorithm to extract the key elements included in the text features, scene topics (such as scene topics of sand beach, mountain, river, and the like) corresponding to the key elements are determined, then corresponding scene maps are matched in a preset map index library according to the scene topics, and after the scene maps are found, the scene maps may be determined as a scene canvas.
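A minimal sketch of matching a scene canvas by scene topic might look as follows; the keyword-to-topic table and the library paths are illustrative assumptions, not the patent's actual map index library.

```python
# Minimal sketch: pick the scene topic whose keywords best overlap the key
# elements extracted from the text features, then index the scene map that is
# used as the scene canvas. All table entries and paths are assumptions.
SCENE_TOPIC_KEYWORDS = {
    "beach": {"beach", "sand", "sea", "sandbox"},
    "mountain": {"mountain", "hill", "climb"},
    "river": {"river", "stream", "boat"},
}
MAP_INDEX_LIBRARY = {
    "beach": "library/scenes/beach.png",
    "mountain": "library/scenes/mountain.png",
    "river": "library/scenes/river.png",
}

def match_scene_canvas(key_elements):
    """Return the best-matching scene topic and the indexed scene map path."""
    best_topic = max(SCENE_TOPIC_KEYWORDS,
                     key=lambda t: len(SCENE_TOPIC_KEYWORDS[t] & set(key_elements)))
    return best_topic, MAP_INDEX_LIBRARY[best_topic]

topic, canvas_path = match_scene_canvas(["Tim", "sandbox", "holding"])
```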
Step 203: and inputting the scene canvas into a convolution network for scene coding, and outputting a scene characteristic diagram after the scene coding.
The scene characteristic diagram can be understood as a scene characteristic diagram obtained by identifying an image to be identified through a convolutional network. In the embodiment of the application, the scene feature map is obtained by identifying a scene canvas through a convolutional network, and generally the scene feature map includes at least one scene feature value.
For example, the scene feature map may be obtained by extracting a scene feature value of the image to be recognized through a convolutional network (CNN network), and the specific process may be as follows:
the convolutional network comprises one or more convolution kernels used for extracting feature information from the pixel matrix of the scene canvas. The convolution kernels traverse the pixel matrix of the image to be identified with a certain stride; when the scene canvas is input into the convolutional network for scene coding, at least one scene feature value can be obtained, and the scene feature map is formed from the at least one scene feature value.
In one particular embodiment, the convolutional network may be a convolutional GRU network composed of at least one gated recurrent unit (ConvGRU unit). Each convolutional layer in the convolutional GRU network has a 3 × 3 convolution kernel with a stride (i.e., step size) of 1 and a hidden dimension of 512.
Illustratively, the convolutional network may be characterized by the following formula, with the scene canvas B_t as the input to the convolutional network:

S_t = ConvGRU(B_t, S_{t-1})

where S_t is the scene state at the current time step, which can also be understood as the scene feature values; ConvGRU() is a convolutional network with gated recurrent units; and S_{t-1} is the scene state at the historical time step. Each S_t characterizes the informativeness of the temporal dynamics at each spatial (grid) location in the scene. The aggregated S_t is the C x H x W scene feature map representing the output of the convolutional network and the current scene state, where C is the number of channels (determined by the numbers of input and output channels) and H and W are the convolution height and width.
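For illustration, a minimal ConvGRU cell of the kind described above (3 x 3 kernels, stride 1, 512 hidden channels) might be sketched as follows in PyTorch; the input channel count and spatial size are assumptions.

```python
# Minimal ConvGRU cell sketch: updates the scene state S_t from the drawn
# canvas B_t and the previous state S_{t-1}. Illustrative only.
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch=512, k=3):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, stride=1, padding=p)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, stride=1, padding=p)

    def forward(self, b_t, s_prev):
        zr = torch.sigmoid(self.gates(torch.cat([b_t, s_prev], dim=1)))
        z, r = zr.chunk(2, dim=1)                       # update and reset gates
        s_tilde = torch.tanh(self.cand(torch.cat([b_t, r * s_prev], dim=1)))
        return (1 - z) * s_prev + z * s_tilde           # new scene state S_t

cell = ConvGRUCell(in_ch=64)                            # assumed canvas encoding channels
b_t = torch.randn(1, 64, 28, 28)                        # encoded scene canvas B_t (assumed)
s_prev = torch.zeros(1, 512, 28, 28)                    # previous scene state S_{t-1}
s_t = cell(b_t, s_prev)                                 # C x H x W scene feature map
```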
Step 204: and inputting the text feature and the scene feature map into an object decoder, and outputting the target object.
The object decoder is an attention-based neural network model that outputs the target object based on the likelihood scores of all possible objects. It takes the recurrent scene states in the scene feature map and the text features as input, and outputs the target object to be drawn at the current time step. In practical applications, the object decoder serves as the object determination model, and the object decoder includes, but is not limited to, a scene pooling part, a text attention part, and an object convolution part.
In the present embodiment, the object decoder is configured by densely interconnecting the simple nonlinear simulation processing elements of each of a plurality of nodes, and is a system model that simulates biological neurons. The object decoder is formed by connecting the input of at least one node with the output of each node, similar to the synaptic connections of real neurons. Each neuron expresses a specific output function, the excitation function, and the connection between each two neurons carries a connection strength, i.e., a weight value acting on the signal passing through the connection. In this embodiment, a large number of scene canvases and their corresponding text features are input to an attention-based neural network model for training, so as to obtain a trained object decoder; the object decoder has the capabilities of key-information feature extraction, semantic knowledge summarization, and learning and memory in the target object determination process, and the information or knowledge learned by the object decoder is usually stored in the connection matrix between the unit nodes. After the text features and the scene feature map are input into the object decoder, the scene pooling part of the object decoder pools the scene feature map to collect the scene spatial context required for object prediction, for example, whether the target object and previously added historical objects have been added to the canvas corresponding to the scene feature map at the current time step; the pooling then controls the pooling neural units to fuse the involved object spatial features into a scene attention vector. The text attention part of the object decoder then processes the scene attention vector and the text features; specifically, it attends to the semantic context of the target text, highlighting the object parts contained in the corresponding text features and weakening the non-object parts (attribute information such as color, emotion, motion and position) in the feature map. Finally, the object convolution part of the object decoder performs convolution calculation on the text attention vector and the scene attention vector, so that the output layer of the convolutional network outputs the target object.
In a possible embodiment, in order to make the object to be drawn predicted by the object decoder more accurate, the already drawn historical object may be taken into account to improve the accuracy of the object decoder's prediction. The historical object can be understood as an object drawn at a moment before the current moment (i.e., the current time step); in practical applications, the historical object is usually the object drawn at the previous moment (i.e., the previous time step). Assuming that the object corresponding to the current time step t is target object 1, the historical object is the object drawn at time step t-1 (i.e., the previous time step). The method specifically comprises the following steps:
1. The terminal obtains the drawn historical object, specifically the object O_{t-1} predicted by the object decoder at the previous time step. The object O_{t-1} is usually encoded and represented by a high-dimensional object feature vector.
2. Pooling is performed on the input scene feature map to obtain the pooled first scene attention vector u_t^0, i.e.

u_t^0 = AvgPooling(Ψ_0(S_t))

where AvgPooling() represents the scene pooling part of the object decoder, typically the pooling layer in the object decoder; Ψ_0 is a convolutional network of the object decoder; and S_t is the scene state at the current time step, whose spatial state Ψ_0 can attend to.
The terminal inputs the text features, the scene feature map and the historical object into the object decoder, and the scene pooling part of the object decoder pools the scene feature map S_t to collect the scene spatial context required for object prediction, for example, whether the target object and the previously added historical objects have been added to the canvas corresponding to the scene feature map at the current time step. The pooling then controls the pooling neural units of the pooling layer to fuse the involved object spatial features into a scene attention vector, so as to obtain the pooled first scene attention vector u_t^0.
3. The first scene attention vector, the historical object and the text features are input into the text attention part of the object decoder, the first text attention device, which outputs the first text attention vector, i.e.

first text attention vector = Φ_1(u_t^0, O_{t-1}, h)

where Φ_1 is the text attention part of the object decoder (the first text attention device), u_t^0 is the first scene attention vector, O_{t-1} is the historical object, and h denotes the text features.
The text attention part of the object decoder, the first text attention device, processes the scene attention vector, the historical object and the text features. Specifically, through the attention mechanism, the inputs u_t^0 and O_{t-1} are used to attend to the semantic context of the target text, i.e., the text features h, highlighting the object parts contained in the corresponding text features and weakening the non-object parts (attribute information such as color, emotion, action and position) in the feature map, so as to determine the text information, represented by the first text attention vector, about the object to be drawn.
4. The first scene attention vector, the historical object and the first text attention vector are input into the object convolutional network, which outputs the target object to be drawn. The object convolutional network is usually a convolutional perceptron with a predetermined number of layers, for example 2. With the first scene attention vector, the historical object and the first text attention vector as inputs, the fully connected layer of the convolutional perceptron integrates the input features into the output layer, the likelihood of the next object is predicted using an excitation function (the softmax function), and the object with the highest likelihood is output, namely the target object.
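Putting steps 1-4 together, one object-decoder step might be sketched as follows; this is an illustrative PyTorch sketch in which the convolutional perceptron is approximated with linear layers, and all dimensions and the object vocabulary size are assumptions.

```python
# Minimal sketch of one object-decoder step: average-pool the scene state into
# u_t^0, attend to the text given u_t^0 and O_{t-1}, then predict the next
# object with softmax. Sizes, layer types and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

scene_dim, text_dim, obj_dim, num_objects = 512, 512, 128, 100

psi0 = nn.Conv2d(scene_dim, scene_dim, 3, padding=1)      # Psi_0: conv over the scene state
query = nn.Linear(scene_dim + obj_dim, text_dim)          # builds the text-attention query
obj_head = nn.Linear(scene_dim + obj_dim + text_dim, num_objects)

s_t = torch.randn(1, scene_dim, 28, 28)                   # scene state S_t (assumed size)
o_prev = torch.randn(1, obj_dim)                          # historical object O_{t-1}
text_feats = torch.randn(1, 12, text_dim)                 # encoded target text

# steps 1-2: scene pooling, first scene attention vector u_t^0
u0 = F.adaptive_avg_pool2d(psi0(s_t), 1).flatten(1)       # (1, scene_dim)

# step 3: first text attention conditioned on u_t^0 and O_{t-1}
q = query(torch.cat([u0, o_prev], dim=-1)).unsqueeze(1)
attn = F.softmax(torch.bmm(q, text_feats.transpose(1, 2)), dim=-1)
c0 = torch.bmm(attn, text_feats).squeeze(1)               # first text attention vector

# step 4: likelihood of the next object to draw
obj_probs = F.softmax(obj_head(torch.cat([u0, o_prev, c0], dim=-1)), dim=-1)
next_object = obj_probs.argmax(dim=-1)
```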
Step 205: and inputting the text features and the target object into a second text attention device, and outputting a second text attention vector.
The second text attention device, the scene convolutional network and the attribute convolutional network together form the attribute decoder. The attribute decoder is an attention-based neural network model; introducing the attention mechanism to the target object, the text features and the scene canvas enables the attribute decoder to highlight the more critical image parts in the scene canvas, such as the attribute features of higher-priority target objects in the current scene canvas. In practical applications, the attribute decoder consisting of the second text attention device, the scene convolutional network and the attribute convolutional network serves as the attribute determination model. It should be noted that, depending on the specific implementation environment, the attribute decoder includes, but is not limited to, the second text attention device, the scene convolutional network and the attribute convolutional network.
Specifically, the terminal inputs the text features and the target object to the text attention part of the attribute decoder, namely the second text attention device, which outputs the second text attention vector, i.e.

second text attention vector = Φ_2(O_t, h)

where Φ_2 is the text attention part of the attribute decoder (the second text attention device), O_t is the target object, and h denotes the text features.
The text attention part of the attribute decoder, the second text attention device, processes the target object and the text features. Specifically, through the attention mechanism, the input O_t is used to attend to the semantic context of the target text, i.e., the text features h, highlighting the attribute parts contained in the corresponding text features (attribute information such as color, emotion, action and position) and weakening the object parts. The calculation matrices contained in the text attention device are pre-trained on a large amount of sample data to learn and compute the attention scores, so as to determine the text information, represented by the second text attention vector, of the object to be drawn.
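For illustration, the second text attention device might be sketched as follows, assuming a simple learned query projection and dot-product attention; dimensions are assumptions.

```python
# Minimal sketch of the second text attention device: the attention query is
# built from the current target object O_t, so attribute-related words in the
# text features are weighted up. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

text_dim, obj_dim = 512, 128
phi2_query = nn.Linear(obj_dim, text_dim)             # Phi_2's learned query projection (assumed)

o_t = torch.randn(1, obj_dim)                         # target object O_t
text_feats = torch.randn(1, 12, text_dim)             # text features h

q = phi2_query(o_t).unsqueeze(1)                      # (1, 1, text_dim)
scores = torch.bmm(q, text_feats.transpose(1, 2))     # attention scores over words
weights = F.softmax(scores, dim=-1)
second_text_attention = torch.bmm(weights, text_feats).squeeze(1)   # (1, text_dim)
```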
Step 206: and inputting the scene feature map and the second text attention vector into a scene convolution network, and outputting a second scene attention vector.
Specifically, the scene convolutional network is an attention module based on the image (the scene feature map); through the scene convolutional network, relevant scene information of the object to be added to the scene canvas, such as scene information about the position at which the object is to be added, can be collected. Typically the scene convolutional network may be a scene spatial attention module consisting of two convolutional layers, i.e.

second scene attention vector = Ψ_a(S_t, second text attention vector)

where Ψ_a is the scene convolutional network and S_t is the scene feature map.
Specifically, the scene convolution network includes, for example, an input layer, a convolution layer, a pooling layer, a full-link layer, and an output layer. In some embodiments, the scene feature map and the second text attention vector are used as input, and are input to the scene convolution network, for example, the scene feature map and the second text attention vector are received by an input layer of the scene convolution network, and the input layer can normalize the input data (the scene feature map and the second text attention vector), which is beneficial to improving the learning efficiency and performance of the scene convolution network. And then the scene feature graph is subjected to feature extraction and calculation through the convolution layer, is transmitted to the pooling layer to be subjected to text and scene feature selection and information filtering, and is subjected to feature synthesis through the full-connection layer to the output layer so as to output a second scene attention vector. By inputting the scene feature map and the second text attention vector into the scene convolutional neural network, more feature information can be accumulated based on the text attention vector on the basis of the scene feature map, so that the scene content representation of the feature map is obtained, and the second scene attention vector corresponding to the scene feature map and the second text attention vector is obtained.
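A minimal sketch of such a two-convolutional-layer scene spatial attention module is given below; broadcasting the second text attention vector over the scene feature map and the chosen channel sizes are illustrative assumptions.

```python
# Minimal sketch of the scene spatial attention module: two conv layers
# produce a spatial attention map over the scene features, conditioned on the
# second text attention vector; the weighted features form the second scene
# attention representation. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

scene_ch, text_dim = 512, 512
conv1 = nn.Conv2d(scene_ch + text_dim, 256, 3, padding=1)
conv2 = nn.Conv2d(256, 1, 3, padding=1)                     # one spatial attention map

s_t = torch.randn(1, scene_ch, 28, 28)                      # scene feature map
c_text = torch.randn(1, text_dim)                           # second text attention vector

c_map = c_text[:, :, None, None].expand(-1, -1, 28, 28)     # broadcast the text vector
logits = conv2(F.relu(conv1(torch.cat([s_t, c_map], dim=1))))
attn = F.softmax(logits.flatten(2), dim=-1).reshape(1, 1, 28, 28)
second_scene_attention = (s_t * attn).flatten(2).sum(-1)    # (1, scene_ch)
```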
Step 207: and inputting the second scene attention vector, the target object and the second text attention vector into an attribute convolution network, and outputting the attribute characteristics of the target object.
The attribute convolutional network is typically a convolutional perceptron with a predetermined number of layers, for example 4 convolutional (CNN) layers. With the second scene attention vector, the target object and the second text attention vector as inputs, the fully connected layer of the convolutional perceptron integrates the input features into the output layer, and an excitation function (the softmax function) predicts at least one attribute of the next target object, so as to output the attribute features of the target object. The attribute likelihood P() is represented as follows:

P() = Θ(second scene attention vector, O_t, second text attention vector)

where Θ is the attribute convolutional network; the output layer of Θ has 1 + Σ_k R_k output channels, where R_k denotes the discrete range of the k-th attribute at the current time step and l_t is the location attribute among the attribute features. In practical applications, the first channel of the output layer of Θ predicts the likelihood of the object's location over the spatial domain using the softmax function, and the remaining channels predict the attributes at each attended grid location (determined by the scene attention vector). During training, the likelihood at the true position is used to calculate the loss. In each time step of attribute determination, a position is first sampled from the attribute convolutional network, and the attribute information corresponding to the target object is then collected at the sampled position, until the attribute information corresponding to the target object has been collected for all the sampled positions (i.e., each attended grid location) predicted by the second scene attention vector, so that the output layer outputs the attribute features of the target object using the excitation function (softmax).
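For illustration, the attribute head described above might be sketched as follows, assuming the second scene attention vector, the target object and the second text attention vector have already been fused into a spatial feature map; the attribute ranges R_k and all sizes are assumptions.

```python
# Minimal sketch of the attribute head: a small conv network whose output has
# 1 + sum_k R_k channels. The first channel is a softmax over spatial
# locations (the location attribute l_t); the remaining channels predict each
# discrete attribute at the chosen location. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

attr_ranges = [8, 4, 5]                   # R_k, e.g. pose, size, flip (assumed)
in_ch = 512
out_ch = 1 + sum(attr_ranges)             # 1 location channel + attribute channels

theta = nn.Sequential(                    # Theta: 4 convolutional layers
    nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, out_ch, 1),
)

fused = torch.randn(1, in_ch, 28, 28)     # fused scene/object/text representation (assumed)
out = theta(fused)

loc_logits = out[:, 0]                                        # location channel
loc_probs = F.softmax(loc_logits.flatten(1), dim=-1)          # likelihood over the grid
pos = loc_probs.argmax(dim=-1)                                # chosen position
row, col = divmod(int(pos), 28)

attr_logits = out[:, 1:, row, col]                            # attributes at that location
attrs, start = [], 0
for r in attr_ranges:                                         # per-attribute softmax
    attrs.append(F.softmax(attr_logits[:, start:start + r], dim=-1).argmax(dim=-1))
    start += r
```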
Step 208: and indexing the map object corresponding to the target object in a preset map index library.
Specifically, the terminal stores a map index library, where the map index library at least stores a plurality of map elements (maps corresponding to objects), and the terminal may obtain a target map element in the map index library by using an image retrieval technique, and use the target map element as a map object corresponding to the target object.
The map elements of the map index library may be all or part of sample images obtained from an existing image database (such as WIDER FACE data set, IJB-C test set, AVA data set, coco data set, etc.) as map elements, and/or sample images taken in an actual environment by using a device with a photographing function. The method comprises the steps of obtaining a large number of sample images, and preprocessing the sample images, wherein the preprocessing comprises the processing processes of digitalization, geometric transformation, normalization, smoothing, restoration enhancement and the like, so as to obtain processed map elements.
Optionally, the image retrieval technique may be an image retrieval algorithm, and the image retrieval algorithm includes, but is not limited to, a Locality-Sensitive Hashing (LSH) algorithm, a Spectral Hashing (SH) algorithm, a Supervised Discrete Hashing (SDH) algorithm, a Vector of Locally Aggregated Descriptors (VLAD) retrieval algorithm, a K-D tree retrieval algorithm, and the like.
In a possible embodiment, among the map elements contained in the image database, when the obtained sample image is stored as a map element, an image identifier, which may be an image id, a number, a specific character string, or the like, may be assigned to the map element, and the map element is identified by the image identifier. Then, during retrieval, the image identifier of the target object can be directly obtained based on the target object, and the map element corresponding to the image identifier is queried in an image database.
Step 209: and inputting the attribute characteristics, the map object and the scene canvas into a canvas adjustment model, and outputting the map corresponding to the target text.
The canvas adjustment model may be one of neural network models, and is configured to draw and adjust a map object on a scene canvas according to input (attribute features, a map object, and the scene canvas) of the model, and output a map corresponding to a target text with the adjusted scene canvas as an output.
In the embodiment of the application, the canvas adjustment model draws the currently determined map object on the scene canvas with the attribute features as the reference. Instead of directly generating a high-resolution image, the image adjustment task is simplified: the canvas adjustment model simulates the image distribution step by step, hierarchically from low to high, according to the attribute features, and performs image adjustment on the map object on the scene canvas. It should be noted that the object of image adjustment further includes adjusting the image parts associated with the target object, for example, adjusting the image background (plants, animals, etc.) associated with the target object. In a specific application scene, for example, the input text to be processed is "a gray bird with a white breast; the gray bird is in a bad mood". The terminal draws the map object of the bird on the scene canvas according to the bird feature vector, and can synchronously or asynchronously adjust the image of the target object's feature map on the scene canvas according to the attribute features corresponding to the bird object: the appearance features of the bird are adjusted to gray fur with a white chest area, the emotional features of the bird are adjusted to a low mood, and the image parts related to the bird, namely the scene environment, are characterized as a rainy day (for example, adding dark clouds and raindrops and correspondingly adjusting the scene brightness and contrast), and so on. After the target object is adjusted according to the attribute features, the adjusted scene canvas is obtained, which is the map corresponding to the target text.
In practical applications, the target text generally corresponds to a plurality of objects to be drawn during the text-to-chartlet process, and the addition and drawing of each object may correspond to one time step.
Illustratively, the target text may be:
Tim is holding a hotdog. Amy is sitting in the sandbox. Amy is holding the shovel.
When the terminal processes the target text, it can generate the map step by step using the text-to-map method described above, specifically as follows:
At time step T1: text encoding is performed on the target text to obtain the text features corresponding to the target text. A scene topic (scene topic 1) corresponding to the text features is extracted, and a scene map corresponding to scene topic 1 is indexed in the preset map index library; the scene map is shown in fig. 5, which is a schematic diagram of the scene map, and the scene map shown in fig. 5 is determined as the scene canvas.
At time step T2, the target object to be drawn may be the "sandbox". The attributes (position, size, color, spatial relationship, etc.) of the "sandbox" are then determined, the target object "sandbox" is drawn on the scene canvas, and the target object is adjusted based on the attribute features. The adjusted scene map is shown in fig. 6; note that a "sandbox" has been drawn in fig. 6.
At time step T3, the target object to be drawn may be "Tim", and the history object is the "sandbox". The attributes (position, size, color, spatial relationship, etc.) of "Tim" are then determined, such as "holding"; the target object "Tim" is drawn on the scene canvas and adjusted based on the attribute features. The adjusted scene map is shown in fig. 7; note that the character "Tim" has been drawn in fig. 7, standing next to the "sandbox" with the action "holding".
At time step T4, the target object to be drawn may be "Amy", and the history object is "Tim". The attributes (position, size, color, spatial relationship, etc.) of "Amy" are then determined, such as "sitting" and "holding"; the target object "Amy" is drawn on the scene canvas and adjusted based on the attribute features. The adjusted scene map is shown in fig. 8; note that the character "Amy" has been drawn in fig. 8, located in the "sandbox" with the action "holding".
At time step T5, the target object to be drawn may be the "hotdog", and the history object is "Tim". The attributes (position, size, color, spatial relationship, etc.) of the "hotdog" are then determined, such as "holding" and "Tim holding"; the target object "hotdog" is drawn on the scene canvas and adjusted based on the attribute features. The adjusted scene map is shown in fig. 9; note that fig. 9 depicts the article "hotdog" held in the hand of "Tim".
At time step T6, the target object to be drawn may be the "shovel", and the history object is "Amy". The attributes (position, size, color, spatial relationship, etc.) of the "shovel" are then determined, such as "holding" and "Amy holding"; the target object "shovel" is drawn on the scene canvas and adjusted based on the attribute features. The adjusted scene map is shown in fig. 10; note that fig. 10 depicts the article "shovel" held in the hand of "Amy".
At this point, after all the objects have been added, the canvas adjustment model completes the image adjustment on scene canvas 1 and outputs the chartlet corresponding to the target text, as shown in fig. 10. The above target text is given only to facilitate understanding and better explain the embodiments of the present application, and the added details involved in the explanation are not specifically limited.
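Putting the time steps together, the stepwise generation might be organized as in the following sketch. The callables passed in (text_encoder, object_decoder, attribute_decoder, canvas_model, and an index_library with lookup_scene / lookup_object) are hypothetical stand-ins for the text encoder, decoders, canvas adjustment model and map index library described in this application.

```python
def text_to_map(target_text, text_encoder, object_decoder, attribute_decoder,
                canvas_model, index_library):
    """Stepwise text-to-map generation sketch: one object is decoded, drawn and
    adjusted per time step until the decoder signals that no object is left."""
    text_feature = text_encoder(target_text)                   # time step T1
    scene_canvas = index_library.lookup_scene(text_feature)    # scene map as canvas
    drawn_objects = []
    while True:                                                 # T2, T3, ...
        target_object = object_decoder(text_feature, scene_canvas, drawn_objects)
        if target_object is None:                               # all objects added
            break
        attributes = attribute_decoder(target_object, text_feature, scene_canvas)
        map_object = index_library.lookup_object(target_object)
        scene_canvas = canvas_model(attributes, map_object, scene_canvas)
        drawn_objects.append(target_object)
    return scene_canvas                                         # chartlet for the text
```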
In one possible implementation, the canvas adjustment model may be an end-to-end generative adversarial model that simulates a series of multi-scale image distributions. The canvas adjustment model may be composed of a number of generators and discriminators, and map elements of different resolutions are drawn, generated and corrected on each branch of the model. On each branch, the generator captures the image distribution of the scene canvas at the corresponding resolution, and the discriminator distinguishes the generated image from a real image of the corresponding size; the generator and discriminator are trained jointly so that the multi-scale distributions are approximated, which helps ensure that the generated chartlet semantically approximates the real meaning of the target text and makes the generated chartlet more accurate.
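A hedged PyTorch sketch of such a multi-branch adversarial setup is given below; the layer sizes, branch structure and loss form are illustrative assumptions and are not the exact architecture of the canvas adjustment model.

```python
import torch
import torch.nn as nn

class ScaleBranch(nn.Module):
    """One resolution branch: a generator block that upsamples and refines the
    canvas features, and a discriminator that judges images at this scale."""
    def __init__(self, channels=64):
        super().__init__()
        self.generator = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.to_rgb = nn.Conv2d(channels, 3, 1)
        self.discriminator = nn.Sequential(
            nn.Conv2d(3, channels, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, 1, 4, stride=2, padding=1),
        )

def multiscale_adversarial_loss(branches, base_features, real_images):
    """Accumulate generator/discriminator losses over all resolution branches;
    `real_images` holds one real image per scale, from coarse to fine."""
    bce = nn.BCEWithLogitsLoss()
    g_loss, d_loss, h = 0.0, 0.0, base_features
    for branch, real in zip(branches, real_images):
        h = branch.generator(h)                       # refine at the next scale
        fake = torch.tanh(branch.to_rgb(h))           # generated image at this scale
        fake_logits = branch.discriminator(fake)
        real_logits = branch.discriminator(real)
        g_loss = g_loss + bce(fake_logits, torch.ones_like(fake_logits))
        d_loss = d_loss + bce(real_logits, torch.ones_like(real_logits)) \
                        + bce(branch.discriminator(fake.detach()),
                              torch.zeros_like(fake_logits))
    return g_loss, d_loss
```

In such a setup the scene canvas features would enter the coarsest branch first, and each subsequent branch doubles the working resolution while its own discriminator constrains the distribution at that size.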
In a possible implementation, the canvas adjustment model may be an image synthesis model based on map retrieval. Specifically, the canvas adjustment model may retrieve at least one matching map element in the preset map index library based on the appearance vector in the attribute features, obtain patches (i.e. all or part of those map elements), and use each patch to perform canvas adjustment processing on the target object and on the part of the scene canvas associated with the target object. More specifically, an appearance vector lt is predicted for each position in the feature map corresponding to the scene canvas, and similar patches are retrieved in the map index library according to that appearance vector. Meanwhile, when the canvas adjustment model is created, a patch embedder is trained based on a CNN network. While the object map is drawn on the scene canvas, the patch embedder of the canvas adjustment model can embed the retrieved foreground patch (i.e. the retrieved patch) into the scene canvas, specifically by reducing the foreground patch, via a neural computation unit in the network, to a vector Ft of a specified dimension, for example a one-dimensional vector Ft. A triplet embedding method is used, that is, a triplet loss is computed during model processing so as to reduce the Euclidean distance between lt and Ft. After the target object is adjusted according to the attribute features, the adjusted scene canvas, namely the map corresponding to the target text, is obtained.
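The following sketch illustrates the triplet-based patch embedding described above, assuming PyTorch; the layer layout of PatchEmbedder and the margin value are assumptions, while nn.TripletMarginLoss implements the standard triplet loss that reduces the Euclidean distance between the anchor lt and the matching embedding Ft.

```python
import torch
import torch.nn as nn

class PatchEmbedder(nn.Module):
    """CNN that reduces a retrieved foreground patch to a fixed-dimension
    embedding F_t, to be matched against the appearance vector l_t predicted
    at a canvas position."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, patch):
        h = self.conv(patch).flatten(1)   # (B, 64)
        return self.fc(h)                 # (B, dim) patch embedding F_t

# Triplet loss pulls the matching patch embedding towards the appearance vector
# l_t (anchor) and pushes a non-matching patch away, in Euclidean distance.
triplet = nn.TripletMarginLoss(margin=1.0, p=2)

def patch_embedding_loss(embedder, l_t, positive_patch, negative_patch):
    f_pos = embedder(positive_patch)      # F_t for the retrieved (matching) patch
    f_neg = embedder(negative_patch)      # embedding of a mismatched patch
    return triplet(l_t, f_pos, f_neg)
```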
In the embodiment of the application, the terminal obtains the text features of the target text, draws a scene canvas corresponding to the text features, determines the target object to be drawn based on the text features and the scene canvas, determines the attribute features of the target object according to the target object, the text features and the scene canvas, draws the target object on the scene canvas, adjusts the target object based on the attribute features, and generates the chartlet corresponding to the target text. The scene canvas, the objects to be drawn and the attribute features of those objects are determined step by step from the text features of the target text, and as each object is drawn on the scene canvas the image is correspondingly adjusted based on the object's attribute features (position, action, posture and the like). This avoids the problem that a chartlet generated merely from keywords is difficult to match to the scene actually described by the text, and allows a chartlet with an accurate scene, clear objects and clear attributes to be generated that is closer to the semantics expressed by the actual text, thereby improving the degree of conformity between the chartlet and the scene actually described by the text and the accuracy of chartlet generation.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Please refer to fig. 11, which illustrates a schematic structural diagram of a text chartlet apparatus according to an exemplary embodiment of the present application. The text charting apparatus may be implemented as all or part of an apparatus, in software, hardware, or a combination of both. The apparatus 1 comprises a scene canvas rendering module 11, a target object determination module 12, an attribute feature determination module 13 and a chartlet generation module 14.
The scene canvas drawing module 11 is configured to obtain a text feature of a target text, and draw a scene canvas corresponding to the text feature;
a target object determination module 12, configured to determine a target object to be drawn based on the text feature and the scene canvas;
an attribute feature determination module 13, configured to determine an attribute feature of the target object according to the target object, the text feature, and the scene canvas;
and the map generating module 14 is configured to draw the target object on the scene canvas, adjust the target object based on the attribute characteristics, and generate a map corresponding to the target text.
Optionally, as shown in fig. 12, the scene canvas rendering module 11 includes:
the text feature output unit 111 is configured to input a target text into a text encoder, and output a text feature corresponding to the target text;
and a scene canvas determining unit 112, configured to extract a scene topic corresponding to the text feature, index a scene map corresponding to the scene topic in a preset map index library, and determine the scene map as a scene canvas.
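As a small illustration of the scene canvas drawing module, the following sketch assumes a topic classifier and a dictionary-like map index library keyed by scene topics; the names build_scene_canvas, topic_classifier and scene_index are hypothetical and introduced only for this example.

```python
def build_scene_canvas(target_text, text_encoder, topic_classifier, scene_index):
    """Encode the text, extract a scene topic, and fetch the scene map that is
    registered under that topic in the preset map index library; the scene map
    is then used as the scene canvas."""
    text_feature = text_encoder(target_text)       # text encoder output
    scene_topic = topic_classifier(text_feature)   # e.g. "playground"
    scene_canvas = scene_index[scene_topic]        # indexed scene map
    return text_feature, scene_canvas
```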
Optionally, as shown in fig. 16, the apparatus 1 further includes:
and the scene characteristic diagram coding module 15 is configured to input the scene canvas into a convolutional network for scene coding, and output a scene characteristic diagram after the scene coding.
Optionally, the target object determining module 12 is specifically configured to:
and inputting the text feature and the scene feature map into an object decoder, and outputting the target object.
Optionally, as shown in fig. 16, the apparatus 1 further includes:
a history object obtaining module 16, configured to obtain a drawn history object;
as shown in fig. 13, the target object determining module 12 includes:
the vector pooling unit 121 is configured to perform pooling processing on the scene feature map to obtain a pooled first scene attention vector;
a text vector output unit 122, configured to input the first scene attention vector, the history object, and the text feature to a first text attention device, and output a first text attention vector;
a target object output unit 123, configured to input the first scene attention vector, the history object, and the first text attention vector into an object convolution network, and output the target object.
Optionally, as shown in fig. 14, the attribute feature determining module 13 includes:
a text vector output unit 131, configured to input the text feature and the target object to a second text attention device, and output a second text attention vector;
a scene vector output unit 132, configured to input the scene feature map and the second text attention vector into a scene convolution network, and output a second scene attention vector;
an attribute feature output unit 133, configured to input the second scene attention vector, the target object, and the second text attention vector into an attribute convolutional network, and output an attribute feature of the target object.
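For illustration, the two attention chains above (vector pooling, first text attention and object convolution network for the target object; second text attention, scene convolution network and attribute convolution network for the attribute features) might be composed as in the following PyTorch sketch; all dimensions, head counts and layer choices are assumptions rather than the actual networks of this application.

```python
import torch
import torch.nn as nn

class ObjectAndAttributeDecoder(nn.Module):
    """Sketch of the two attention chains: the first predicts the next target
    object from the pooled scene features, the drawn history objects and the
    text features; the second predicts that object's attribute features."""

    def __init__(self, d=256, n_objects=100, d_attr=64):
        super().__init__()
        self.text_attn_1 = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.object_net = nn.Sequential(nn.Linear(3 * d, 256), nn.ReLU(),
                                        nn.Linear(256, n_objects))
        self.text_attn_2 = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.scene_net = nn.Conv2d(d + d, d, kernel_size=1)
        self.attr_net = nn.Sequential(nn.Linear(3 * d, 256), nn.ReLU(),
                                      nn.Linear(256, d_attr))

    def forward(self, scene_feat_map, text_feats, history_obj, obj_embed):
        # scene_feat_map: (B, d, H, W); text_feats: (B, T, d);
        # history_obj, obj_embed: (B, d) embeddings of drawn / current objects.
        scene_vec = scene_feat_map.mean(dim=(2, 3))        # pooled first scene attention vector
        q1 = (scene_vec + history_obj).unsqueeze(1)        # query over the text tokens
        txt_vec_1, _ = self.text_attn_1(q1, text_feats, text_feats)
        txt_vec_1 = txt_vec_1.squeeze(1)                   # first text attention vector
        object_logits = self.object_net(
            torch.cat([scene_vec, history_obj, txt_vec_1], dim=-1))  # next target object

        q2 = obj_embed.unsqueeze(1)                        # condition on the chosen object
        txt_vec_2, _ = self.text_attn_2(q2, text_feats, text_feats)
        txt_vec_2 = txt_vec_2.squeeze(1)                   # second text attention vector
        B, d, H, W = scene_feat_map.shape
        fused = torch.cat([scene_feat_map,
                           txt_vec_2[:, :, None, None].expand(B, d, H, W)], dim=1)
        scene_vec_2 = self.scene_net(fused).mean(dim=(2, 3))   # second scene attention vector
        attributes = self.attr_net(
            torch.cat([scene_vec_2, obj_embed, txt_vec_2], dim=-1))  # attribute features
        return object_logits, attributes
```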
Optionally, as shown in fig. 15, the map generating module 14 includes:
a map object drawing unit 141, configured to index a map object corresponding to the target object in a preset map index library;
and a scene canvas adjusting unit 142, configured to input the attribute features, the map object, and the scene canvas into a canvas adjustment model, and output the map corresponding to the target text.
It should be noted that, when the text charting apparatus provided in the foregoing embodiment executes the text charting method, only the division of the above functional modules is taken as an example, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. In addition, the text-to-map apparatus provided in the above embodiments and the text-to-map method embodiments belong to the same concept, and details of implementation processes thereof are referred to in the method embodiments and are not described herein again.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiment of the application, the terminal obtains the text features of the target text, draws a scene canvas corresponding to the text features, determines the target object to be drawn based on the text features and the scene canvas, determines the attribute features of the target object according to the target object, the text features and the scene canvas, draws the target object on the scene canvas, adjusts the target object based on the attribute features, and generates the chartlet corresponding to the target text. The scene canvas, the objects to be drawn and the attribute features of those objects are determined step by step from the text features of the target text, and as each object is drawn on the scene canvas the image is correspondingly adjusted based on the object's attribute features (position, action, posture and the like). This avoids the problem that a chartlet generated merely from keywords is difficult to match to the scene actually described by the text, and allows a chartlet with an accurate scene, clear objects and clear attributes to be generated that is closer to the semantics expressed by the actual text, thereby improving the degree of conformity between the chartlet and the scene actually described by the text and the accuracy of chartlet generation.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the text-to-map method according to the embodiment shown in fig. 1 to 10, and a specific execution process may refer to specific descriptions of the embodiment shown in fig. 1 to 10, which is not described herein again.
The present application further provides a computer program product, where at least one instruction is stored, and the at least one instruction is loaded by the processor and executes the text-to-map method according to the embodiment shown in fig. 1 to 10, where a specific execution process may refer to specific descriptions of the embodiment shown in fig. 1 to 10, and is not described herein again.
Fig. 17 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 17, the electronic device 1000 may include: at least one processor 1001, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002.
Wherein a communication bus 1002 is used to enable connective communication between these components.
The user interface 1003 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
The processor 1001 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 1001 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like, wherein the CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing the content to be displayed; and the modem is used for handling wireless communication. The modem may also not be integrated into the processor 1001 and may instead be implemented by a separate chip.
The memory 1005 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable medium. The memory 1005 may be used to store instructions, programs, code, a set of codes, or a set of instructions. The memory 1005 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the stored data area may store data and the like referred to in the above respective method embodiments. The memory 1005 may optionally be at least one memory device located remotely from the processor 1001. As shown in fig. 17, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a text chartlet application program.
In the electronic device 1000 shown in fig. 17, the user interface 1003 is mainly used as an interface for providing input for a user, and acquiring data input by the user; and the processor 1001 may be configured to call the text chartlet application stored in the memory 1005, and specifically perform the following operations:
acquiring text characteristics of a target text, and drawing a scene canvas corresponding to the text characteristics;
determining a target object to be drawn based on the text features and the scene canvas;
determining attribute characteristics of the target object according to the target object, the text characteristics and the scene canvas;
and drawing the target object on the scene canvas, adjusting the target object based on the attribute characteristics, and generating a chartlet corresponding to the target text.
In an embodiment, when performing the operation of obtaining the text features of the target text and drawing the scene canvas corresponding to the text features, the processor 1001 specifically executes the following operations:
inputting a target text into a text encoder, and outputting text characteristics corresponding to the target text;
and extracting a scene theme corresponding to the text feature, indexing a scene chartlet corresponding to the scene theme in a preset chartlet index library, and determining the scene chartlet as a scene canvas.
In one embodiment, after executing the obtaining of the text feature of the target text and drawing the scene canvas corresponding to the text feature, the processor 1001 further executes the following operations:
and inputting the scene canvas into a convolution network for scene coding, and outputting a scene characteristic diagram after the scene coding.
In an embodiment, when the processor 1001 determines the target object to be drawn based on the text feature and the scene canvas, the following operations are specifically performed:
and inputting the text feature and the scene feature map into an object decoder, and outputting the target object.
In one embodiment, the processor 1001, when executing the method of text mapping, further performs the following operations:
acquiring a drawn history object;
the inputting the text feature and the scene feature map into an object decoder and outputting the target object includes:
pooling the scene feature map to obtain a first scene attention vector after pooling;
inputting the first scene attention vector, the historical objects and the text features into a first text attention device, and outputting a first text attention vector;
inputting the first scene attention vector, the historical object and the first text attention vector into an object convolution network, and outputting the target object.
In an embodiment, when the processor 1001 determines the attribute feature of the target object according to the target object, the text feature and the scene canvas, specifically performs the following operations:
inputting the text features and the target object into a second text attention device, and outputting a second text attention vector;
inputting the scene feature map and the second text attention vector into a scene convolution network, and outputting a second scene attention vector;
and inputting the second scene attention vector, the target object and the second text attention vector into an attribute convolution network, and outputting the attribute characteristics of the target object.
In an embodiment, when the processor 1001 executes the drawing of the target object on the scene canvas, adjusts the target object based on the attribute characteristics, and generates the chartlet corresponding to the target text, the following operations are specifically executed:
indexing a map object corresponding to the target object in a preset map index library;
and inputting the attribute characteristics, the map object and the scene canvas into a canvas adjustment model, and outputting the map corresponding to the target text.
In the embodiment of the application, the terminal obtains the text features of the target text, draws a scene canvas corresponding to the text features, determines the target object to be drawn based on the text features and the scene canvas, determines the attribute features of the target object according to the target object, the text features and the scene canvas, draws the target object on the scene canvas, adjusts the target object based on the attribute features, and generates the chartlet corresponding to the target text. The scene canvas, the objects to be drawn and the attribute features of those objects are determined step by step from the text features of the target text, and as each object is drawn on the scene canvas the image is correspondingly adjusted based on the object's attribute features (position, action, posture and the like). This avoids the problem that a chartlet generated merely from keywords is difficult to match to the scene actually described by the text, and allows a chartlet with an accurate scene, clear objects and clear attributes to be generated that is closer to the semantics expressed by the actual text, thereby improving the degree of conformity between the chartlet and the scene actually described by the text and the accuracy of chartlet generation.
"Unit" and "module" in this specification refer to software and/or hardware that can perform a particular function, either independently or in conjunction with other components, such as a Field Programmable Gate Array (FPGA), Integrated Circuit (IC), etc.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some service interfaces, devices or units, and may be an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program, which is stored in a computer-readable memory, and the memory may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A text-to-map method, characterized by comprising the following steps:
acquiring text characteristics of a target text, and drawing a scene canvas corresponding to the text characteristics;
determining a target object to be drawn based on the text features and the scene canvas;
determining attribute characteristics of the target object according to the target object, the text characteristics and the scene canvas;
and drawing the target object on the scene canvas, adjusting the target object based on the attribute characteristics, and generating a chartlet corresponding to the target text.
2. The method according to claim 1, wherein the obtaining of the text feature of the target text and the drawing of the scene canvas corresponding to the text feature comprise:
inputting a target text into a text encoder, and outputting text characteristics corresponding to the target text;
and extracting a scene theme corresponding to the text feature, indexing a scene chartlet corresponding to the scene theme in a preset chartlet index library, and determining the scene chartlet as a scene canvas.
3. The method according to claim 1, wherein after obtaining the text feature of the target text and drawing a scene canvas corresponding to the text feature, the method further comprises:
and inputting the scene canvas into a convolution network for scene coding, and outputting a scene characteristic diagram after the scene coding.
4. The method of claim 3, wherein determining the target object to be rendered based on the text feature and the scene canvas comprises:
and inputting the text feature and the scene feature map into an object decoder, and outputting the target object.
5. The method of claim 4, further comprising:
acquiring a drawn history object;
the inputting the text feature and the scene feature map into an object decoder and outputting the target object includes:
pooling the scene feature map to obtain a first scene attention vector after pooling;
inputting the first scene attention vector, the historical objects and the text features into a first text attention device, and outputting a first text attention vector;
inputting the first scene attention vector, the historical object and the first text attention vector into an object convolution network, and outputting the target object.
6. The method of claim 3, wherein determining the attribute characteristics of the target object based on the target object, the text characteristics, and the scene canvas comprises:
inputting the text features and the target object into a second text attention device, and outputting a second text attention vector;
inputting the scene feature map and the second text attention vector into a scene convolution network, and outputting a second scene attention vector;
and inputting the second scene attention vector, the target object and the second text attention vector into an attribute convolution network, and outputting the attribute characteristics of the target object.
7. The method of claim 1, wherein the drawing the target object on the scene canvas, and adjusting the target object based on the attribute characteristics to generate a chartlet corresponding to the target text comprises:
indexing a map object corresponding to the target object in a preset map index library;
and inputting the attribute characteristics, the map object and the scene canvas into a canvas adjustment model, and outputting the map corresponding to the target text.
8. A text-to-map apparatus, the apparatus comprising:
the scene canvas drawing module is used for acquiring text characteristics of a target text and drawing a scene canvas corresponding to the text characteristics;
the target object determination module is used for determining a target object to be drawn based on the text features and the scene canvas;
the attribute characteristic determining module is used for determining the attribute characteristic of the target object according to the target object, the text characteristic and the scene canvas;
and the map generating module is used for drawing the target object on the scene canvas, adjusting the target object based on the attribute characteristics and generating a map corresponding to the target text.
9. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to carry out the method steps according to any one of claims 1 to 7.
10. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1 to 7.
CN202010124986.9A 2020-02-27 2020-02-27 Text transfer mapping method and device, storage medium and electronic equipment Active CN111445545B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010124986.9A CN111445545B (en) 2020-02-27 2020-02-27 Text transfer mapping method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111445545A true CN111445545A (en) 2020-07-24
CN111445545B CN111445545B (en) 2023-08-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant