CN110502236B - Front-end code generation method, system and equipment based on multi-scale feature decoding - Google Patents

Front-end code generation method, system and equipment based on multi-scale feature decoding

Info

Publication number
CN110502236B
CN110502236B (application CN201910727451.8A)
Authority
CN
China
Prior art keywords: level, label, lstm, code, scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910727451.8A
Other languages
Chinese (zh)
Other versions
CN110502236A (en)
Inventor
吕晨
闵维潇
张菡文
高学剑
吕蕾
刘弘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yida Information Technology Co.,Ltd.
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN201910727451.8A priority Critical patent/CN110502236B/en
Publication of CN110502236A publication Critical patent/CN110502236A/en
Application granted granted Critical
Publication of CN110502236B publication Critical patent/CN110502236B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/30Creation or generation of source code
    • G06F8/38Creation or generation of source code for implementing user interfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a front-end code generation method, system, and device based on multi-scale feature decoding. A front-end image for which front-end code is to be generated is acquired; an image feature vector is extracted from the image using a pre-trained convolutional neural network; and the extracted image feature vector is input into a pre-trained multi-scale long short-term memory network (LSTM), which outputs the target front-end code. By applying deep learning techniques to train an end-to-end deep neural network model, the method directly and automatically converts a given UI (user interface) into target code, so that front-end development engineers can concentrate on implementing interaction logic, which reduces their workload and improves software development efficiency.

Description

Front-end code generation method, system and equipment based on multi-scale feature decoding
Technical Field
The present disclosure relates to the field of software development and automated maintenance technologies, and in particular to a front-end code generation method, system, and device based on multi-scale feature decoding.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:
With the development of the internet, and especially the large-scale adoption of the mobile internet, the front-end interface plays an important role in the interaction between users and the internet. Internet users perform basic operations such as browsing and querying through a user interface. Current front-end interfaces fall mainly into two types: Web interfaces delivered in Web form, and mobile interfaces typified by iOS and Android.
In a conventional front-end development process, a front-end development engineer must convert the UI given by a designer into the corresponding front-end code. Because user interfaces are complex, the engineer must write not only the UI itself but also the corresponding interaction logic, so the amount of front-end code is large. Moreover, UI code contains many repeated tag elements, making front-end development tedious and highly repetitive.
Disclosure of Invention
In order to address the deficiencies of the prior art, the present disclosure provides a front-end code generation method, system, and device based on multi-scale feature decoding. By applying deep learning techniques to train an end-to-end deep neural network model, the method directly and automatically converts a given UI (user interface) into target code, so that front-end development engineers can concentrate on implementing interaction logic, which reduces their workload and improves software development efficiency.
In a first aspect, the present disclosure provides a front-end code generation method based on multi-scale feature decoding;
the front-end code generation method based on multi-scale feature decoding comprises the following steps:
acquiring a front-end image of a front-end code to be generated;
extracting image characteristic vectors from a front-end image of a front-end code to be generated by utilizing a pre-trained convolutional neural network;
and inputting the extracted image feature vector into a pre-trained multi-scale long-short term memory network LSTM, and outputting a target front-end code.
In a second aspect, the present disclosure also provides a front-end code generation system based on multi-scale feature decoding;
a front-end code generation system based on multi-scale feature decoding, comprising:
the acquisition module is used for acquiring a front-end image of a front-end code to be generated;
the extraction module is used for extracting image feature vectors from front-end images of front-end codes to be generated by utilizing a pre-trained convolutional neural network;
and the front-end code generating module is used for inputting the extracted image characteristic vectors into a pre-trained multi-scale long-short term memory network (LSTM) and outputting target front-end codes.
In a third aspect, the present disclosure also provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of the first aspect.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the method of the first aspect.
Compared with the prior art, the beneficial effects of the present disclosure are: compared with traditional front-end development work, the method and system automatically generate front-end code based on deep learning, with the machine completing the image-to-code work automatically, which greatly improves development efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a schematic flow chart of the method of the first embodiment;
FIG. 2 shows the structure of code at different scales in the first embodiment; the code is divided into three scales: blocks, labels, and attributes.
FIG. 3 illustrates the front-end code generation method based on multi-scale feature decoding of the first embodiment.
FIG. 4 shows the process by which the convolutional neural network of the first embodiment extracts image features; it comprises three parts: convolutional layers, pooling layers, and fully-connected layers.
FIG. 5 is a structure diagram of the long short-term memory network of the first embodiment, which comprises three gates: an input gate, a forget gate, and an output gate.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Embodiment 1 provides a front-end code generation method based on multi-scale feature decoding;
the front-end code generation method based on multi-scale feature decoding comprises the following steps:
s1: acquiring a front-end image of a front-end code to be generated;
s2: extracting image characteristic vectors from a front-end image of a front-end code to be generated by utilizing a pre-trained convolutional neural network;
s3: and inputting the extracted image feature vector into a pre-trained multi-scale long-short term memory network (LSTM), and outputting a target front-end code.
As one or more embodiments, the front-end image of the front-end code to be generated is a UI interface.
As one or more embodiments, the convolutional neural network and the multi-scale long short-term memory network LSTM are trained simultaneously, and their training process includes:
constructing a convolutional neural network;
constructing three long short-term memory (LSTM) networks at different scales;
connecting the output of the convolutional neural network to the inputs of the three LSTMs at different scales, the three scales being the block level, the label level, and the attribute level;
constructing a training set, wherein the training set comprises: known front-end codes and corresponding known front-end images in a GitHub code library;
inputting a known front-end image into a convolutional neural network, and extracting image characteristics of the known front-end image;
inputting the image features of the known front-end image into the block-level scale LSTM, which outputs block-level front-end code;
multiplying the image features of the known front-end image by a first set weight and inputting the result into the label-level scale LSTM, which outputs label-level front-end code;
multiplying the image features of the known front-end image by a second set weight and inputting the result into the attribute-level scale LSTM, which outputs attribute-level front-end code;
the outputs of the three LSTMs at different scales are connected to a fusion module; the fusion module embeds the attribute-level front-end code into the label-level front-end code to obtain modified label-level front-end code, and then embeds the modified label-level front-end code into the block-level front-end code to obtain modified block-level front-end code, which is the final fused front-end code;
a loss function value is calculated from the fused front-end code and the known front-end code corresponding to the known front-end image in the GitHub code library; if the value is smaller than a set threshold, training ends, otherwise training continues.
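For illustration, a minimal Python (PyTorch) sketch of this wiring is given below; the dimensions, module names, and the treatment of the two set weights as precomputed multipliers are assumptions made here for brevity, and the attention-based computation of the weights is detailed later in this description.

    import torch
    import torch.nn as nn

    class MultiScaleDecoder(nn.Module):
        # Sketch: one CNN feature stream feeds three LSTMs (block, label,
        # and attribute scale); the finer scales receive re-weighted features.
        def __init__(self, feat_dim=2048, hidden=512, vocab=100):
            super().__init__()
            self.block_lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.label_lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.attr_lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.block_head = nn.Linear(hidden, vocab)
            self.label_head = nn.Linear(hidden, vocab)
            self.attr_head = nn.Linear(hidden, vocab)

        def forward(self, feats, w1, w2):
            # feats: (batch, steps, feat_dim) image features from the CNN;
            # w1, w2: the first and second set weights (scalars here).
            h_block, _ = self.block_lstm(feats)
            h_label, _ = self.label_lstm(feats * w1)
            h_attr, _ = self.attr_lstm(feats * w2)
            return (self.block_head(h_block),
                    self.label_head(h_label),
                    self.attr_head(h_attr))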
As one or more embodiments, constructing LSTMs at three different scales means constraining the output function of each LSTM so that each network generates front-end code at a different level of the hierarchy.
The block-level scale refers to the structural-element code in the front-end interface code that determines the block layout of the front-end interface; block-level scale code divides the front-end interface into different regions.
The label-level scale refers to the content-element label code contained in the block-level code; the label-level scale determines the labels a block-level code contains.
The attribute-level scale refers to the attribute-level code contained in the label-level scale code; the attribute-level scale determines the attributes of the labels generated at the label-level scale.
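For illustration only, a hypothetical HTML-like fragment showing the three scales is given below; the tag and attribute names follow the examples used later in this description, and the actual DSL syntax is not reproduced here.

    <Section>                          <!-- block level: divides the interface into regions -->
        <img src="/i/eg_tulip.jpg"/>   <!-- label level: the img tag; attribute level: src="..." -->
    </Section>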
As one or more embodiments, for the known front-end codes and the corresponding known front-end images in the GitHub code library, "corresponding" refers to a one-to-one correspondence between front-end codes and front-end images.
As one or more embodiments, a known front-end image is input into the convolutional neural network and its image features are extracted; the extracted feature values comprise RGB (RGB color mode) feature values and HSV (hexcone model) feature values.
As one or more embodiments, the block-level scale constrains the output of the block-level scale LSTM to block-level scale code, which partitions the region structure of the front-end image; that is, this LSTM is trained to generate the block-level code of the corresponding interface structural elements.
As one or more embodiments, the label-level scale constrains the output of the label-level LSTM to label-level scale code, which identifies the different labels; that is, this LSTM is trained to generate the corresponding web content element label codes.
As one or more embodiments, the attribute-level scale constrains the output of the attribute-level scale LSTM to attribute-level scale code, which determines the attributes of the labels; that is, this LSTM is trained to generate the attributes of the label codes.
As one or more embodiments, the first set weight refers to a weight that determines an input of the tag-level scale LSTM, the weight value is determined by a hidden layer output vector in the block-level scale LSTM, and a sum of the respective weight values is 1.
As one or more embodiments, the second set weight refers to a weight that determines an input of the attribute-level scale LSTM, the weight value is determined by a hidden-layer output vector in the tag-level scale LSTM, and a sum of the respective weight values is 1.
As one or more embodiments, the method for embedding the attribute-level front-end code into the label-level front-end code is as follows: the label-level scale LSTM outputs tag information (e.g., <img/>), the embedding position is determined from the tag information, and the attribute information output at the attribute-level scale (e.g., src="/i/eg_tulip.jpg") is inserted into the tag to form the embedded code (e.g., <img src="/i/eg_tulip.jpg"/>).
As one or more embodiments, the modified label-level front-end code is embedded into the block-level front-end code as follows: the block-level scale LSTM outputs block-level information (e.g., <Section> … </Section>), the embedding position is determined from the block information, and the tag information output at the label-level scale (e.g., <img/>) is inserted after the corresponding block-level start tag (e.g., <Section>) to form the embedded code (e.g., <Section><img/> … </Section>).
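A minimal Python sketch of this two-step fusion is given below; the helper names and the exact insertion rules (attributes inserted before the closing "/>", tags inserted after the block's start tag) are illustrative assumptions.

    def embed_attribute(tag_code: str, attr_code: str) -> str:
        # Insert attribute text before the tag's closing "/>".
        # e.g. "<img/>" + 'src="/i/eg_tulip.jpg"' -> '<img src="/i/eg_tulip.jpg"/>'
        head, sep, tail = tag_code.partition("/>")
        return f"{head} {attr_code}{sep}{tail}" if sep else tag_code

    def embed_tag(block_code: str, tag_code: str) -> str:
        # Insert the (modified) tag right after the block-level start tag.
        # e.g. "<Section></Section>" + "<img .../>" -> "<Section><img .../></Section>"
        open_end = block_code.find(">") + 1
        return block_code[:open_end] + tag_code + block_code[open_end:]

    fused = embed_tag("<Section></Section>",
                      embed_attribute("<img/>", 'src="/i/eg_tulip.jpg"'))
    print(fused)  # <Section><img src="/i/eg_tulip.jpg"/></Section>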
The front-end code generation method based on multi-scale feature decoding of the present invention involves a preprocessing module, a training module, and a prediction module. The preprocessing module reads in a predefined Domain-Specific Language (DSL) from the front-end code library; a DSL is a computer language specially designed for a particular class of tasks, here designed to describe the code structure and content information corresponding to a front-end interface picture, and it is encoded to obtain the feature vectors used by the training module. The training module comprises a Convolutional Neural Network (CNN), which extracts the feature values of the front-end images in the front-end code library using a deep convolutional network, and a Long Short-Term Memory network (LSTM), which decodes the multi-scale features to obtain the hierarchical DSL. The LSTM is trained on the DSL feature vectors from the preprocessing module and the image feature vectors from the CNN to learn the neural network parameters mapping image feature vectors to DSL feature vectors. Using the parameters obtained by the training module, the prediction module inputs the front-end image into the convolutional neural network to extract the image feature values, and then inputs them into the long short-term memory network for decoding to generate the corresponding target DSL.
The front-end code library consists of open-source front-end code and the corresponding front-end images on GitHub.
The domain-specific language is designed as a simple, lightweight language, which reduces code complexity, accelerates the neural network training process, and improves model performance.
The multi-scale feature decoding is implemented by hierarchical long short-term memory networks at three different scales: the block level, the label level, and the attribute level.
This embodiment comprises a preprocessing module, a training module, and a prediction module. The preprocessing module reads the DSL in the front-end code library and one-hot encodes it to obtain the encoded feature vectors for the training module; the front-end code library comprises front-end images and the corresponding front-end DSL, the front-end DSL is encoded into feature vectors input to the training module, and the front-end images are input to the training module directly. The training module consists of two sub-modules: an encoder module, which extracts image feature values using a convolutional neural network; and a decoder module, which decodes from image features to DSL features by applying hierarchical long short-term memory networks for multi-scale decoding, dividing the DSL into the block-level, label-level, and attribute-level scales. Using the neural network parameters obtained by the training module, the prediction module inputs the front-end image into the convolutional neural network to extract the image feature values, and then inputs them into the three LSTMs of different scales for decoding to generate the corresponding target DSL.
The multi-scale DSL is shown in fig. 1 and comprises a Block level, a Label level, and an Attribute level.
The preprocessing module one-hot encodes the DSL. Specifically, every label in the DSL is represented by a binary vector whose dimension is the total number of labels; the i-th label is represented by setting the i-th index to 1 and all other indexes to 0, i.e., {0, …, 0, 1, 0, …, 0}.
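A minimal Python sketch of this encoding follows; the token list is an illustrative assumption.

    import numpy as np

    TOKENS = ["<Section>", "</Section>", "<img/>", "<p>", "</p>"]  # assumed DSL vocabulary
    INDEX = {tok: i for i, tok in enumerate(TOKENS)}

    def one_hot(token: str) -> np.ndarray:
        vec = np.zeros(len(TOKENS), dtype=np.float32)  # dimension = number of labels
        vec[INDEX[token]] = 1.0                        # i-th index set to 1, others 0
        return vec

    print(one_hot("<img/>"))  # [0. 0. 1. 0. 0.]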
The training module is shown in fig. 2. The front-end code generation method based on multi-scale feature decoding specifically comprises:
Step (1): extract the visual features of the image to encode the image, as follows:
the extraction of image features is performed by a convolutional neural network as an encoder, as shown in fig. 3.
Specifically, a 50-layer deep residual network (ResNet) is used to encode the image, as follows:
The input image is preprocessed to 224 × 224 pixels and normalized, and the three RGB channels are taken, so the input pixel matrix is 224 × 224 × 3. After the convolution operations of the residual blocks, a 1 × 1 × 2048 image feature matrix is extracted and input to the decoder that follows.
Step (2): train the multi-scale feature decoder to decode from the front-end image to the front-end DSL. In the present invention, as shown in fig. 1, the target code is divided into the block level, the label level, and the attribute level, and the decoder is implemented with three corresponding long short-term memory networks, as follows:
step (2-1): training a Block LSTM decoder to decode Block-level codes, and the specific process is as follows:
inputting the DSL characteristics obtained by preprocessing and the image characteristics obtained by an encoder into a Block LSTM for training, and inputting a gate I when the time step is t t Forgetting door F t And an output gate O t The calculation formula of (2) is as follows:
I t =σ(X t W xi +H t-1 W hi +b i )
F t =σ(X t W xf +H t-1 W hf +b f )
O t =σ(X t W xo +H t-1 W ho +b o )
wherein W xi 、W xf 、W xo 、W hi 、W hf And W ho Are respectively corresponding weights, b i 、b f And b o Are respectively corresponding deviations, H t-1 For the neural network to imply the layer states, σ is the activation function, with the aim of introducing non-linear factors into the neural network.
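A direct NumPy transcription of the three gate equations follows; the dimensions and random initialization are illustrative assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    d_in, d_h = 2048, 512                      # assumed feature and hidden sizes
    rng = np.random.default_rng(0)
    W_xi, W_xf, W_xo = (rng.normal(0.0, 0.01, (d_in, d_h)) for _ in range(3))
    W_hi, W_hf, W_ho = (rng.normal(0.0, 0.01, (d_h, d_h)) for _ in range(3))
    b_i, b_f, b_o = np.zeros(d_h), np.zeros(d_h), np.zeros(d_h)

    def gates(X_t, H_prev):
        I_t = sigmoid(X_t @ W_xi + H_prev @ W_hi + b_i)  # input gate
        F_t = sigmoid(X_t @ W_xf + H_prev @ W_hf + b_f)  # forget gate
        O_t = sigmoid(X_t @ W_xo + H_prev @ W_ho + b_o)  # output gate
        return I_t, F_t, O_t

    I_t, F_t, O_t = gates(np.zeros(d_in), np.zeros(d_h))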
Step (2-2): train the Label LSTM decoder to decode label-level code, as follows:
The hidden-layer vector H_t of the block code is obtained from the Block LSTM, and an attention model is used to assign a weight α_{it} to each feature; the weight α_{it} is computed as:

β_{it} = W_t σ(W_h H_{t-1} + b)
α_{it} = exp(β_{it}) / Σ_j exp(β_{jt})
where W_t and W_h are weights, b is a bias, and H_{t-1} is the Block LSTM hidden-layer vector.
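A NumPy sketch of this attention step follows; σ is taken to be the sigmoid used in the gate equations, and the dimensions are illustrative assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    d_h, d_feat, n_feat = 512, 2048, 49        # assumed dimensions
    rng = np.random.default_rng(1)
    W_h = rng.normal(0.0, 0.01, (d_h, n_feat))
    W_t = rng.normal(0.0, 0.01, (n_feat, n_feat))
    b = np.zeros(n_feat)

    def attend(H_prev, X_t):
        beta = sigmoid(H_prev @ W_h + b) @ W_t  # beta_it = W_t sigma(W_h H_{t-1} + b)
        alpha = softmax(beta)                   # weights alpha_it sum to 1
        return X_t * alpha[:, None]             # X'_t = X_t x alpha_it

    X_prime = attend(np.zeros(d_h), rng.normal(size=(n_feat, d_feat)))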
The preprocessed DSL features and the attention-weighted image features X'_t = X_t × α_{it} are input into the Label LSTM for training. As with the Block LSTM, at time step t the input gate I'_t, forget gate F'_t, and output gate O'_t are computed as:

I'_t = σ(X'_t W'_{xi} + H'_{t-1} W'_{hi} + b'_i)
F'_t = σ(X'_t W'_{xf} + H'_{t-1} W'_{hf} + b'_f)
O'_t = σ(X'_t W'_{xo} + H'_{t-1} W'_{ho} + b'_o)
step (2-3): training an Attribute LSTM decoder to decode Attribute level codes, which comprises the following specific processes:
obtaining an hidden layer vector H 'of the Label code through Label LSTM' t Each feature is assigned a weight α 'using an attention machine model' it Of weight α' it The calculation formula of (2) is as follows:
β′ it =W′ t σ(W′ h H′ t-1 +b′)
α'_{it} = exp(β'_{it}) / Σ_j exp(β'_{jt})
where W'_t and W'_h are weights, b' is a bias, and H'_{t-1} is the Label LSTM hidden-layer vector.
The preprocessed DSL features and the attention-weighted image features X″_t = X'_t × α'_{it} are input into the Attribute LSTM for training. Similarly, at time step t the input gate I″_t, forget gate F″_t, and output gate O″_t of the Attribute LSTM are computed as:

I″_t = σ(X″_t W″_{xi} + H″_{t-1} W″_{hi} + b″_i)
F″_t = σ(X″_t W″_{xf} + H″_{t-1} W″_{hf} + b″_f)
O″_t = σ(X″_t W″_{xo} + H″_{t-1} W″_{ho} + b″_o)
the loss function is defined as:
L = -(1/N) Σ_{t=1}^{N} y_t log(o″_t)
where y_t is the true feature value input at time t, o″_t is the predicted value of the LSTM at time t, and N is the total number of time steps.
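A minimal NumPy sketch of this loss follows, under the assumption that it takes the cross-entropy form over the one-hot targets described above.

    import numpy as np

    def sequence_loss(y, o, eps=1e-12):
        # y: (N, vocab) one-hot true features; o: (N, vocab) LSTM predictions.
        # Cross-entropy summed over the vocabulary, averaged over the N time steps.
        return -np.mean(np.sum(y * np.log(o + eps), axis=1))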
The prediction module comprises the encoder, which inputs the front-end image into the convolutional neural network to extract image features, and the decoder, which inputs the image features into the three long short-term memory networks of different scales to generate the corresponding target DSL. The encoder consists of the convolutional neural network, and the decoder consists of the three long short-term memory networks. The front-end image is input into the encoder module, the convolutional neural network extracts the image features, the image features are input into the decoder module, the trained long short-term memory networks generate the DSL features, and the DSL features are decoded to obtain the front-end DSL.
Embodiment 2 provides a front-end code generation system based on multi-scale feature decoding;
a front-end code generation system based on multi-scale feature decoding, comprising:
the acquisition module is used for acquiring a front-end image of a front-end code to be generated;
the extraction module is used for extracting image feature vectors from front-end images of front-end codes to be generated by utilizing a pre-trained convolutional neural network;
and the front-end code generating module is used for inputting the extracted image characteristic vectors into a pre-trained multi-scale long-short term memory network (LSTM) and outputting target front-end codes.
The present disclosure also provides an electronic device, which comprises a memory, a processor, and computer instructions stored in the memory and executable on the processor; when the computer instructions are executed by the processor, each operation of the method is completed. For brevity, details are not repeated here.
The electronic device may be a mobile or a non-mobile terminal. Non-mobile terminals include desktop computers; mobile terminals include smart phones (e.g., Android and iOS phones), smart glasses, smart watches, smart bands, tablet computers, notebook computers, personal digital assistants, and other mobile internet devices capable of wireless communication.
It should be understood that in the present disclosure, the processor may be a central processing unit CPU, but may also be other general purpose processors, a digital signal processor DSP, an application specific integrated circuit ASIC, an off-the-shelf programmable gate array FPGA or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the present disclosure may be embodied directly in a hardware processor, or in a combination of hardware and software modules in a processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here. Those of ordinary skill in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is merely a division of one logic function, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (7)

1. The front-end code generation method based on multi-scale feature decoding is characterized by comprising the following steps:
acquiring a front-end image of a front-end code to be generated;
extracting image characteristic vectors from a front-end image of a front-end code to be generated by utilizing a pre-trained convolutional neural network;
inputting the extracted image characteristic vector into a pre-trained multi-scale long-short term memory network (LSTM), and outputting a target front-end code;
the training process of the convolutional neural network and the multi-scale long and short term memory network LSTM is completed simultaneously, and comprises the following steps:
constructing a convolutional neural network;
constructing three long short-term memory (LSTM) networks at different scales;
connecting the output of the convolutional neural network to the inputs of the three LSTMs at different scales, the three scales being the block level, the label level, and the attribute level;
constructing a training set, wherein the training set comprises: known front-end codes and corresponding known front-end images in a GitHub code library;
inputting a known front-end image into a convolutional neural network, and extracting image characteristics of the known front-end image;
inputting the image characteristics of the known front-end image into a long-short term memory network LSTM with a block-level scale; outputting a block-level front-end code by a long-short term memory network (LSTM) with a block-level scale;
multiplying the image characteristics of the known front-end image by a first set weight, and inputting the image characteristics into a long-short term memory network (LSTM) with a label-level scale; outputting a label-level front-end code by a long-short term memory network (LSTM) with label-level scale;
multiplying the image characteristics of the known front-end image by a second set weight, and inputting the image characteristics into a long-short term memory network (LSTM) with an attribute level scale; outputting attribute-level front-end codes by using an attribute-level long-short term memory network (LSTM);
the output ends of the three long-term and short-term memory networks LSTM with different scales are connected with a fusion module, and the fusion module embeds the attribute-level front-end code into the label-level front-end code to obtain a modified label-level front-end code; the fusion module is also used for embedding the modified label-level front-end code into the block-level front-end code to obtain the modified block-level front-end code; the modified block-level front-end code is the finally obtained fused front-end code;
calculating a loss function value according to the fused front-end code and a known front-end code corresponding to a known front-end image in a Github code library, if the function value is smaller than a set threshold value, finishing the training, otherwise, continuing the training;
the block-level scale constrains the output of the block-level scale LSTM to block-level scale code, which partitions the region structure of the front-end image; that is, the block-level LSTM is trained to generate the block-level code of the corresponding interface structural elements;
the label-level scale constrains the output of the label-level LSTM to label-level scale code, which identifies the different labels; that is, the label-level LSTM is trained to generate the corresponding web content element label codes;
the attribute-level scale constrains the output of the attribute-level scale LSTM to attribute-level scale code, which determines the attributes of the labels; that is, the attribute-level LSTM is trained to generate the attributes of the label codes;
the method for embedding the attribute-level front-end code into the tag-level front-end code comprises the following steps: outputting label information by using a label-level scale LSTM, determining an embedding position according to the label information, and inserting attribute-level scale output attribute information into a label name to form an embedded code;
embedding the modified label-level front-end code into the block-level front-end code, wherein the specific method comprises the following steps: and outputting block level information by using the block level scale LSTM, determining an embedding position according to the block information, and inserting label level scale output label information into a corresponding block level start label to form an embedded code.
2. The method according to claim 1, wherein the block-level scale refers to a structural element label-level code for determining a block layout of the front-end interface in the front-end interface code, and the front-end interface is divided into different regions by the block-level scale code;
the label-level scale is a content element label-level code contained in the block-level code, and a label contained in the block-level code is determined through the label-level scale;
the attribute-level scale refers to the attribute-level code contained in the label-level scale code, and the attribute of the label generated by the label-level scale can be determined through the attribute-level scale.
3. The method as set forth in claim 1, wherein,
the first set weight is the weight for determining the input of the label-level scale LSTM, the weight is determined by the hidden layer output vector in the block-level scale LSTM, and the sum of all the weights is 1;
the second set weight is a weight for determining the input of the attribute level scale LSTM, the weight is determined by the hidden layer output vector in the label level scale LSTM, and the sum of the weight values is 1.
4. The method of claim 1, wherein the modified tag-level front-end code is embedded in the block-level front-end code by: and outputting block level information by using the block level scale LSTM, determining an embedding position according to the block information, and inserting label level scale output label information into a corresponding block level start label to form an embedded code.
5. A front-end code generation system based on multi-scale feature decoding is characterized by comprising the following components:
the acquisition module is used for acquiring a front-end image of a front-end code to be generated;
the extraction module is used for extracting image feature vectors from front-end images of front-end codes to be generated by utilizing a pre-trained convolutional neural network;
the front-end code generation module is used for inputting the extracted image characteristic vectors into a pre-trained multi-scale long-short term memory network (LSTM) and outputting target front-end codes;
the training process of the convolutional neural network and the multi-scale long and short term memory network LSTM is completed simultaneously, and comprises the following steps:
constructing a convolutional neural network;
constructing three long short-term memory (LSTM) networks at different scales;
connecting the output of the convolutional neural network to the inputs of the three LSTMs at different scales, the three scales being the block level, the label level, and the attribute level;
constructing a training set, wherein the training set comprises: known front-end codes and corresponding known front-end images in a GitHub code library;
inputting a known front-end image into a convolutional neural network, and extracting image characteristics of the known front-end image;
inputting the image characteristics of the known front-end image into a long-short term memory network (LSTM) with a block-level scale; outputting a block-level front-end code by a long-short term memory network (LSTM) with a block-level scale;
multiplying the image characteristics of the known front-end image by a first set weight, and inputting the image characteristics into a long-short term memory network (LSTM) with a label-level scale; outputting a label-level front-end code by a long-short term memory network (LSTM) with label-level scale;
multiplying the image characteristics of the known front-end image by a second set weight, and inputting the image characteristics into a long-short term memory network (LSTM) with an attribute level scale; outputting attribute-level front-end codes by the long-short term memory network LSTM with attribute-level scale;
the output ends of the three long-term and short-term memory networks LSTM with different scales are connected with a fusion module, and the fusion module embeds the attribute-level front-end code into the label-level front-end code to obtain a modified label-level front-end code; the fusion module is also used for embedding the modified label-level front-end code into the block-level front-end code to obtain the modified block-level front-end code; the modified block-level front-end code is the finally obtained fused front-end code;
calculating a loss function value according to the fused front-end code and a known front-end code corresponding to a known front-end image in a Github code library, if the function value is smaller than a set threshold value, finishing the training, otherwise, continuing the training;
the block-level scale constrains the output of the block-level scale LSTM to block-level scale code, which partitions the region structure of the front-end image; that is, the block-level LSTM is trained to generate the block-level code of the corresponding interface structural elements;
the label-level scale constrains the output of the label-level LSTM to label-level scale code, which identifies the different labels; that is, the label-level LSTM is trained to generate the corresponding web content element label codes;
the attribute-level scale constrains the output of the attribute-level scale LSTM to attribute-level scale code, which determines the attributes of the labels; that is, the attribute-level LSTM is trained to generate the attributes of the label codes;
the method for embedding the attribute-level front-end code into the tag-level front-end code comprises the following steps: outputting label information by using a label-level scale LSTM, determining an embedding position according to the label information, and inserting attribute-level scale output attribute information into a label name to form an embedded code;
embedding the modified label-level front-end code into the block-level front-end code, wherein the specific method comprises the following steps: and outputting block level information by using the block level scale LSTM, determining an embedding position according to the block information, and inserting label level scale output label information into a corresponding block level start label to form an embedded code.
6. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any of claims 1 to 4.
7. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 4.
CN201910727451.8A 2019-08-07 2019-08-07 Front-end code generation method, system and equipment based on multi-scale feature decoding Active CN110502236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910727451.8A CN110502236B (en) 2019-08-07 2019-08-07 Front-end code generation method, system and equipment based on multi-scale feature decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910727451.8A CN110502236B (en) 2019-08-07 2019-08-07 Front-end code generation method, system and equipment based on multi-scale feature decoding

Publications (2)

Publication Number Publication Date
CN110502236A CN110502236A (en) 2019-11-26
CN110502236B (en) 2022-10-25

Family

ID=68587101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910727451.8A Active CN110502236B (en) 2019-08-07 2019-08-07 Front-end code generation method, system and equipment based on multi-scale feature decoding

Country Status (1)

Country Link
CN (1) CN110502236B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111190600B (en) * 2019-12-31 2023-09-19 中国银行股份有限公司 Method and system for automatically generating front-end codes based on GRU attention model
CN111340005A (en) * 2020-04-16 2020-06-26 深圳市康鸿泰科技有限公司 Sign language identification method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018094087A1 (en) * 2016-11-17 2018-05-24 The Mathworks, Inc. Systems and methods for generating code for parallel processing units
CN108615009A (en) * 2018-04-24 2018-10-02 山东师范大学 A kind of sign language interpreter AC system based on dynamic hand gesture recognition
CN109408058A (en) * 2018-10-31 2019-03-01 北京影谱科技股份有限公司 Front end auxiliary development method and device based on machine learning
CN109656554A (en) * 2018-11-27 2019-04-19 天津字节跳动科技有限公司 User interface creating method and device
CN109683871A (en) * 2018-11-01 2019-04-26 中山大学 Code automatically generating device and method based on image object detection method
CN109783094A (en) * 2018-12-15 2019-05-21 深圳壹账通智能科技有限公司 Front end page generation method, device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268258B (en) * 2016-12-29 2021-06-18 阿里巴巴集团控股有限公司 Method and device for acquiring webpage code and electronic equipment
CN109522017A (en) * 2018-11-07 2019-03-26 中山大学 It is a kind of based on neural network and from the webpage capture code generating method of attention mechanism
CN110018827B (en) * 2019-04-03 2020-10-30 拉扎斯网络科技(上海)有限公司 Method and device for automatically generating code, electronic equipment and readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018094087A1 (en) * 2016-11-17 2018-05-24 The Mathworks, Inc. Systems and methods for generating code for parallel processing units
CN108615009A (en) * 2018-04-24 2018-10-02 山东师范大学 A kind of sign language interpreter AC system based on dynamic hand gesture recognition
CN109408058A (en) * 2018-10-31 2019-03-01 北京影谱科技股份有限公司 Front end auxiliary development method and device based on machine learning
CN109683871A (en) * 2018-11-01 2019-04-26 中山大学 Code automatically generating device and method based on image object detection method
CN109656554A (en) * 2018-11-27 2019-04-19 天津字节跳动科技有限公司 User interface creating method and device
CN109783094A (en) * 2018-12-15 2019-05-21 深圳壹账通智能科技有限公司 Front end page generation method, device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Visual and User-Defined Smart Contract Designing System Based on Automatic Coding; Dianhui Mao et al.; IEEE Access; 2019-06-04; pp. 73131-73143 *
Design and Implementation of Automatic CNN Code Generation Based on FPGA (基于FPGA的CNN自动代码生成设计与实现); 王江峰; CNKI Outstanding Master's Theses Full-text Database, Information Science and Technology; 2018-12-15; pp. I140-119 *
Research on an Automatic Code Generation Method Based on Label Graph Embedding (基于标签图嵌入的自动代码生成方法研究); 张菡文; CNKI Outstanding Master's Theses Full-text Database, Information Science and Technology; 2020-08-15; pp. I138-153 *

Also Published As

Publication number Publication date
CN110502236A (en) 2019-11-26

Similar Documents

Publication Publication Date Title
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN107632987A (en) One kind dialogue generation method and device
CN114580424B (en) Labeling method and device for named entity identification of legal document
JP7384943B2 (en) Training method for character generation model, character generation method, device, equipment and medium
CN113947095B (en) Multilingual text translation method, multilingual text translation device, computer equipment and storage medium
CN110502236B (en) Front-end code generation method, system and equipment based on multi-scale feature decoding
CN114022887A (en) Text recognition model training and text recognition method and device, and electronic equipment
JP2023062150A (en) Character recognition model training, character recognition method, apparatus, equipment, and medium
CN115640520A (en) Method, device and storage medium for pre-training cross-language cross-modal model
CN114998583A (en) Image processing method, image processing apparatus, device, and storage medium
CN113569068B (en) Descriptive content generation method, visual content encoding and decoding method and device
CN114358023A (en) Intelligent question-answer recall method and device, computer equipment and storage medium
CN110852066A (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN112307749A (en) Text error detection method and device, computer equipment and storage medium
CN116341564A (en) Problem reasoning method and device based on semantic understanding
CN113837157B (en) Topic type identification method, system and storage medium
CN116363663A (en) Image processing method, image recognition method and device
CN116092101A (en) Training method, image recognition method apparatus, device, and readable storage medium
CN112069790A (en) Text similarity recognition method and device and electronic equipment
CN115952266A (en) Question generation method and device, computer equipment and storage medium
CN112528674B (en) Text processing method, training device, training equipment and training equipment for model and storage medium
CN113283241B (en) Text recognition method and device, electronic equipment and computer readable storage medium
CN114707017A (en) Visual question answering method and device, electronic equipment and storage medium
CN114821603B (en) Bill identification method, device, electronic equipment and storage medium
CN117113268B (en) Multi-scale data fusion method, device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231115

Address after: Room 403, 203 Kezhu Road, Luogang District, Guangzhou City, Guangdong Province, 510000 (for office use only)

Patentee after: Guangzhou Yida Information Technology Co.,Ltd.

Address before: No.1 Daxue Road, University Science Park, Changqing District, Jinan City, Shandong Province

Patentee before: SHANDONG NORMAL University

TR01 Transfer of patent right