CN115330898A - Improved Swin Transformer-based magazine, book and periodical advertisement embedding method - Google Patents
Improved Swin Transformer-based magazine, book and periodical advertisement embedding method
- Publication number: CN115330898A (application CN202211017879.1A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T11/00 — 2D [two-dimensional] image generation
- G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
- G06Q30/0241 — Marketing; advertisements
- G06T5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T2207/20021 — Dividing image into blocks, subimages or windows
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; image merging
- Y02D10/00 — Energy efficient computing
Abstract
The invention discloses a text advertisement embedding method based on an improved Swin Transformer, which comprises a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer. The invention belongs to the technical field of image processing and particularly relates to a text advertisement embedding method based on an improved Swin Transformer; it effectively solves the problem of fusing magazine advertisements with magazine text and effectively improves the utilization rate of paper. The invention provides an automatic advertisement embedding process that saves manual labor; the improved Swin Transformer lends itself to parallel and distributed computation, increasing the data processing speed; the method can be conveniently built with PyTorch or TensorFlow; and because the coding parameters are fixed, there is no training phase and iterative updating can be performed directly.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a text advertisement embedding method based on an improved Swin Transformer.
Background
A magazine advertisement is an advertisement published in a magazine. Magazine advertisements have the advantages of strong audience targeting, long retention time, wide readership and good print quality. A professional magazine arranges its reading content for specific reader groups, and is therefore welcomed by those groups; the specialization of magazines is also developing rapidly — medical magazines, science magazines and various technical magazines, for example, are issued to specific social classes or groups. Because professional magazines have a fixed readership, advertising in them can penetrate deeply into a professional industry.
The cover page, inner pages and insert pages of a magazine can all carry advertising; the position of an advertisement can be arranged flexibly to highlight its content and stimulate readers' interest. The presentation of advertisement content can also use various techniques — folding, inserts, facing pages, die-cut shapes and the like — to attract readers' attention. At present, however, magazine advertisements are usually placed on separate pages and occupy considerable space; according to statistics, advertisements in a professional magazine account for more than fifteen percent of the total page count on average.
The fusion of magazine advertisements with magazine text belongs to the field of image processing; the traditional approach requires professional designers to merge the advertisements into the text pages using specialized drawing tools.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a text advertisement embedding method based on an improved Swin Transformer, which can effectively solve the following problems:
(1) The magazine advertisement and the magazine text are fused, so that the utilization rate of paper is effectively improved;
(2) Traditional image fusion requires manual processing by professional personnel, and the results are inconsistent;
(3) The method relates to the field of image processing: traditional image fusion approaches use convolutional neural networks, whose computational cost grows exponentially as the size of the processed image increases.
Specifically, the invention adopts the following technical scheme: a text advertisement embedding method based on an improved Swin Transformer, comprising a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer.
Further, the magazine text page to be fused is a content page of a paper or journal article in the magazine, excluding advertisements; the file format at initial typesetting is a doc file, which must be converted to JPG or PNG format for subsequent processing; the magazine text page to be fused is reshaped to dimensions H × W × C and recorded as Text_Page.
Preferably, the magazine advertisement page to be fused is an advertisement page in the magazine, which must be converted to JPG or PNG format for subsequent processing; the magazine advertisement page to be fused is reshaped to dimensions H × W × C and recorded as Ad_Page.
Further, the data preprocessing layer preprocesses the Text_Page and Ad_Page data through blocking, flattening and merging operations:
(1) Blocking: the sizes of Text_Page and Ad_Page are both H × W × C; after blocking, N small square regions are obtained, each of size:
P × P × C
the number N of small square regions is:
N = (H × W)/(P × P)
(2) Flattening: each small square region is flattened into a vector x of dimension 1 × (P × P × C);
(3) Merging: the N flattened vectors are merged into a matrix X of dimension N × (P × P × C), of the form:
X = [x_1, x_2, …, x_N]^T
The result of Text_Page after the data preprocessing layer is recorded as X_Text; the result of Ad_Page after the data preprocessing layer is recorded as X_Ad.
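As a sketch, the blocking, flattening and merging operations above can be written in a few lines of NumPy (the row-major patch traversal order is an assumption; the patent does not specify it):

```python
import numpy as np

def preprocess(page, P):
    """Split an H x W x C page into P x P x C blocks, flatten each block
    to a row of length P*P*C, and stack the rows into the matrix X."""
    H, W, C = page.shape
    blocks = [page[i:i + P, j:j + P, :].reshape(-1)
              for i in range(0, H, P)
              for j in range(0, W, P)]
    return np.stack(blocks)  # shape: N x (P*P*C), with N = (H*W)/(P*P)

# the embodiment's page size: 208 x 288 x 3 with 16 x 16 blocks
X_text = preprocess(np.zeros((208, 288, 3)), 16)
print(X_text.shape)  # (234, 768)
```

With H = 208, W = 288 and P = 16 this yields N = (208 × 288)/(16 × 16) = 234 rows of length 768, matching the dimensions given in the embodiment.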
Preferably, the self-attention layer is used to compare the correlation between the small square regions and obtain abstract semantic features, which alleviates the shortage of computing power caused by an excessive amount of information; the specific calculation steps are as follows:
S1, generate feature matrices L, M and N whose components range from -1 to 1, and set them as fixed (not trainable); the feature matrices take the form:
L = [l_1, l_2, …, l_N]^T
M = [m_1, m_2, …, m_N]^T
N = [n_1, n_2, …, n_N]^T
where each component of the feature matrices L, M and N has dimension (P × P × C) × 1;
S2, generate the search matrix IN, the key matrix K and the value matrix V from the feature matrices L, M and N, calculated as:
IN = X × L^T
K = X × M^T
V = X × N^T
where:
IN = [in_1, in_2, …, in_N]^T
K = [k_1, k_2, …, k_N]^T
V = [v_1, v_2, …, v_N]^T
S3, calculate the attention distribution:
A = softmax(IN × K^T)
and take the weighted average of the input information according to the attention distribution:
Att = A × V = [Att_1, Att_2, …, Att_N]^T
In the above formula, the dimension of Att_i is N × 1.
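The formula images for step S3 are not reproduced in the text, so the sketch below assumes the standard scaled softmax form of the attention distribution; fixed random matrices stand in for the untrained feature matrices L, M and N:

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X):
    """X: N x (P*P*C) block matrix. L, M and Nf are fixed (untrained)
    feature matrices with components drawn from [-1, 1]."""
    n, d = X.shape
    L = rng.uniform(-1.0, 1.0, size=(n, d))
    M = rng.uniform(-1.0, 1.0, size=(n, d))
    Nf = rng.uniform(-1.0, 1.0, size=(n, d))
    IN = X @ L.T   # search (query) matrix, n x n
    K = X @ M.T    # key matrix, n x n
    V = X @ Nf.T   # value matrix, n x n
    scores = IN @ K.T / np.sqrt(d)               # scaled similarity (assumed)
    scores -= scores.max(axis=1, keepdims=True)  # for numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # softmax attention distribution
    return A @ V   # row i is Att_i, an n-component weighted average of values

Att = self_attention(rng.uniform(-1.0, 1.0, size=(234, 768)))
print(Att.shape)  # (234, 234)
```

Each row Att_i has N = 234 components, consistent with the stated N × 1 dimension of Att_i.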
Furthermore, the feedforward network layer comprises N BP neural networks, each consisting of a feedforward input layer, an intermediate hidden layer and a feedforward output layer; the feedforward input layer comprises N neurons, the intermediate hidden layer comprises P × C neurons, and the feedforward output layer comprises P neurons. The inputs of the feedforward input layers are Att_1, Att_2, …, Att_{N-1} and Att_N respectively; each is fed into its own BP neural network, and the resulting feedforward outputs are recorded as F_1, F_2, …, F_{N-1} and F_N, calculated as:
F_i = max(0, W_1 Att_i + b_1) W_2 + b_2,  i ∈ {1, 2, …, N}
In the above formula, b_1 is the bias of the intermediate hidden layer, b_2 is the bias of the feedforward output layer, W_1 is the weight matrix of the intermediate hidden layer, and W_2 is the weight matrix of the feedforward output layer; b_1, b_2, W_1 and W_2 are set as untrained. F_i is the output of each BP neural network; in particular, F_1, F_2, …, F_{N-1} and F_N all have dimension P × 1.
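A minimal sketch of one such BP network, assuming the ReLU reading of the max(·) in the formula and random stand-ins for the fixed, untrained parameters W_1, b_1, W_2 and b_2:

```python
import numpy as np

rng = np.random.default_rng(1)

def feed_forward(att_i, W1, b1, W2, b2):
    # F_i = max(0, W1 @ att_i + b1) @ W2 + b2  -- ReLU hidden layer
    return np.maximum(0.0, W1 @ att_i + b1) @ W2 + b2

# sizes from the embodiment: 234 inputs, 48 hidden neurons, 16 outputs
n_in, n_hidden, n_out = 234, 48, 16
W1 = rng.standard_normal((n_hidden, n_in))   # hidden-layer weights (untrained)
b1 = rng.standard_normal(n_hidden)           # hidden-layer bias
W2 = rng.standard_normal((n_hidden, n_out))  # output-layer weights (untrained)
b2 = rng.standard_normal(n_out)              # output-layer bias

F_i = feed_forward(rng.standard_normal(n_in), W1, b1, W2, b2)
print(F_i.shape)  # (16,)
```

The output has P = 16 components per network, matching the stated P × 1 dimension of F_i.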
Preferably, the attention loss calculation layer is used to calculate the difference between the feedforward output of Text_Page and the feedforward output of Ad_Page:
Loss = Σ_{i=1}^{N} ‖F_Text,i − F_Ad,i‖^2
In the above formula, F_Text denotes the feedforward output of Text_Page and F_Ad denotes the feedforward output of Ad_Page.
Furthermore, the iterative update layer iteratively updates the Text_Page with a gradient descent algorithm to obtain the image Pic; the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, and only Text_Page needs to be updated:
X_Text ← X_Text − λ · ∂Loss/∂X_Text
In the above formula, X_Text denotes the result of Text_Page after the data preprocessing layer and λ is the learning rate; the final update result is the image Pic, of the form:
Pic = [pic_1, pic_2, …, pic_N]^T
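Since the coding parameters are fixed, the update reduces to plain gradient descent on the preprocessed text matrix. The sketch below uses a simplified surrogate loss directly on the block matrix (the fixed-weight encoder is omitted), purely to illustrate the iteration:

```python
import numpy as np

rng = np.random.default_rng(2)

def loss(A, B):
    # surrogate squared-difference loss (assumed form of the attention loss)
    return float(np.sum((A - B) ** 2))

X_text = rng.uniform(size=(234, 768))  # preprocessed text page
X_ad = rng.uniform(size=(234, 768))    # preprocessed advertisement page

lam = 0.1            # learning rate lambda
pic = X_text.copy()
for _ in range(100):
    grad = 2.0 * (pic - X_ad)  # gradient of sum((Pic - X_ad)**2) w.r.t. Pic
    pic -= lam * grad          # X_Text <- X_Text - lambda * dLoss/dX_Text

print(loss(pic, X_ad) < loss(X_text, X_ad))  # True: iterates approach X_ad
```

Each step multiplies the residual by (1 − 2λ), so with λ = 0.1 the surrogate loss shrinks geometrically toward zero.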
Preferably, the data of the fusion output layer consists of two parts: the image Pic and the X_Text obtained from Text_Page through the data preprocessing layer. The fusion output layer is calculated as:
C = μ·Pic + ξ·X_Text
which expands to:
C = [μ·pic_1 + ξ·x_1, μ·pic_2 + ξ·x_2, …, μ·pic_N + ξ·x_N]^T
In the above formula, μ and ξ are weighting coefficients, and C is the matrix form of the magazine text with the advertisement embedded; transcribing C back into an image gives the final result.
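The fusion step is a per-element weighted combination; a sketch (the μ and ξ values here are illustrative, not taken from the patent):

```python
import numpy as np

def fuse(pic, X_text, mu=0.5, xi=0.5):
    # C = mu * Pic + xi * X_Text, elementwise weighted combination
    return mu * pic + xi * X_text

pic = np.ones((234, 768))     # stands in for the updated image matrix Pic
X_text = np.zeros((234, 768)) # stands in for the preprocessed text matrix
C = fuse(pic, X_text)
print(float(C.mean()))  # 0.5
```

Choosing μ + ξ = 1 keeps the fused matrix on the same intensity scale as its inputs, which is a natural (though not stated) choice for the weighting coefficients.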
By adopting the above scheme, the invention has the following advantages:
(1) The magazine advertisement and the magazine text are fused, effectively improving the utilization rate of paper;
(2) The text advertisement embedding method based on the improved Swin Transformer is an automated advertisement embedding process that saves manual labor;
(3) The method uses the improved Swin Transformer in place of traditional image processing based on convolutional neural networks, conveniently enabling parallel and distributed computation and increasing the data processing speed;
(4) The text advertisement embedding method based on the improved Swin Transformer can be conveniently built with PyTorch or TensorFlow;
(5) Because the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, the method differs from the traditional Swin Transformer in having no training phase, so iterative updating can be performed directly.
Drawings
FIG. 1 is a flowchart of the text advertisement embedding method based on an improved Swin Transformer proposed by the present invention;
FIG. 2 is a flow chart of the computation of the data pre-processing layer proposed by the present invention;
FIG. 3 is a flowchart of a computation of an image data encoding layer;
fig. 4 is a schematic diagram of a calculation method of the attention loss calculation layer.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present solution.
Examples
With reference to figs. 1 to 4, the present embodiment provides a text advertisement embedding method based on an improved Swin Transformer, comprising a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer.
The magazine text page to be fused is a content page of a paper or journal article in the magazine, excluding advertisements; the file format at initial typesetting is a doc file, which must be converted to JPG or PNG format for subsequent processing; the magazine text page to be fused is reshaped to size 208 × 288 × 3 and recorded as Text_Page.
The magazine advertisement page to be fused is an advertisement page in the magazine, which must be converted to JPG or PNG format for subsequent processing; the magazine advertisement page to be fused is reshaped to size 208 × 288 × 3 and recorded as Ad_Page.
The data preprocessing layer preprocesses the Text_Page and Ad_Page data through blocking, flattening and merging operations:
(1) Blocking: the sizes of Text_Page and Ad_Page are both 208 × 288 × 3; after blocking, 234 small square regions are obtained, each of size:
16 × 16 × 3
the number N of small square regions is:
N = (208 × 288)/(16 × 16) = 234
(2) Flattening: each small square region is flattened into a vector x of dimension 1 × 768;
(3) Merging: the 234 flattened vectors are merged into a matrix X of dimension 234 × 768, of the form:
X = [x_1, x_2, …, x_N]^T
The result of Text_Page after the data preprocessing layer is recorded as X_Text; the result of Ad_Page after the data preprocessing layer is recorded as X_Ad.
The self-attention layer is used to compare the correlation between the small square regions and obtain abstract semantic features, which alleviates the shortage of computing power caused by an excessive amount of information; the specific calculation steps are as follows:
S1, generate feature matrices L, M and N whose components range from -1 to 1, and set them as fixed; the feature matrices take the form:
L = [l_1, l_2, …, l_N]^T
M = [m_1, m_2, …, m_N]^T
N = [n_1, n_2, …, n_N]^T
where each component of the feature matrices L, M and N has dimension 768 × 1;
S2, generate the search matrix IN, the key matrix K and the value matrix V from the feature matrices L, M and N, calculated as:
IN = X × L^T
K = X × M^T
V = X × N^T
where:
IN = [in_1, in_2, …, in_N]^T
K = [k_1, k_2, …, k_N]^T
V = [v_1, v_2, …, v_N]^T
S3, calculate the attention distribution:
A = softmax(IN × K^T)
and take the weighted average of the input information according to the attention distribution:
Att = A × V = [Att_1, Att_2, …, Att_N]^T
In the above formula, the dimension of Att_i is 234 × 1.
The feedforward network layer comprises 234 BP neural networks, each consisting of a feedforward input layer, an intermediate hidden layer and a feedforward output layer; the feedforward input layer comprises 234 neurons, the intermediate hidden layer comprises 48 neurons, and the feedforward output layer comprises 16 neurons. The inputs of the feedforward input layers are Att_1, Att_2, …, Att_{N-1} and Att_N respectively; each is fed into its own BP neural network, and the resulting feedforward outputs are recorded as F_1, F_2, …, F_{N-1} and F_N, calculated as:
F_i = max(0, W_1 Att_i + b_1) W_2 + b_2,  i ∈ {1, 2, …, 234}
In the above formula, b_1 is the bias of the intermediate hidden layer, b_2 is the bias of the feedforward output layer, W_1 is the weight matrix of the intermediate hidden layer, and W_2 is the weight matrix of the feedforward output layer; b_1, b_2, W_1 and W_2 are set as untrained. F_i is the output of each BP neural network; F_1, F_2, …, F_{N-1} and F_N all have dimension 16 × 1.
The attention loss calculation layer is used to calculate the difference between the feedforward output of Text_Page and the feedforward output of Ad_Page:
Loss = Σ_{i=1}^{234} ‖F_Text,i − F_Ad,i‖^2
In the above formula, F_Text denotes the feedforward output of Text_Page and F_Ad denotes the feedforward output of Ad_Page.
The iterative update layer iteratively updates the Text_Page with a gradient descent algorithm to obtain the image Pic; the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, and only Text_Page needs to be updated:
X_Text ← X_Text − λ · ∂Loss/∂X_Text
In the above formula, X_Text denotes the result of Text_Page after the data preprocessing layer and λ is the learning rate; the final update result is the image Pic, of the form:
Pic = [pic_1, pic_2, …, pic_N]^T
the data of the fusion output layer consists of two parts, including an image Pic and an X obtained by processing a Text _ Page through a data pre-processing layer Text The calculation steps of the fusion output layer are as follows:
C=μ*Pic+ξ*X Text
the expansion is as follows:
in the above formula, μ and ξ are weighting coefficients, C represents a matrix form corresponding to the magazine text in which the advertisement is finally embedded, and the matrix form is transcribed into an image according to the rule of the data preprocessing layer, namely the final result.
The first embodiment is as follows:
s1, converting the Text Page of the magazine and the advertisement Page of the magazine into a JPG format or a PNG format, and compressing the Text Page of the magazine and the advertisement Page of the magazine to the same size of 208 multiplied by 288 multiplied by 3 which are respectively marked as Text _ Page and Ad _ Page.
S2, performing data preprocessing operation on the Text _ Page and the Ad _ Page in a data preprocessing layer:
wherein the result obtained by the Text _ Page passing through the data preprocessing layer is X Text Ad _ Page passed through the data Pre-processing layer as result X Ad 。
S4, mixing X Text And X Ad Inputting the data into the self-attention layer to obtain respective abstract semantic features, namely:
in the above equation, the dimension of each component is 16 × 1.
S5, calculating X by using attention loss calculation layer Text Feed forward output of (2) and X Ad The specific calculation formula of the difference between the feedforward outputs is as follows:
s6, using iteration to update layer pair X Text And carrying out iterative update to obtain an image Pic, wherein a specific update calculation formula is as follows:
in the above formula, λ is the learning rate, and the final update result is the image Pic, which is in the form:
s7, fusing the data of the output layer and forming the data by two parts including images Pic and X Text The calculation steps of the fusion output layer are as follows:
C=μ*Pic+ξ*X Text
in the above formula, μ and ξ are weighting coefficients, C represents a matrix form corresponding to the magazine text in which the advertisement is finally embedded, and the matrix form is transcribed into an image according to the rule of the data preprocessing layer, namely the final result.
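Steps S1 through S7 can be chained end to end. The sketch below also includes the inverse "transcription" of the output matrix back into an image; the block traversal order and the surrogate gradient update (which omits the fixed-weight encoder) are assumptions, as noted earlier:

```python
import numpy as np

rng = np.random.default_rng(3)
H, W, C_, P = 208, 288, 3, 16  # embodiment page size and block size

def to_blocks(img):
    """Blocking + flattening + merging: H x W x C image -> 234 x 768 matrix."""
    return np.stack([img[i:i + P, j:j + P, :].reshape(-1)
                     for i in range(0, H, P) for j in range(0, W, P)])

def to_image(X):
    """Inverse of to_blocks: transcribe the matrix back to an H x W x C image."""
    img = np.zeros((H, W, C_))
    k = 0
    for i in range(0, H, P):
        for j in range(0, W, P):
            img[i:i + P, j:j + P, :] = X[k].reshape(P, P, C_)
            k += 1
    return img

text_page = rng.uniform(size=(H, W, C_))  # stands in for Text_Page
ad_page = rng.uniform(size=(H, W, C_))    # stands in for Ad_Page
X_text, X_ad = to_blocks(text_page), to_blocks(ad_page)

pic = X_text.copy()
for _ in range(50):                       # S6: surrogate gradient descent
    pic -= 0.1 * 2.0 * (pic - X_ad)
result = to_image(0.5 * pic + 0.5 * X_text)  # S7: fuse, then transcribe
print(result.shape)  # (208, 288, 3)
```

Because to_image writes blocks back in the same order to_blocks reads them, the transcription round-trips exactly, so the fused matrix maps cleanly back to a page image.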
The specific working process of the invention is described above, and the steps are repeated when the device is used next time.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
The present invention and its embodiments have been described above, and the description is not intended to be limiting, and the drawings are only one embodiment of the present invention, and the actual structure is not limited thereto. In summary, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A text advertisement embedding method based on an improved Swin Transformer, characterized by comprising: a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer; the magazine text page to be fused is a content page of a paper or periodical article in the magazine, excluding advertisements; the file format at initial typesetting is a doc file, which must be converted to JPG or PNG format for subsequent processing; the magazine text page to be fused is reshaped to dimensions H × W × C and recorded as Text_Page; the magazine advertisement page to be fused is an advertisement page in the magazine, which must be converted to JPG or PNG format for subsequent processing; the magazine advertisement page to be fused is reshaped to dimensions H × W × C and recorded as Ad_Page.
2. The improved Swin Transformer-based text advertisement embedding method of claim 1, characterized in that the data preprocessing layer preprocesses the Text_Page and Ad_Page data through the following steps:
(1) Blocking: the sizes of Text_Page and Ad_Page are both H × W × C; after blocking, N small square regions are obtained, each of size:
P × P × C
the number N of small square regions is:
N = (H × W)/(P × P)
(2) Flattening: each small square region is flattened into a vector x of dimension 1 × (P × P × C);
(3) Merging: the N flattened vectors are merged into a matrix X of dimension N × (P × P × C), of the form:
X = [x_1, x_2, …, x_N]^T
The result of Text_Page after the data preprocessing layer is recorded as X_Text; the result of Ad_Page after the data preprocessing layer is recorded as X_Ad.
3. The method of claim 2, wherein the method comprises the following steps: the self-attention layer is used for comparing the correlation between each small square area and obtaining abstract semantic features, can solve the problem of insufficient calculation power caused by overlarge information quantity, and comprises the following calculation steps:
s1, generating feature matrixes L, M and N with the value range of each component between-1 and 1, and setting the feature matrixes L, M and N as unchangeable, wherein the feature matrixes L, M and N are in the following forms:
L=[l 1 ,l 2 ,…,l N ] T
M=[m 1 ,m 2 ,…,m N ] T
N=[n 1 ,n 2 ,…,n N ] T
wherein the dimension of each component of the feature matrices L, M and N is (P × C) × 1;
s2, generating a search matrix IN, a key matrix K and a value matrix V through the feature matrices L, M and N, wherein the specific calculation mode is as follows:
IN=X×L T
K=X×M T
V=X×N T
wherein:
IN=[in 1 ,in 2 ,…,in N ] T
K=[k 1 ,k 2 ,…,k N ] T
V=[v 1 ,v 2 ,…,v N ] T
S3, calculating the attention distribution, with the specific calculation formula:
α_i = softmax(K · in_i),  i ∈ {1, 2, …, N}
and weighted-averaging the input information according to the attention distribution:
Att_i = α_i1 · v_1 + α_i2 · v_2 + … + α_iN · v_N
In the above formula, the dimension of Att_i is N × 1.
4. The method of claim 3, wherein the feedforward network layer comprises N BP neural networks, each consisting of a feedforward input layer, a middle hidden layer and a feedforward output layer; each feedforward input layer contains N neurons, each middle hidden layer contains P × C neurons, and each feedforward output layer contains P neurons; the inputs to the feedforward input layers are Att_1, Att_2, …, Att_(N-1) and Att_N respectively; feeding Att_1, Att_2, …, Att_(N-1) and Att_N into their respective BP neural networks yields the feedforward outputs F_1, F_2, …, F_(N-1) and F_N, calculated as follows:
F_i = max(0, W_1 · Att_i + b_1) · W_2 + b_2,  i ∈ {1, 2, …, N}
In the above formula, b_1 denotes the bias of the middle hidden layer, b_2 the bias of the feedforward output layer, W_1 the inner-star weight vector of the middle hidden layer, and W_2 the inner-star weight vector of the feedforward output layer; b_1, b_2, W_1 and W_2 are set as fixed (not trained), and the dimension of each F_i is P × 1.
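The per-block feedforward calculation of claim 4 can be sketched as follows, reading max(·) as the ReLU max(0, ·) used in standard Transformer feedforward layers (an assumption, since the claim's formula omits the 0 argument). Layer widths and all values are illustrative.

```python
import numpy as np

def feed_forward(att_i, W1, b1, W2, b2):
    """Claim 4's per-block feedforward network: a ReLU hidden layer
    followed by a linear output layer, with all weights held fixed."""
    hidden = np.maximum(0.0, W1 @ att_i + b1)  # middle hidden layer
    return W2 @ hidden + b2                    # feedforward output (P x 1)

rng = np.random.default_rng(1)
N, H, P = 4, 12, 4  # input, hidden and output widths (illustrative)
W1, b1 = rng.standard_normal((H, N)), rng.standard_normal(H)
W2, b2 = rng.standard_normal((P, H)), rng.standard_normal(P)
F_i = feed_forward(rng.standard_normal(N), W1, b1, W2, b2)
```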
5. The method of claim 4, wherein the attention loss calculation layer calculates the difference between the feedforward output of the Text_Page and the feedforward output of the Ad_Page, with the specific calculation formula:
Loss = ||F_Text - F_Ad||^2
In the above formula, F_Text denotes the feedforward output of the Text_Page, and F_Ad denotes the feedforward output of the Ad_Page.
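Claim 5's loss formula appears only as an image in the patent source; a mean-squared-error reading of the "difference" between the two feedforward outputs is one plausible sketch, not the patent's confirmed formula.

```python
import numpy as np

def attention_loss(F_text, F_ad):
    """One plausible reading of claim 5's 'difference' between the two
    feedforward outputs: a mean squared error over their components."""
    return float(np.mean((F_text - F_ad) ** 2))

loss = attention_loss(np.ones(4), np.zeros(4))  # -> 1.0
```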
6. The method of claim 5, wherein the iterative update layer iteratively updates the Text_Page by a gradient descent algorithm to obtain an image Pic; the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, so only the Text_Page needs to be updated, with the specific calculation formula:
X_Text ← X_Text - λ · ∂Loss/∂X_Text
In the above formula, X_Text denotes the result of the Text_Page after the data preprocessing layer, and λ is the learning rate; the final update result is the image Pic, namely the value of X_Text after the last iteration, and the dimensions of Pic and X_Text are the same.
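Claim 6's iterative update can be sketched as plain gradient descent with the network parameters held fixed. The gradient function stands in for ∂Loss/∂X_Text; the toy quadratic objective below is illustrative only, not the patent's attention loss.

```python
import numpy as np

def iterative_update(x_text, grad_fn, lam=0.01, steps=100):
    """Claim 6's iterative update layer: gradient descent on the
    preprocessed text page; grad_fn returns d(Loss)/d(X_Text)."""
    pic = x_text.copy()
    for _ in range(steps):
        pic = pic - lam * grad_fn(pic)  # X_Text <- X_Text - lambda * grad
    return pic

# Toy check: minimising ||x - target||^2 drives x to target.
target = np.full(4, 2.0)
pic = iterative_update(np.zeros(4), lambda x: 2 * (x - target), lam=0.1, steps=200)
```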
7. The method of claim 6, wherein the data of the fusion output layer consists of two parts: the image Pic and the X_Text obtained by processing the Text_Page through the data preprocessing layer; the fusion output layer calculates:
C = μ · Pic + ξ · X_Text
In the above formula, μ and ξ are weighting coefficients, and C represents the matrix form of the magazine text with the advertisement finally embedded; transcribing C back into an image yields the final result.
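Claim 7's fusion step is a weighted sum, sketched below; the weight values μ = 0.7 and ξ = 0.3 are illustrative only, as the patent does not fix them.

```python
import numpy as np

def fuse(pic, x_text, mu=0.7, xi=0.3):
    """Claim 7's fusion output layer: a weighted sum of the updated
    image Pic and the preprocessed text page X_Text."""
    return mu * pic + xi * x_text

C = fuse(np.full((4, 4), 1.0), np.full((4, 4), 0.0))  # every entry is mu = 0.7
```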
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211017879.1A CN115330898B (en) | 2022-08-24 | 2022-08-24 | Magazine advertisement embedding method based on improved Swin Transformer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115330898A true CN115330898A (en) | 2022-11-11 |
CN115330898B CN115330898B (en) | 2023-06-06 |
Family
ID=83926419
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211017879.1A Active CN115330898B (en) | 2022-08-24 | 2022-08-24 | Magazine advertisement embedding method based on improved Swin Transformer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115330898B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109785409A (en) * | 2018-12-29 | 2019-05-21 | 武汉大学 | A kind of image based on attention mechanism-text data fusion method and system |
CN113313201A (en) * | 2021-06-21 | 2021-08-27 | 南京挥戈智能科技有限公司 | Multi-target detection and distance measurement method based on Swin transducer and ZED camera |
CN113609965A (en) * | 2021-08-03 | 2021-11-05 | 同盾科技有限公司 | Training method and device of character recognition model, storage medium and electronic equipment |
CN113658057A (en) * | 2021-07-16 | 2021-11-16 | 西安理工大学 | Swin transform low-light-level image enhancement method |
CN113709455A (en) * | 2021-09-27 | 2021-11-26 | 北京交通大学 | Multilevel image compression method using Transformer |
CN114283347A (en) * | 2022-03-03 | 2022-04-05 | 粤港澳大湾区数字经济研究院(福田) | Target detection method, system, intelligent terminal and computer readable storage medium |
CN114528912A (en) * | 2022-01-10 | 2022-05-24 | 山东师范大学 | False news detection method and system based on progressive multi-mode converged network |
CN114550158A (en) * | 2022-02-23 | 2022-05-27 | 厦门大学 | Scene character recognition method and system |
CN114743020A (en) * | 2022-04-02 | 2022-07-12 | 华南理工大学 | Food identification method combining tag semantic embedding and attention fusion |
CN114821239A (en) * | 2022-05-10 | 2022-07-29 | 安徽农业大学 | Method for detecting plant diseases and insect pests in foggy environment |
CN114841977A (en) * | 2022-05-17 | 2022-08-02 | 南京信息工程大学 | Defect detection method based on Swin Transformer structure combined with SSIM and GMSD |
CN114898219A (en) * | 2022-07-13 | 2022-08-12 | 中国标准化研究院 | SVM-based manipulator touch data representation and identification method |
CN114912575A (en) * | 2022-04-06 | 2022-08-16 | 西安交通大学 | Medical image segmentation model and method based on Swin transform connection path |
CN114912461A (en) * | 2022-05-31 | 2022-08-16 | 浙江工业大学 | Deep learning-based Chinese text classification method |
Non-Patent Citations (1)
Title |
---|
Jiang Qi et al.: "End-to-end automatic conversion from Chinese characters to Braille based on Transformer" * |
Also Published As
Publication number | Publication date |
---|---|
CN115330898B (en) | 2023-06-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20240116 Address after: Room 0606, 6th Floor, Building A, Berlin International Business Center, No. 85 Binhe West Road, Wanbailin District, Taiyuan City, Shanxi Province, 030024 Patentee after: Forest Fantasy (Taiyuan) Digital Technology Co.,Ltd. Address before: 048000 Room 302, unit 2, building 5, Agricultural Bank of China residential area, Nancheng District, Xinshi East Street, Jincheng Development Zone, Shanxi Province Patentee before: Jincheng Darui Jinma Engineering Design Consulting Co.,Ltd. |