CN115330898A - Improved Swin Transformer-based magazine, book and periodical advertisement embedding method - Google Patents
Improved Swin Transformer-based magazine, book and periodical advertisement embedding method
- Publication number: CN115330898A (application CN202211017879.1A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T11/00 — 2D [two-dimensional] image generation
- G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
- G06Q30/0241 — Marketing; advertisements
- G06T5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T2207/20021 — Dividing image into blocks, subimages or windows
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; image merging
- Y02D10/00 — Energy efficient computing
Abstract
The invention discloses a text advertisement embedding method based on an improved Swin Transformer, which comprises a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer. The invention belongs to the technical field of image processing and particularly relates to a text advertisement embedding method based on an improved Swin Transformer; it effectively solves the problem of fusing magazine advertisements with magazine text and effectively improves the utilization rate of paper. The invention provides an automatic advertisement embedding process that saves manual labor; the improved Swin Transformer lends itself to parallel and distributed computation, increasing the data processing speed; the method can be conveniently built with PyTorch or TensorFlow; and because the coding parameters are fixed, there is no training phase and iterative updating can be performed directly.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a text advertisement embedding method based on an improved Swin Transformer.
Background
A magazine advertisement is an advertisement published in a magazine. Magazine advertisements have the advantages of strong audience targeting, long retention time, wide readership and good print quality. A professional magazine arranges its reading content for specific reader groups, and is therefore welcomed by those groups; the specialization of magazines is also developing rapidly — medical magazines, science magazines and various technical magazines, for example, are issued to specific social classes or groups. Because professional magazines have a fixed readership, advertising in them can penetrate deeply into a professional industry.
The cover page, inner pages and insert pages of a magazine can all carry advertising; the position of an advertisement can be arranged flexibly to highlight its content and stimulate readers' interest. The presentation of advertisement content can also use various techniques — folding, inserts, facing pages, die-cut shapes and the like — to attract readers' attention. At present, however, magazine advertisements are usually placed on separate pages and occupy considerable space; according to statistics, advertisements in a professional magazine account for more than fifteen percent of the total page count on average.
The fusion of magazine advertisements with magazine text belongs to the field of image processing; the traditional approach requires professional designers to merge the advertisements into the text pages using specialized drawing tools.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a text advertisement embedding method based on an improved Swin Transformer, which can effectively solve the following problems:
(1) The magazine advertisement and the magazine text are fused, so that the utilization rate of paper is effectively improved;
(2) Traditional image fusion requires manual processing by professional personnel, and the results are inconsistent;
(3) The method relates to the field of image processing: traditional image fusion approaches use convolutional neural networks, whose computational cost grows exponentially as the size of the processed image increases.
Specifically, the invention adopts the following technical scheme: a text advertisement embedding method based on an improved Swin Transformer, comprising a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer.
Further, the magazine text page to be fused is a content page of a paper or journal article in the magazine, excluding advertisements; the file format at initial typesetting is a doc file, which must be converted to JPG or PNG format for subsequent processing; the magazine text page to be fused is reshaped to dimensions H × W × C and recorded as Text_Page.
Preferably, the magazine advertisement page to be fused is an advertisement page in the magazine, which must be converted to JPG or PNG format for subsequent processing; the magazine advertisement page to be fused is reshaped to dimensions H × W × C and recorded as Ad_Page.
Further, the data preprocessing layer preprocesses the Text_Page and Ad_Page data through blocking, flattening and merging operations:
(1) Blocking: the sizes of Text_Page and Ad_Page are both H × W × C; after blocking, N small square regions are obtained, each of size:
P × P × C
the number N of small square regions is:
N = (H × W)/(P × P)
(2) Flattening: each small square region is flattened into a vector x of dimension 1 × (P × P × C);
(3) Merging: the N flattened vectors are merged into a matrix X of dimension N × (P × P × C), of the form:
X = [x_1, x_2, …, x_N]^T
The result of Text_Page after the data preprocessing layer is recorded as X_Text; the result of Ad_Page after the data preprocessing layer is recorded as X_Ad.
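As a sketch, the blocking, flattening and merging operations above can be written in a few lines of NumPy (the row-major patch traversal order is an assumption; the patent does not specify it):

```python
import numpy as np

def preprocess(page, P):
    """Split an H x W x C page into P x P x C blocks, flatten each block
    to a row of length P*P*C, and stack the rows into the matrix X."""
    H, W, C = page.shape
    blocks = [page[i:i + P, j:j + P, :].reshape(-1)
              for i in range(0, H, P)
              for j in range(0, W, P)]
    return np.stack(blocks)  # shape: N x (P*P*C), with N = (H*W)/(P*P)

# the embodiment's page size: 208 x 288 x 3 with 16 x 16 blocks
X_text = preprocess(np.zeros((208, 288, 3)), 16)
print(X_text.shape)  # (234, 768)
```

With H = 208, W = 288 and P = 16 this yields N = (208 × 288)/(16 × 16) = 234 rows of length 768, matching the dimensions given in the embodiment.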
Preferably, the self-attention layer is used to compare the correlation between the small square regions and obtain abstract semantic features, which alleviates the shortage of computing power caused by an excessive amount of information; the specific calculation steps are as follows:
S1, generate feature matrices L, M and N whose components range from -1 to 1, and set them as fixed (not trainable); the feature matrices take the form:
L = [l_1, l_2, …, l_N]^T
M = [m_1, m_2, …, m_N]^T
N = [n_1, n_2, …, n_N]^T
where each component of the feature matrices L, M and N has dimension (P × P × C) × 1;
S2, generate the search matrix IN, the key matrix K and the value matrix V from the feature matrices L, M and N, calculated as:
IN = X × L^T
K = X × M^T
V = X × N^T
where:
IN = [in_1, in_2, …, in_N]^T
K = [k_1, k_2, …, k_N]^T
V = [v_1, v_2, …, v_N]^T
S3, calculate the attention distribution:
A = softmax(IN × K^T)
and take the weighted average of the input information according to the attention distribution:
Att = A × V = [Att_1, Att_2, …, Att_N]^T
In the above formula, the dimension of Att_i is N × 1.
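The formula images for step S3 are not reproduced in the text, so the sketch below assumes the standard scaled softmax form of the attention distribution; fixed random matrices stand in for the untrained feature matrices L, M and N:

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X):
    """X: N x (P*P*C) block matrix. L, M and Nf are fixed (untrained)
    feature matrices with components drawn from [-1, 1]."""
    n, d = X.shape
    L = rng.uniform(-1.0, 1.0, size=(n, d))
    M = rng.uniform(-1.0, 1.0, size=(n, d))
    Nf = rng.uniform(-1.0, 1.0, size=(n, d))
    IN = X @ L.T   # search (query) matrix, n x n
    K = X @ M.T    # key matrix, n x n
    V = X @ Nf.T   # value matrix, n x n
    scores = IN @ K.T / np.sqrt(d)               # scaled similarity (assumed)
    scores -= scores.max(axis=1, keepdims=True)  # for numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # softmax attention distribution
    return A @ V   # row i is Att_i, an n-component weighted average of values

Att = self_attention(rng.uniform(-1.0, 1.0, size=(234, 768)))
print(Att.shape)  # (234, 234)
```

Each row Att_i has N = 234 components, consistent with the stated N × 1 dimension of Att_i.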
Furthermore, the feedforward network layer comprises N BP neural networks, each consisting of a feedforward input layer, an intermediate hidden layer and a feedforward output layer; the feedforward input layer comprises N neurons, the intermediate hidden layer comprises P × C neurons, and the feedforward output layer comprises P neurons. The inputs of the feedforward input layers are Att_1, Att_2, …, Att_{N-1} and Att_N respectively; each is fed into its own BP neural network, and the resulting feedforward outputs are recorded as F_1, F_2, …, F_{N-1} and F_N, calculated as:
F_i = max(0, W_1 Att_i + b_1) W_2 + b_2,  i ∈ {1, 2, …, N}
In the above formula, b_1 is the bias of the intermediate hidden layer, b_2 is the bias of the feedforward output layer, W_1 is the weight matrix of the intermediate hidden layer, and W_2 is the weight matrix of the feedforward output layer; b_1, b_2, W_1 and W_2 are set as untrained. F_i is the output of each BP neural network; in particular, F_1, F_2, …, F_{N-1} and F_N all have dimension P × 1.
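A minimal sketch of one such BP network, assuming the ReLU reading of the max(·) in the formula and random stand-ins for the fixed, untrained parameters W_1, b_1, W_2 and b_2:

```python
import numpy as np

rng = np.random.default_rng(1)

def feed_forward(att_i, W1, b1, W2, b2):
    # F_i = max(0, W1 @ att_i + b1) @ W2 + b2  -- ReLU hidden layer
    return np.maximum(0.0, W1 @ att_i + b1) @ W2 + b2

# sizes from the embodiment: 234 inputs, 48 hidden neurons, 16 outputs
n_in, n_hidden, n_out = 234, 48, 16
W1 = rng.standard_normal((n_hidden, n_in))   # hidden-layer weights (untrained)
b1 = rng.standard_normal(n_hidden)           # hidden-layer bias
W2 = rng.standard_normal((n_hidden, n_out))  # output-layer weights (untrained)
b2 = rng.standard_normal(n_out)              # output-layer bias

F_i = feed_forward(rng.standard_normal(n_in), W1, b1, W2, b2)
print(F_i.shape)  # (16,)
```

The output has P = 16 components per network, matching the stated P × 1 dimension of F_i.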
Preferably, the attention loss calculation layer is used to calculate the difference between the feedforward output of Text_Page and the feedforward output of Ad_Page:
Loss = Σ_{i=1}^{N} ‖F_Text,i − F_Ad,i‖^2
In the above formula, F_Text denotes the feedforward output of Text_Page and F_Ad denotes the feedforward output of Ad_Page.
Furthermore, the iterative update layer iteratively updates the Text_Page with a gradient descent algorithm to obtain the image Pic; the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, and only Text_Page needs to be updated:
X_Text ← X_Text − λ · ∂Loss/∂X_Text
In the above formula, X_Text denotes the result of Text_Page after the data preprocessing layer and λ is the learning rate; the final update result is the image Pic, of the form:
Pic = [pic_1, pic_2, …, pic_N]^T
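Since the coding parameters are fixed, the update reduces to plain gradient descent on the preprocessed text matrix. The sketch below uses a simplified surrogate loss directly on the block matrix (the fixed-weight encoder is omitted), purely to illustrate the iteration:

```python
import numpy as np

rng = np.random.default_rng(2)

def loss(A, B):
    # surrogate squared-difference loss (assumed form of the attention loss)
    return float(np.sum((A - B) ** 2))

X_text = rng.uniform(size=(234, 768))  # preprocessed text page
X_ad = rng.uniform(size=(234, 768))    # preprocessed advertisement page

lam = 0.1            # learning rate lambda
pic = X_text.copy()
for _ in range(100):
    grad = 2.0 * (pic - X_ad)  # gradient of sum((Pic - X_ad)**2) w.r.t. Pic
    pic -= lam * grad          # X_Text <- X_Text - lambda * dLoss/dX_Text

print(loss(pic, X_ad) < loss(X_text, X_ad))  # True: iterates approach X_ad
```

Each step multiplies the residual by (1 − 2λ), so with λ = 0.1 the surrogate loss shrinks geometrically toward zero.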
Preferably, the data of the fusion output layer consists of two parts: the image Pic and the X_Text obtained from Text_Page through the data preprocessing layer. The fusion output layer is calculated as:
C = μ·Pic + ξ·X_Text
which expands to:
C = [μ·pic_1 + ξ·x_1, μ·pic_2 + ξ·x_2, …, μ·pic_N + ξ·x_N]^T
In the above formula, μ and ξ are weighting coefficients, and C is the matrix form of the magazine text with the advertisement embedded; transcribing C back into an image gives the final result.
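The fusion step is a per-element weighted combination; a sketch (the μ and ξ values here are illustrative, not taken from the patent):

```python
import numpy as np

def fuse(pic, X_text, mu=0.5, xi=0.5):
    # C = mu * Pic + xi * X_Text, elementwise weighted combination
    return mu * pic + xi * X_text

pic = np.ones((234, 768))     # stands in for the updated image matrix Pic
X_text = np.zeros((234, 768)) # stands in for the preprocessed text matrix
C = fuse(pic, X_text)
print(float(C.mean()))  # 0.5
```

Choosing μ + ξ = 1 keeps the fused matrix on the same intensity scale as its inputs, which is a natural (though not stated) choice for the weighting coefficients.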
By adopting the above scheme, the invention has the following advantages:
(1) The magazine advertisement and the magazine text are fused, effectively improving the utilization rate of paper;
(2) The text advertisement embedding method based on the improved Swin Transformer is an automated advertisement embedding process that saves manual labor;
(3) The method uses the improved Swin Transformer in place of traditional image processing based on convolutional neural networks, conveniently enabling parallel and distributed computation and increasing the data processing speed;
(4) The text advertisement embedding method based on the improved Swin Transformer can be conveniently built with PyTorch or TensorFlow;
(5) Because the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, the method differs from the traditional Swin Transformer in having no training phase, so iterative updating can be performed directly.
Drawings
FIG. 1 is a flowchart of the text advertisement embedding method based on an improved Swin Transformer proposed by the present invention;
FIG. 2 is a flow chart of the computation of the data pre-processing layer proposed by the present invention;
FIG. 3 is a flowchart of a computation of an image data encoding layer;
fig. 4 is a schematic diagram of a calculation method of the attention loss calculation layer.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present solution.
Examples
With reference to figs. 1 to 4, the present embodiment provides a text advertisement embedding method based on an improved Swin Transformer, comprising a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer.
The magazine text page to be fused is a content page of a paper or journal article in the magazine, excluding advertisements; the file format at initial typesetting is a doc file, which must be converted to JPG or PNG format for subsequent processing; the magazine text page to be fused is reshaped to size 208 × 288 × 3 and recorded as Text_Page.
The magazine advertisement page to be fused is an advertisement page in the magazine, which must be converted to JPG or PNG format for subsequent processing; the magazine advertisement page to be fused is reshaped to size 208 × 288 × 3 and recorded as Ad_Page.
The data preprocessing layer preprocesses the Text_Page and Ad_Page data through blocking, flattening and merging operations:
(1) Blocking: the sizes of Text_Page and Ad_Page are both 208 × 288 × 3; after blocking, 234 small square regions are obtained, each of size:
16 × 16 × 3
the number N of small square regions is:
N = (208 × 288)/(16 × 16) = 234
(2) Flattening: each small square region is flattened into a vector x of dimension 1 × 768;
(3) Merging: the 234 flattened vectors are merged into a matrix X of dimension 234 × 768, of the form:
X = [x_1, x_2, …, x_N]^T
The result of Text_Page after the data preprocessing layer is recorded as X_Text; the result of Ad_Page after the data preprocessing layer is recorded as X_Ad.
The self-attention layer is used to compare the correlation between the small square regions and obtain abstract semantic features, which alleviates the shortage of computing power caused by an excessive amount of information; the specific calculation steps are as follows:
S1, generate feature matrices L, M and N whose components range from -1 to 1, and set them as fixed; the feature matrices take the form:
L = [l_1, l_2, …, l_N]^T
M = [m_1, m_2, …, m_N]^T
N = [n_1, n_2, …, n_N]^T
where each component of the feature matrices L, M and N has dimension 768 × 1;
S2, generate the search matrix IN, the key matrix K and the value matrix V from the feature matrices L, M and N, calculated as:
IN = X × L^T
K = X × M^T
V = X × N^T
where:
IN = [in_1, in_2, …, in_N]^T
K = [k_1, k_2, …, k_N]^T
V = [v_1, v_2, …, v_N]^T
S3, calculate the attention distribution:
A = softmax(IN × K^T)
and take the weighted average of the input information according to the attention distribution:
Att = A × V = [Att_1, Att_2, …, Att_N]^T
In the above formula, the dimension of Att_i is 234 × 1.
The feedforward network layer comprises 234 BP neural networks, each consisting of a feedforward input layer, an intermediate hidden layer and a feedforward output layer; the feedforward input layer comprises 234 neurons, the intermediate hidden layer comprises 48 neurons, and the feedforward output layer comprises 16 neurons. The inputs of the feedforward input layers are Att_1, Att_2, …, Att_{N-1} and Att_N respectively; each is fed into its own BP neural network, and the resulting feedforward outputs are recorded as F_1, F_2, …, F_{N-1} and F_N, calculated as:
F_i = max(0, W_1 Att_i + b_1) W_2 + b_2,  i ∈ {1, 2, …, 234}
In the above formula, b_1 is the bias of the intermediate hidden layer, b_2 is the bias of the feedforward output layer, W_1 is the weight matrix of the intermediate hidden layer, and W_2 is the weight matrix of the feedforward output layer; b_1, b_2, W_1 and W_2 are set as untrained. F_i is the output of each BP neural network; F_1, F_2, …, F_{N-1} and F_N all have dimension 16 × 1.
The attention loss calculation layer is used to calculate the difference between the feedforward output of Text_Page and the feedforward output of Ad_Page:
Loss = Σ_{i=1}^{234} ‖F_Text,i − F_Ad,i‖^2
In the above formula, F_Text denotes the feedforward output of Text_Page and F_Ad denotes the feedforward output of Ad_Page.
The iterative update layer iteratively updates the Text_Page with a gradient descent algorithm to obtain the image Pic; the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, and only Text_Page needs to be updated:
X_Text ← X_Text − λ · ∂Loss/∂X_Text
In the above formula, X_Text denotes the result of Text_Page after the data preprocessing layer and λ is the learning rate; the final update result is the image Pic, of the form:
Pic = [pic_1, pic_2, …, pic_N]^T
the data of the fusion output layer consists of two parts, including an image Pic and an X obtained by processing a Text _ Page through a data pre-processing layer Text The calculation steps of the fusion output layer are as follows:
C=μ*Pic+ξ*X Text
the expansion is as follows:
in the above formula, μ and ξ are weighting coefficients, C represents a matrix form corresponding to the magazine text in which the advertisement is finally embedded, and the matrix form is transcribed into an image according to the rule of the data preprocessing layer, namely the final result.
The first embodiment is as follows:
s1, converting the Text Page of the magazine and the advertisement Page of the magazine into a JPG format or a PNG format, and compressing the Text Page of the magazine and the advertisement Page of the magazine to the same size of 208 multiplied by 288 multiplied by 3 which are respectively marked as Text _ Page and Ad _ Page.
S2, performing data preprocessing operation on the Text _ Page and the Ad _ Page in a data preprocessing layer:
wherein the result obtained by the Text _ Page passing through the data preprocessing layer is X Text Ad _ Page passed through the data Pre-processing layer as result X Ad 。
S4, mixing X Text And X Ad Inputting the data into the self-attention layer to obtain respective abstract semantic features, namely:
in the above equation, the dimension of each component is 16 × 1.
S5, calculating X by using attention loss calculation layer Text Feed forward output of (2) and X Ad The specific calculation formula of the difference between the feedforward outputs is as follows:
s6, using iteration to update layer pair X Text And carrying out iterative update to obtain an image Pic, wherein a specific update calculation formula is as follows:
in the above formula, λ is the learning rate, and the final update result is the image Pic, which is in the form:
s7, fusing the data of the output layer and forming the data by two parts including images Pic and X Text The calculation steps of the fusion output layer are as follows:
C=μ*Pic+ξ*X Text
in the above formula, μ and ξ are weighting coefficients, C represents a matrix form corresponding to the magazine text in which the advertisement is finally embedded, and the matrix form is transcribed into an image according to the rule of the data preprocessing layer, namely the final result.
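Steps S1 through S7 can be chained end to end. The sketch below also includes the inverse "transcription" of the output matrix back into an image; the block traversal order and the surrogate gradient update (which omits the fixed-weight encoder) are assumptions, as noted earlier:

```python
import numpy as np

rng = np.random.default_rng(3)
H, W, C_, P = 208, 288, 3, 16  # embodiment page size and block size

def to_blocks(img):
    """Blocking + flattening + merging: H x W x C image -> 234 x 768 matrix."""
    return np.stack([img[i:i + P, j:j + P, :].reshape(-1)
                     for i in range(0, H, P) for j in range(0, W, P)])

def to_image(X):
    """Inverse of to_blocks: transcribe the matrix back to an H x W x C image."""
    img = np.zeros((H, W, C_))
    k = 0
    for i in range(0, H, P):
        for j in range(0, W, P):
            img[i:i + P, j:j + P, :] = X[k].reshape(P, P, C_)
            k += 1
    return img

text_page = rng.uniform(size=(H, W, C_))  # stands in for Text_Page
ad_page = rng.uniform(size=(H, W, C_))    # stands in for Ad_Page
X_text, X_ad = to_blocks(text_page), to_blocks(ad_page)

pic = X_text.copy()
for _ in range(50):                       # S6: surrogate gradient descent
    pic -= 0.1 * 2.0 * (pic - X_ad)
result = to_image(0.5 * pic + 0.5 * X_text)  # S7: fuse, then transcribe
print(result.shape)  # (208, 288, 3)
```

Because to_image writes blocks back in the same order to_blocks reads them, the transcription round-trips exactly, so the fused matrix maps cleanly back to a page image.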
The specific working process of the invention is described above, and the steps are repeated when the device is used next time.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
The present invention and its embodiments have been described above, and the description is not intended to be limiting, and the drawings are only one embodiment of the present invention, and the actual structure is not limited thereto. In summary, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A text advertisement embedding method based on an improved Swin Transformer, characterized by comprising: a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer; the magazine text page to be fused is a content page of a paper or periodical article in the magazine, excluding advertisements; the file format at initial typesetting is a doc file, which must be converted to JPG or PNG format for subsequent processing; the magazine text page to be fused is reshaped to dimensions H × W × C and recorded as Text_Page; the magazine advertisement page to be fused is an advertisement page in the magazine, which must be converted to JPG or PNG format for subsequent processing; the magazine advertisement page to be fused is reshaped to dimensions H × W × C and recorded as Ad_Page.
2. The improved Swin Transformer-based text advertisement embedding method of claim 1, characterized in that the data preprocessing layer preprocesses the Text_Page and Ad_Page data through the following steps:
(1) Blocking: the sizes of Text_Page and Ad_Page are both H × W × C; after blocking, N small square regions are obtained, each of size:
P × P × C
the number N of small square regions is:
N = (H × W)/(P × P)
(2) Flattening: each small square region is flattened into a vector x of dimension 1 × (P × P × C);
(3) Merging: the N flattened vectors are merged into a matrix X of dimension N × (P × P × C), of the form:
X = [x_1, x_2, …, x_N]^T
The result of Text_Page after the data preprocessing layer is recorded as X_Text; the result of Ad_Page after the data preprocessing layer is recorded as X_Ad.
3. The method of claim 2, wherein the method comprises the following steps: the self-attention layer is used for comparing the correlation between each small square area and obtaining abstract semantic features, can solve the problem of insufficient calculation power caused by overlarge information quantity, and comprises the following calculation steps:
s1, generating feature matrixes L, M and N with the value range of each component between-1 and 1, and setting the feature matrixes L, M and N as unchangeable, wherein the feature matrixes L, M and N are in the following forms:
L=[l 1 ,l 2 ,…,l N ] T
M=[m 1 ,m 2 ,…,m N ] T
N=[n 1 ,n 2 ,…,n N ] T
wherein the dimension of each component of the feature matrices L, M and N is (P × C) × 1;
s2, generating a search matrix IN, a key matrix K and a value matrix V through the feature matrices L, M and N, wherein the specific calculation mode is as follows:
IN=X×L T
K=X×M T
V=X×N T
wherein:
IN=[in 1 ,in 2 ,…,in N ] T
K=[k 1 ,k 2 ,…,k N ] T
V=[v 1 ,v 2 ,…,v N ] T
S3, calculating the attention distribution, with the specific calculation formula:
α_i = softmax(K · in_i),  i ∈ {1, 2, …, N}
and weighted-averaging the input information according to the attention distribution:
Att_i = α_i1 · v_1 + α_i2 · v_2 + … + α_iN · v_N
In the above formula, the dimension of Att_i is N × 1.
4. The method of claim 3, wherein the feedforward network layer comprises N BP neural networks, each consisting of a feedforward input layer, a middle hidden layer and a feedforward output layer; each feedforward input layer contains N neurons, each middle hidden layer contains P × C neurons, and each feedforward output layer contains P neurons; the inputs to the feedforward input layers are Att_1, Att_2, …, Att_(N-1) and Att_N respectively; feeding Att_1, Att_2, …, Att_(N-1) and Att_N into their respective BP neural networks yields the feedforward outputs F_1, F_2, …, F_(N-1) and F_N, calculated as follows:
F_i = max(0, W_1 · Att_i + b_1) · W_2 + b_2,  i ∈ {1, 2, …, N}
In the above formula, b_1 denotes the bias of the middle hidden layer, b_2 the bias of the feedforward output layer, W_1 the inner-star weight vector of the middle hidden layer, and W_2 the inner-star weight vector of the feedforward output layer; b_1, b_2, W_1 and W_2 are set as fixed (not trained), and the dimension of each F_i is P × 1.
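The per-block feedforward calculation of claim 4 can be sketched as follows, reading max(·) as the ReLU max(0, ·) used in standard Transformer feedforward layers (an assumption, since the claim's formula omits the 0 argument). Layer widths and all values are illustrative.

```python
import numpy as np

def feed_forward(att_i, W1, b1, W2, b2):
    """Claim 4's per-block feedforward network: a ReLU hidden layer
    followed by a linear output layer, with all weights held fixed."""
    hidden = np.maximum(0.0, W1 @ att_i + b1)  # middle hidden layer
    return W2 @ hidden + b2                    # feedforward output (P x 1)

rng = np.random.default_rng(1)
N, H, P = 4, 12, 4  # input, hidden and output widths (illustrative)
W1, b1 = rng.standard_normal((H, N)), rng.standard_normal(H)
W2, b2 = rng.standard_normal((P, H)), rng.standard_normal(P)
F_i = feed_forward(rng.standard_normal(N), W1, b1, W2, b2)
```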
5. The method of claim 4, wherein the attention loss calculation layer calculates the difference between the feedforward output of the Text_Page and the feedforward output of the Ad_Page, with the specific calculation formula:
Loss = ||F_Text - F_Ad||^2
In the above formula, F_Text denotes the feedforward output of the Text_Page, and F_Ad denotes the feedforward output of the Ad_Page.
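Claim 5's loss formula appears only as an image in the patent source; a mean-squared-error reading of the "difference" between the two feedforward outputs is one plausible sketch, not the patent's confirmed formula.

```python
import numpy as np

def attention_loss(F_text, F_ad):
    """One plausible reading of claim 5's 'difference' between the two
    feedforward outputs: a mean squared error over their components."""
    return float(np.mean((F_text - F_ad) ** 2))

loss = attention_loss(np.ones(4), np.zeros(4))  # -> 1.0
```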
6. The method of claim 5, wherein the iterative update layer iteratively updates the Text_Page by a gradient descent algorithm to obtain an image Pic; the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, so only the Text_Page needs to be updated, with the specific calculation formula:
X_Text ← X_Text - λ · ∂Loss/∂X_Text
In the above formula, X_Text denotes the result of the Text_Page after the data preprocessing layer, and λ is the learning rate; the final update result is the image Pic, namely the value of X_Text after the last iteration, and the dimensions of Pic and X_Text are the same.
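Claim 6's iterative update can be sketched as plain gradient descent with the network parameters held fixed. The gradient function stands in for ∂Loss/∂X_Text; the toy quadratic objective below is illustrative only, not the patent's attention loss.

```python
import numpy as np

def iterative_update(x_text, grad_fn, lam=0.01, steps=100):
    """Claim 6's iterative update layer: gradient descent on the
    preprocessed text page; grad_fn returns d(Loss)/d(X_Text)."""
    pic = x_text.copy()
    for _ in range(steps):
        pic = pic - lam * grad_fn(pic)  # X_Text <- X_Text - lambda * grad
    return pic

# Toy check: minimising ||x - target||^2 drives x to target.
target = np.full(4, 2.0)
pic = iterative_update(np.zeros(4), lambda x: 2 * (x - target), lam=0.1, steps=200)
```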
7. The method of claim 6, wherein the data of the fusion output layer consists of two parts: the image Pic and the X_Text obtained by processing the Text_Page through the data preprocessing layer; the fusion output layer calculates:
C = μ · Pic + ξ · X_Text
In the above formula, μ and ξ are weighting coefficients, and C represents the matrix form of the magazine text with the advertisement finally embedded; transcribing C back into an image yields the final result.
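Claim 7's fusion step is a weighted sum, sketched below; the weight values μ = 0.7 and ξ = 0.3 are illustrative only, as the patent does not fix them.

```python
import numpy as np

def fuse(pic, x_text, mu=0.7, xi=0.3):
    """Claim 7's fusion output layer: a weighted sum of the updated
    image Pic and the preprocessed text page X_Text."""
    return mu * pic + xi * x_text

C = fuse(np.full((4, 4), 1.0), np.full((4, 4), 0.0))  # every entry is mu = 0.7
```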
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211017879.1A CN115330898B (en) | 2022-08-24 | 2022-08-24 | Magazine advertisement embedding method based on improved Swin Transformer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115330898A true CN115330898A (en) | 2022-11-11 |
CN115330898B CN115330898B (en) | 2023-06-06 |
Family
ID=83926419
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211017879.1A Active CN115330898B (en) | 2022-08-24 | 2022-08-24 | Magazine advertisement embedding method based on improved Swin Transformer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115330898B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109785409A (en) * | 2018-12-29 | 2019-05-21 | 武汉大学 | A kind of image based on attention mechanism-text data fusion method and system |
CN113313201A (en) * | 2021-06-21 | 2021-08-27 | 南京挥戈智能科技有限公司 | Multi-target detection and distance measurement method based on Swin transducer and ZED camera |
CN113609965A (en) * | 2021-08-03 | 2021-11-05 | 同盾科技有限公司 | Training method and device of character recognition model, storage medium and electronic equipment |
CN113658057A (en) * | 2021-07-16 | 2021-11-16 | 西安理工大学 | Swin transform low-light-level image enhancement method |
CN113709455A (en) * | 2021-09-27 | 2021-11-26 | 北京交通大学 | Multilevel image compression method using Transformer |
CN114283347A (en) * | 2022-03-03 | 2022-04-05 | 粤港澳大湾区数字经济研究院(福田) | Target detection method, system, intelligent terminal and computer readable storage medium |
CN114528912A (en) * | 2022-01-10 | 2022-05-24 | 山东师范大学 | False news detection method and system based on progressive multi-mode converged network |
CN114550158A (en) * | 2022-02-23 | 2022-05-27 | 厦门大学 | Scene character recognition method and system |
CN114743020A (en) * | 2022-04-02 | 2022-07-12 | 华南理工大学 | Food identification method combining tag semantic embedding and attention fusion |
CN114821239A (en) * | 2022-05-10 | 2022-07-29 | 安徽农业大学 | Method for detecting plant diseases and insect pests in foggy environment |
CN114841977A (en) * | 2022-05-17 | 2022-08-02 | 南京信息工程大学 | Defect detection method based on Swin Transformer structure combined with SSIM and GMSD |
CN114898219A (en) * | 2022-07-13 | 2022-08-12 | 中国标准化研究院 | SVM-based manipulator touch data representation and identification method |
CN114912575A (en) * | 2022-04-06 | 2022-08-16 | 西安交通大学 | Medical image segmentation model and method based on Swin transform connection path |
CN114912461A (en) * | 2022-05-31 | 2022-08-16 | 浙江工业大学 | Deep learning-based Chinese text classification method |
Non-Patent Citations (1)
Title |
---|
Jiang Qi et al.: "End-to-end automatic conversion from Chinese characters to Braille based on Transformer" * |
Also Published As
Publication number | Publication date |
---|---|
CN115330898B (en) | 2023-06-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20240116 Address after: Room 0606, 6th Floor, Building A, Berlin International Business Center, No. 85 Binhe West Road, Wanbailin District, Taiyuan City, Shanxi Province, 030024 Patentee after: Forest Fantasy (Taiyuan) Digital Technology Co.,Ltd. Address before: 048000 Room 302, unit 2, building 5, Agricultural Bank of China residential area, Nancheng District, Xinshi East Street, Jincheng Development Zone, Shanxi Province Patentee before: Jincheng Darui Jinma Engineering Design Consulting Co.,Ltd. |