CN115330898A - Improved Swin Transformer-based magazine, book and periodical advertisement embedding method - Google Patents

Improved Swin Transformer-based magazine, book and periodical advertisement embedding method

Info

Publication number
CN115330898A
CN115330898A
Authority
CN
China
Prior art keywords
layer
page
text
magazine
att
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211017879.1A
Other languages
Chinese (zh)
Other versions
CN115330898B (en)
Inventor
李宁
李佳钥
李风山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Forest Fantasy (Taiyuan) Digital Technology Co.,Ltd.
Original Assignee
Jincheng Darui Jinma Engineering Design Consulting Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jincheng Darui Jinma Engineering Design Consulting Co., Ltd.
Priority to CN202211017879.1A
Publication of CN115330898A
Application granted
Publication of CN115330898B
Active legal status
Anticipated expiration legal status

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241: Advertisements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20021: Dividing image into blocks, subimages or windows
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20212: Image combination
    • G06T2207/20221: Image fusion; Image merging
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses a text advertisement embedding method based on an improved Swin Transformer, comprising a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer. The invention belongs to the technical field of image processing and specifically relates to a text advertisement embedding method based on an improved Swin Transformer. The method effectively solves the problem of fusing magazine advertisements with magazine text and improves the utilization rate of paper; it provides an automatic processing mode for magazine advertisement embedding, saving manual labor; the improved Swin Transformer lends itself to parallel and distributed computation, which speeds up data processing; the network can be conveniently built with PyTorch or TensorFlow; and because the coding parameters are fixed, there is no training phase and iterative updating can be performed directly.

Description

Improved Swin Transformer-based magazine, book and periodical advertisement embedding method
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a text advertisement embedding method based on an improved Swin Transformer.
Background
A magazine advertisement is an advertisement published in a magazine. Magazine advertisements offer strong targeting, a long retention time, a large pass-along readership and good print quality. A professional magazine arranges its content for particular groups of readers and is therefore welcomed by those readers; the specialization of magazines is also developing rapidly (medical magazines, science magazines and various technical magazines, for example), their distribution targets are specific social strata or groups, and because a professional magazine has a fixed readership, advertising in it can penetrate deeply into a professional industry.
The cover, inner pages and inserts of a magazine can all carry advertisements; the position of an advertisement can be arranged flexibly, its content can be highlighted, and a reader's interest can be stimulated. The layout of the advertisement itself also admits many devices, such as folding, inserting, page bridging and deformation, to attract readers' attention. At present, however, magazine advertisements usually occupy pages of their own and take up considerable space: by one count, advertisements in a professional magazine account for more than fifteen percent of the total page count on average.
Fusing magazine advertisements with magazine text is an image processing problem; the traditional approach requires a professional designer to merge the advertisement into the magazine text with professional drawing tools.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a text advertisement embedding method based on an improved Swin Transformer, which can effectively solve the following problems:
(1) The magazine advertisement and the magazine text are fused, so that the utilization rate of paper is effectively improved;
(2) Traditional image fusion requires manual processing by professionals, and the results are inconsistent;
(3) The method lies in the field of image processing; traditional image fusion relies on convolutional neural networks, whose computational complexity grows exponentially as the size of the processed image increases.
Specifically, the invention adopts the following technical scheme: a text advertisement embedding method based on an improved Swin Transformer comprises a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer.
Further, the magazine text page to be fused is a content page of the magazine (a paper or journal article page other than an advertisement); it is initially typeset as a .doc file and is converted to JPG or PNG format for subsequent processing, and a reshape operation brings it to dimensions H × W × C, denoted Text_Page.
Preferably, the magazine advertisement page to be fused is an advertisement page of the magazine; it is likewise converted to JPG or PNG format for subsequent processing, and a reshape operation brings it to dimensions H × W × C, denoted Ad_Page.
Further, the data preprocessing layer preprocesses the Text_Page and Ad_Page data through blocking, flattening and merging operations:
(1) Blocking: Text_Page and Ad_Page both have size H × W × C; blocking yields N small square regions, each of size

P × P × C

The number N of small square regions is:

N = (H × W) / (P × P)

(2) Flattening: each small square region is flattened into a vector x of dimension 1 × (P × P × C);
(3) Merging: the N flattened vectors are merged into a matrix X of dimension N × (P × P × C), of the form:

X = [x_1, x_2, …, x_N]^T

The result of passing Text_Page through the data preprocessing layer is denoted X_Text; the result of passing Ad_Page through the data preprocessing layer is denoted X_Ad.
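For illustration only (this code is not part of the original disclosure), the blocking, flattening and merging operations can be sketched in PyTorch; the helper name preprocess_page and the tensor layout are assumptions:

```python
import torch

def preprocess_page(page: torch.Tensor, P: int) -> torch.Tensor:
    """Blocking, flattening and merging of an H x W x C page.

    page: tensor of shape (H, W, C), with H and W divisible by P.
    Returns the matrix X of shape (N, P*P*C), where N = (H*W)/(P*P).
    """
    H, W, C = page.shape
    # Blocking: cut the page into N small squares of size P x P x C.
    blocks = page.reshape(H // P, P, W // P, P, C).permute(0, 2, 1, 3, 4)
    # Flattening + merging: each square becomes one 1 x (P*P*C) row of X.
    return blocks.reshape(-1, P * P * C)

# Dimensions of the embodiment: a 208 x 288 x 3 page with P = 16
# yields X of shape (234, 768).
X_Text = preprocess_page(torch.rand(208, 288, 3), P=16)
```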
Preferably, the self-attention layer compares the correlations among the small square regions and extracts abstract semantic features, which relieves the shortage of computing power caused by an excessive amount of information; the specific calculation steps are as follows:
S1. Generate feature matrices L, M and N whose components range from -1 to 1, and fix them as non-trainable; the feature matrices L, M and N have the forms:

L = [l_1, l_2, …, l_N]^T

M = [m_1, m_2, …, m_N]^T

N = [n_1, n_2, …, n_N]^T

where each component of the feature matrices L, M and N has dimension (P × P × C) × 1;
S2. Generate the search (query) matrix IN, the key matrix K and the value matrix V from the feature matrices L, M and N as follows:

IN = X × L^T

K = X × M^T

V = X × N^T

where

IN = [in_1, in_2, …, in_N]^T

K = [k_1, k_2, …, k_N]^T

V = [v_1, v_2, …, v_N]^T

S3. Compute the attention distribution:

α_ij = softmax(in_i^T × k_j / √N)

and take the weighted average of the input information according to the attention distribution:

Att_i = Σ_{j=1}^{N} α_ij × v_j

In the above formulas, the dimension of Att_i is N × 1.
Furthermore, the feedforward network layer comprises N BP neural networks. Each BP neural network comprises a feedforward input layer, a middle hidden layer and a feedforward output layer; the feedforward input layer contains N neurons, the middle hidden layer contains P × C neurons, and the feedforward output layer contains P neurons. The inputs of the feedforward input layers are Att_1, Att_2, …, Att_{N-1} and Att_N respectively; feeding Att_1, Att_2, …, Att_{N-1} and Att_N into their respective BP neural networks yields the feedforward outputs F_1, F_2, …, F_{N-1} and F_N, computed as:

F_i = max(W_1 × Att_i + b_1) × W_2 + b_2,  i ∈ (1, 2, …, N)

In the above formula, b_1 is the bias of the middle hidden layer, b_2 is the bias of the feedforward output layer, W_1 is the inner-star weight vector of the middle hidden layer and W_2 is the inner-star weight vector of the feedforward output layer; b_1, b_2, W_1 and W_2 are set as non-trainable. F_i is the output of each BP neural network; in particular, F_1, F_2, …, F_{N-1} and F_N all have dimension P × 1.
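The feedforward layer can be sketched as follows, reading max(·) as the ReLU non-linearity and, for brevity, letting the N BP networks share one fixed parameter set; the random values standing in for the fixed weights are assumptions:

```python
import torch

def feed_forward(Att, W1, b1, W2, b2):
    """Feedforward network layer; max(.) is read as the ReLU non-linearity.

    Att: (N, N), one Att_i per row. W1: (N, P*C), b1: (P*C,),
    W2: (P*C, P), b2: (P,) -- all fixed and untrained. Returns F of
    shape (N, P). The N BP networks share one parameter set here for
    brevity; per-network parameters would simply be indexed by i.
    """
    hidden = torch.relu(Att @ W1 + b1)  # middle hidden layer, (N, P*C)
    return hidden @ W2 + b2             # feedforward output layer, (N, P)

# Embodiment sizes: 234 inputs, P*C = 48 hidden neurons, P = 16 outputs.
W1, b1 = torch.randn(234, 48), torch.randn(48)
W2, b2 = torch.randn(48, 16), torch.randn(16)
F = feed_forward(Att, W1, b1, W2, b2)  # (234, 16)
```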
Preferably, the attention loss calculation layer calculates the difference between the feedforward output of Text_Page and the feedforward output of Ad_Page:

Loss_att = Σ_{i=1}^{N} ||F_i^Text − F_i^Ad||^2

In the above formula, F^Text denotes the feedforward output of Text_Page and F^Ad denotes the feedforward output of Ad_Page.
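A one-line sketch of the loss, assuming the squared L2 form reconstructed above:

```python
def attention_loss(F_text, F_ad):
    """Difference between the two feedforward outputs, summed over the N
    squares; the squared L2 norm is an assumption for the published formula."""
    return ((F_text - F_ad) ** 2).sum()
```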
Furthermore, the iterative update layer updates Text_Page iteratively with a gradient descent algorithm to obtain the image Pic; the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, and only Text_Page needs to be updated:

X_Text ← X_Text − λ × ∂Loss_att / ∂X_Text

In the above formula, X_Text denotes the result of passing Text_Page through the data preprocessing layer and λ is the learning rate; the final update result is the image Pic, of the form:

Pic = [pic_1, pic_2, …, pic_N]^T
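Because only X_Text is updated while every coding-layer parameter stays fixed, the iteration is conveniently expressed with automatic differentiation, reusing attention_loss from the sketch above; the learning rate and step count are illustrative assumptions:

```python
import torch

def embed_iteratively(X_text, X_ad, encode, lam=0.01, steps=200):
    """Gradient descent on the page matrix only; encoder parameters stay fixed.

    encode: the image data coding layer as a function X -> F.
    lam: the learning rate lambda. Returns the updated matrix Pic.
    """
    pic = X_text.clone().requires_grad_(True)
    F_ad = encode(X_ad).detach()  # advertisement features, computed once
    for _ in range(steps):
        loss = attention_loss(encode(pic), F_ad)
        loss.backward()
        with torch.no_grad():
            pic -= lam * pic.grad  # X_Text <- X_Text - lambda * dLoss/dX_Text
            pic.grad.zero_()
    return pic.detach()
```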
preferably, the data of the fusion output layer is composed of two parts including an image Pic and an X obtained by subjecting a Text _ Page to a data preprocessing layer Text The calculation steps of the fusion output layer are as follows:
C=μ*Pic+ξ*X Text
the unfolding is as follows:
Figure BDA0003812916980000043
in the above formula, μ and ξ are weighting coefficients, C represents a matrix form corresponding to the magazine text in which the advertisement is finally embedded, and it is the final result that transcribes it into an image.
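A sketch of the fusion step, inverting the preprocessing layer to transcribe the matrix C back into an H × W × C image; the weights μ and ξ are left as free parameters, and the helper name fuse_output is an assumption:

```python
def fuse_output(pic, X_text, mu, xi, H=208, W=288, C=3, P=16):
    """C = mu * Pic + xi * X_Text, transcribed back to an H x W x C image by
    inverting the blocking/flattening of the data preprocessing layer."""
    Cmat = mu * pic + xi * X_text  # (N, P*P*C)
    blocks = Cmat.reshape(H // P, W // P, P, P, C)
    return blocks.permute(0, 2, 1, 3, 4).reshape(H, W, C)
```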
By adopting the method, the invention has the following advantages:
(1) The magazine advertisement and the magazine text are fused, so that the utilization rate of paper is effectively improved;
(2) The text advertisement embedding method based on the improved Swin Transformer is an automatic magazine advertisement embedding pipeline that saves manual labor;
(3) The method replaces the traditional convolutional-neural-network image processing with the improved Swin Transformer, which lends itself to parallel and distributed computation and speeds up data processing;
(4) The text advertisement embedding method based on the improved Swin Transformer can be conveniently built with PyTorch or TensorFlow;
(5) Because the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, the method differs from the traditional Swin Transformer in having no training phase: iterative updating can be performed directly.
Drawings
FIG. 1 is a flowchart of the text advertisement embedding method based on an improved Swin Transformer proposed by the present invention;
FIG. 2 is a flowchart of the computation of the data preprocessing layer proposed by the present invention;
FIG. 3 is a flowchart of the computation of the image data encoding layer;
FIG. 4 is a schematic diagram of the calculation method of the attention loss calculation layer.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present solution.
Examples
With reference to FIGS. 1 to 4, the present embodiment provides a text advertisement embedding method based on an improved Swin Transformer, including a journal text page to be fused, a journal advertisement page to be fused, a data preprocessing layer, an image data encoding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data encoding layer includes a multi-head self-attention layer and a feed-forward network layer.
The magazine text page to be fused is a content page of the magazine (a paper or journal article page other than an advertisement); it is initially typeset as a .doc file and is converted to JPG or PNG format for subsequent processing, and a reshape operation brings it to dimensions 208 × 288 × 3, denoted Text_Page.
The magazine advertisement page to be fused is an advertisement page of the magazine; it is likewise converted to JPG or PNG format for subsequent processing, and a reshape operation brings it to 208 × 288 × 3, denoted Ad_Page.
The data preprocessing layer preprocesses the Text_Page and Ad_Page data through blocking, flattening and merging operations:
(1) Blocking: Text_Page and Ad_Page both have size 208 × 288 × 3; blocking yields 234 small square regions, each of size

16 × 16 × 3

The number N of small square regions is:

N = (208 × 288) / (16 × 16) = 234

(2) Flattening: each small square region is flattened into a vector x of dimension 1 × 768;
(3) Merging: the 234 flattened vectors are merged into a matrix X of dimension 234 × 768, of the form:

X = [x_1, x_2, …, x_N]^T

The result of passing Text_Page through the data preprocessing layer is denoted X_Text; the result of passing Ad_Page through the data preprocessing layer is denoted X_Ad.
The self-attention layer compares the correlations among the small square regions and extracts abstract semantic features, relieving the shortage of computing power caused by an excessive amount of information; the specific calculation steps are as follows:
S1. Generate feature matrices L, M and N whose components range from -1 to 1, and fix them as non-trainable; the feature matrices L, M and N have the forms:

L = [l_1, l_2, …, l_N]^T

M = [m_1, m_2, …, m_N]^T

N = [n_1, n_2, …, n_N]^T

Each component of the feature matrices L, M and N has dimension 768 × 1;
S2. Generate the search (query) matrix IN, the key matrix K and the value matrix V from the feature matrices L, M and N as follows:

IN = X × L^T

K = X × M^T

V = X × N^T

where

IN = [in_1, in_2, …, in_N]^T

K = [k_1, k_2, …, k_N]^T

V = [v_1, v_2, …, v_N]^T

S3. Compute the attention distribution:

α_ij = softmax(in_i^T × k_j / √234)

and take the weighted average of the input information according to the attention distribution:

Att_i = Σ_{j=1}^{234} α_ij × v_j

In the above formulas, the dimension of Att_i is 234 × 1.
The feedforward network layer comprises 234 BP neural networks. Each BP neural network comprises a feedforward input layer, a middle hidden layer and a feedforward output layer; the feedforward input layer contains 234 neurons, the middle hidden layer contains 48 neurons, and the feedforward output layer contains 16 neurons. The inputs of the feedforward input layers are Att_1, Att_2, …, Att_{N-1} and Att_N respectively; feeding Att_1, Att_2, …, Att_{N-1} and Att_N into their respective BP neural networks yields the feedforward outputs F_1, F_2, …, F_{N-1} and F_N, computed as:

F_i = max(W_1 × Att_i + b_1) × W_2 + b_2,  i ∈ (1, 2, …, 234)

In the above formula, b_1 is the bias of the middle hidden layer, b_2 is the bias of the feedforward output layer, W_1 is the inner-star weight vector of the middle hidden layer and W_2 is the inner-star weight vector of the feedforward output layer; b_1, b_2, W_1 and W_2 are set as non-trainable. F_i is the output of each BP neural network; in particular, F_1, F_2, …, F_{N-1} and F_N all have dimension 16 × 1.
The attention loss calculation layer calculates the difference between the feedforward output of Text_Page and the feedforward output of Ad_Page:

Loss_att = Σ_{i=1}^{234} ||F_i^Text − F_i^Ad||^2

In the above formula, F^Text denotes the feedforward output of Text_Page and F^Ad denotes the feedforward output of Ad_Page.
The iterative update layer updates Text_Page iteratively with a gradient descent algorithm to obtain the image Pic; the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, and only Text_Page needs to be updated:

X_Text ← X_Text − λ × ∂Loss_att / ∂X_Text

In the above formula, X_Text denotes the result of passing Text_Page through the data preprocessing layer and λ is the learning rate; the final update result is the image Pic, of the form:

Pic = [pic_1, pic_2, …, pic_N]^T
the data of the fusion output layer consists of two parts, including an image Pic and an X obtained by processing a Text _ Page through a data pre-processing layer Text The calculation steps of the fusion output layer are as follows:
C=μ*Pic+ξ*X Text
the expansion is as follows:
Figure BDA0003812916980000074
in the above formula, μ and ξ are weighting coefficients, C represents a matrix form corresponding to the magazine text in which the advertisement is finally embedded, and the matrix form is transcribed into an image according to the rule of the data preprocessing layer, namely the final result.
Example 1:
S1. Convert the magazine text page and the magazine advertisement page to JPG or PNG format and compress both to the same size of 208 × 288 × 3, denoted Text_Page and Ad_Page respectively.
S2. Apply the data preprocessing layer to Text_Page and Ad_Page:

X_Text = [x_1^Text, x_2^Text, …, x_234^Text]^T

X_Ad = [x_1^Ad, x_2^Ad, …, x_234^Ad]^T

where X_Text is the result of passing Text_Page through the data preprocessing layer and X_Ad is the result of passing Ad_Page through the data preprocessing layer, both of dimension 234 × 768.
S4. Input X_Text and X_Ad into the image data coding layer (the self-attention layer followed by the feedforward network layer) to obtain their respective abstract semantic features:

F^Text = [F_1^Text, F_2^Text, …, F_234^Text]

F^Ad = [F_1^Ad, F_2^Ad, …, F_234^Ad]

In the above formulas, the dimension of each component is 16 × 1.
S5. Use the attention loss calculation layer to compute the difference between the feedforward output of X_Text and the feedforward output of X_Ad:

Loss_att = Σ_{i=1}^{234} ||F_i^Text − F_i^Ad||^2
s6, using iteration to update layer pair X Text And carrying out iterative update to obtain an image Pic, wherein a specific update calculation formula is as follows:
Figure BDA0003812916980000083
in the above formula, λ is the learning rate, and the final update result is the image Pic, which is in the form:
Figure BDA0003812916980000084
s7, fusing the data of the output layer and forming the data by two parts including images Pic and X Text The calculation steps of the fusion output layer are as follows:
C=μ*Pic+ξ*X Text
in the above formula, μ and ξ are weighting coefficients, C represents a matrix form corresponding to the magazine text in which the advertisement is finally embedded, and the matrix form is transcribed into an image according to the rule of the data preprocessing layer, namely the final result.
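Putting the pieces together, Example 1 admits a compact end-to-end sketch that reuses the illustrative helpers defined after the corresponding steps above (preprocess_page, self_attention, feed_forward, embed_iteratively, fuse_output and the fixed matrices L, M, Nmat, W1, b1, W2, b2); all numeric settings are assumptions, not disclosed values:

```python
import torch

torch.manual_seed(0)

# S1: both pages converted and compressed to 208 x 288 x 3.
text_page, ad_page = torch.rand(208, 288, 3), torch.rand(208, 288, 3)

# S2: data preprocessing layer -> 234 x 768 matrices.
X_text = preprocess_page(text_page, P=16)
X_ad = preprocess_page(ad_page, P=16)

# Fixed image data coding layer: self-attention + feedforward network.
def encode(X):
    return feed_forward(self_attention(X, L, M, Nmat), W1, b1, W2, b2)

# S4-S6: encode, measure the attention loss, update X_text iteratively.
pic = embed_iteratively(X_text, X_ad, encode, lam=0.01, steps=200)

# S7: fusion output layer -> final 208 x 288 x 3 fused page.
C_page = fuse_output(pic, X_text, mu=0.5, xi=0.5)
```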
The specific working process of the invention has been described above; the same steps are repeated the next time the method is used.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
The present invention and its embodiments have been described above, and the description is not intended to be limiting, and the drawings are only one embodiment of the present invention, and the actual structure is not limited thereto. In summary, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A text advertisement embedding method based on an improved Swin Transformer, characterized by comprising: a magazine text page to be fused, a magazine advertisement page to be fused, a data preprocessing layer, an image data coding layer, an attention loss calculation layer, an iterative update layer and a fusion output layer; the image data coding layer comprises a multi-head self-attention layer and a feedforward network layer; the magazine text page to be fused is a content page of the magazine (a paper or periodical article page other than an advertisement), initially typeset as a .doc file and converted to JPG or PNG format for subsequent processing, and a reshape operation brings it to dimensions H × W × C, denoted Text_Page; the magazine advertisement page to be fused is an advertisement page of the magazine, converted to JPG or PNG format for subsequent processing, and a reshape operation brings it to dimensions H × W × C, denoted Ad_Page.
2. The method of claim 1, wherein the data preprocessing layer preprocesses the Text_Page and Ad_Page data through the following operations:
(1) Blocking: Text_Page and Ad_Page both have size H × W × C; blocking yields N small square regions, each of size

P × P × C

The number N of small square regions is:

N = (H × W) / (P × P)

(2) Flattening: each small square region is flattened into a vector x of dimension 1 × (P × P × C);
(3) Merging: the N flattened vectors are merged into a matrix X of dimension N × (P × P × C), of the form:

X = [x_1, x_2, …, x_N]^T

The result of passing Text_Page through the data preprocessing layer is denoted X_Text; the result of passing Ad_Page through the data preprocessing layer is denoted X_Ad.
3. The method of claim 2, wherein the self-attention layer compares the correlations among the small square regions and extracts abstract semantic features, relieving the shortage of computing power caused by an excessive amount of information, with the following calculation steps:
S1. Generate feature matrices L, M and N whose components range from -1 to 1, and fix them as non-trainable; the feature matrices L, M and N have the forms:

L = [l_1, l_2, …, l_N]^T

M = [m_1, m_2, …, m_N]^T

N = [n_1, n_2, …, n_N]^T

where each component of the feature matrices L, M and N has dimension (P × P × C) × 1;
S2. Generate the search (query) matrix IN, the key matrix K and the value matrix V from the feature matrices L, M and N as follows:

IN = X × L^T

K = X × M^T

V = X × N^T

where:

IN = [in_1, in_2, …, in_N]^T

K = [k_1, k_2, …, k_N]^T

V = [v_1, v_2, …, v_N]^T

S3. Compute the attention distribution:

α_ij = softmax(in_i^T × k_j / √N)

and take the weighted average of the input information according to the attention distribution:

Att_i = Σ_{j=1}^{N} α_ij × v_j

In the above formulas, the dimension of Att_i is N × 1.
4. The method of claim 3, wherein the feedforward network layer comprises N BP neural networks, each comprising a feedforward input layer, a middle hidden layer and a feedforward output layer; the feedforward input layer contains N neurons, the middle hidden layer contains P × C neurons, and the feedforward output layer contains P neurons; the inputs of the feedforward input layers are Att_1, Att_2, …, Att_{N-1} and Att_N respectively, and feeding Att_1, Att_2, …, Att_{N-1} and Att_N into their respective BP neural networks yields the feedforward outputs F_1, F_2, …, F_{N-1} and F_N, computed as:

F_i = max(W_1 × Att_i + b_1) × W_2 + b_2,  i ∈ (1, 2, …, N)

In the above formula, b_1 is the bias of the middle hidden layer, b_2 is the bias of the feedforward output layer, W_1 is the inner-star weight vector of the middle hidden layer and W_2 is the inner-star weight vector of the feedforward output layer; b_1, b_2, W_1 and W_2 are set as non-trainable, and the output F_i of each BP neural network has dimension P × 1.
5. The method of claim 4, wherein the attention loss calculation layer calculates the difference between the feedforward output of Text_Page and the feedforward output of Ad_Page:

Loss_att = Σ_{i=1}^{N} ||F_i^Text − F_i^Ad||^2

In the above formula, F^Text denotes the feedforward output of Text_Page and F^Ad denotes the feedforward output of Ad_Page.
6. The method of claim 5, wherein the iterative update layer updates Text_Page iteratively with a gradient descent algorithm to obtain the image Pic; the parameters L, M, N, b_1, b_2, W_1 and W_2 in the image data coding layer are all fixed values, and only Text_Page needs to be updated:

X_Text ← X_Text − λ × ∂Loss_att / ∂X_Text

In the above formula, X_Text denotes the result of passing Text_Page through the data preprocessing layer and λ is the learning rate; the final update result is the image Pic, of the form:

Pic = [pic_1, pic_2, …, pic_N]^T

In the above formula, the dimensions of Pic and X_Text are the same.
7. The method of claim 6, wherein the data of the fusion output layer consists of two parts: the image Pic and the matrix X_Text obtained by passing Text_Page through the data preprocessing layer; the fusion output layer computes:

C = μ × Pic + ξ × X_Text

In the above formula, μ and ξ are weighting coefficients and C is the matrix form of the magazine text with the advertisement finally embedded; transcribing C back into an image gives the final result.
CN202211017879.1A 2022-08-24 2022-08-24 Magazine advertisement embedding method based on improved Swin Transformer Active CN115330898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211017879.1A CN115330898B (en) 2022-08-24 2022-08-24 Magazine advertisement embedding method based on improved Swin Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211017879.1A CN115330898B (en) 2022-08-24 2022-08-24 Magazine advertisement embedding method based on improved Swin Transformer

Publications (2)

Publication Number Publication Date
CN115330898A true CN115330898A (en) 2022-11-11
CN115330898B CN115330898B (en) 2023-06-06

Family

ID=83926419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211017879.1A Active CN115330898B (en) 2022-08-24 2022-08-24 Magazine advertisement embedding method based on improved Swin Transformer

Country Status (1)

Country Link
CN (1) CN115330898B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785409A (en) * 2018-12-29 2019-05-21 武汉大学 A kind of image based on attention mechanism-text data fusion method and system
CN113313201A (en) * 2021-06-21 2021-08-27 南京挥戈智能科技有限公司 Multi-target detection and distance measurement method based on Swin Transformer and ZED camera
CN113609965A (en) * 2021-08-03 2021-11-05 同盾科技有限公司 Training method and device of character recognition model, storage medium and electronic equipment
CN113658057A (en) * 2021-07-16 2021-11-16 西安理工大学 Swin Transformer low-light-level image enhancement method
CN113709455A (en) * 2021-09-27 2021-11-26 北京交通大学 Multilevel image compression method using Transformer
CN114283347A (en) * 2022-03-03 2022-04-05 粤港澳大湾区数字经济研究院(福田) Target detection method, system, intelligent terminal and computer readable storage medium
CN114528912A (en) * 2022-01-10 2022-05-24 山东师范大学 False news detection method and system based on progressive multi-mode converged network
CN114550158A (en) * 2022-02-23 2022-05-27 厦门大学 Scene character recognition method and system
CN114743020A (en) * 2022-04-02 2022-07-12 华南理工大学 Food identification method combining tag semantic embedding and attention fusion
CN114821239A (en) * 2022-05-10 2022-07-29 安徽农业大学 Method for detecting plant diseases and insect pests in foggy environment
CN114841977A (en) * 2022-05-17 2022-08-02 南京信息工程大学 Defect detection method based on Swin Transformer structure combined with SSIM and GMSD
CN114898219A (en) * 2022-07-13 2022-08-12 中国标准化研究院 SVM-based manipulator touch data representation and identification method
CN114912575A (en) * 2022-04-06 2022-08-16 西安交通大学 Medical image segmentation model and method based on Swin Transformer connection path
CN114912461A (en) * 2022-05-31 2022-08-16 浙江工业大学 Deep learning-based Chinese text classification method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
蒋琪 et al.: "Transformer-based end-to-end automatic conversion from Chinese characters to Braille" *

Also Published As

Publication number Publication date
CN115330898B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN112232149B (en) Document multimode information and relation extraction method and system
Chen et al. Sdae: Self-distillated masked autoencoder
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
Sprechmann et al. Supervised sparse analysis and synthesis operators
Fei et al. Low rank representation with adaptive distance penalty for semi-supervised subspace classification
CN110263174B (en) Topic category analysis method based on focus attention
CN110210027B (en) Fine-grained emotion analysis method, device, equipment and medium based on ensemble learning
CN112818764A (en) Low-resolution image facial expression recognition method based on feature reconstruction model
Muppalaneni Handwritten Telugu compound character prediction using convolutional neural network
CN113313173A (en) Human body analysis method based on graph representation and improved Transformer
Nguyen et al. Discriminative low-rank dictionary learning for face recognition
CN114741507 Citation network classification model establishment and classification based on a Transformer graph convolutional network
Dwivedi et al. A Novel deep learning model for accurate prediction of image captions in fashion industry
CN107067373A (en) A kind of gradient minimisation recovery method of binary image based on 0 norm
Khayyat et al. A deep learning based prediction of arabic manuscripts handwriting style.
CN115035531A (en) Retail terminal character recognition method and system
Biradar et al. Classification of book genres using book cover and title
CN111339734B (en) Method for generating image based on text
CN110222222B (en) Multi-modal retrieval method based on deep topic self-coding model
US20230260176A1 (en) System and method for face swapping with single/multiple source images using attention mechanism
CN115330898A Improved Swin Transformer-based magazine, book and periodical advertisement embedding method
CN113806747B (en) Trojan horse picture detection method and system and computer readable storage medium
Zhang et al. Analytic separable dictionary learning based on oblique manifold
CN115797642A (en) Self-adaptive image semantic segmentation algorithm based on consistency regularization and semi-supervision field
Wu et al. Transformer Autoencoder for K-means Efficient clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240116

Address after: Room 0606, 6th Floor, Building A, Berlin International Business Center, No. 85 Binhe West Road, Wanbailin District, Taiyuan City, Shanxi Province, 030024

Patentee after: Forest Fantasy (Taiyuan) Digital Technology Co.,Ltd.

Address before: 048000 Room 302, unit 2, building 5, Agricultural Bank of China residential area, Nancheng District, Xinshi East Street, Jincheng Development Zone, Shanxi Province

Patentee before: Jincheng Darui Jinma Engineering Design Consulting Co.,Ltd.