CN107180412A - Method for reconstructing horizontally and vertically shredded paper based on horizontal projection and seed-point-constrained K-means clustering - Google Patents

Method for reconstructing horizontally and vertically shredded paper based on horizontal projection and seed-point-constrained K-means clustering

Info

Publication number
CN107180412A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710450717.XA
Other languages
Chinese (zh)
Other versions
CN107180412B (en)
Inventor
刘有军
陈军华
王文馨
齐兴明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201710450717.XA
Publication of CN107180412A
Application granted
Publication of CN107180412B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for reconstructing horizontally and vertically shredded paper based on horizontal projection and seed-point-constrained K-means clustering. The image of each document fragment is projected in the horizontal direction, and the resulting one-dimensional signals are clustered by row. The one-dimensional signals obtained from the first fragment of each row serve as seed points that constrain the initial centers of the K-means clustering, and the K-means algorithm then groups the fragments into rows. The distance between fragments within each row is computed with a distance formula that introduces a penalty coefficient, and an adjacency matrix of inter-fragment distances is built, which converts the within-row splicing problem into a traveling salesman problem. This traveling salesman problem is solved by ant colony optimization, and merge and divide-and-conquer strategies are introduced to improve within-row splicing precision. Finally, the row fragments are spliced by matching their feature vectors.

Description

Method for reconstructing horizontally and vertically shredded paper based on horizontal projection and seed-point-constrained K-means clustering
Technical field
The invention belongs to the field of image processing and relates to a method for restoring text information by splicing the images of horizontally and vertically shredded paper fragments. It applies horizontal projection, seed-point-constrained K-means clustering, and the ant colony method to fragment splicing.
Technical background
Document examination is an important subfield of forensic science and is closely tied to criminal, military, civil, government law-enforcement, and judicial matters. Its core is to test and compare a questioned document against a set of known standards using scientific methods, for example signature verification and handwriting verification. To obtain reliable results, forensic examiners need documents that have been preserved intact.
Questioned documents are often damaged to varying degrees: corners may be torn off, and the document may be worm-eaten, soaked, or torn up. In the last case the document may have been torn to pieces by hand or by machine; either way, forensic examiners must repair it before the subsequent examination can proceed. Repairing a shredded document by hand takes a great deal of time, depending on the size and number of the fragments; restoring a few sheets may take days or even weeks. Splicing fragments is also tedious work, so an efficient automatic document restoration method is needed.
Paper shredders are used to protect the information in users' paper documents, which requires that a shredded document cannot be restored. Most shredders cut documents into small blocks with horizontal and vertical cuts, so research on restoring such fragments can also guide improvements in shredder design and better protect users' information.
Summary of the invention
The invention proposes a splicing method for reconstructing horizontally and vertically shredded paper. Compared with existing methods in China and abroad, it achieves higher precision and a higher degree of automation, and it is far more efficient than manual splicing.
The technical content is as follows:
1.1. Perform a horizontal projection of each fragment, converting each fragment into a one-dimensional signal;
1.2. Based on the one-dimensional signals of the fragments, cluster the fragments into rows using a seed-point-constrained K-means clustering algorithm;
1.3. Order the fragments within each row: convert the within-row splicing problem into a traveling salesman problem and solve it with the ant colony method;
1.4. Splice the row fragments by matching their feature vectors FC = [a_1, a_2, a_3, a_4].
2. Step 1.1 comprises:
2.1. Binarize the image of each fragment: a pixel is set to 1 when its gray value is greater than the threshold and to 0 when its gray value is less than the threshold;
2.2. For each row of the image, count the black pixels from left to right. The process is expressed as:
$$f(y) = N - \sum_{x=1}^{N} I(x, y)$$
where f(y) is the number of black pixels in row y of the image; I(x, y) is the gray value of image I at point (x, y), which is 0 when the point is black and 1 when the point is white; and N is the horizontal resolution of the fragment image;
2.3. After step 2.2, fragment image i has been converted into an m-dimensional vector d_i, where m is the vertical resolution of the fragment image.
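Steps 2.1 to 2.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `horizontal_projection`, the list-of-rows image layout, and the tiny sample fragment are hypothetical. The default threshold of 200 is the value used later in the embodiment. A gray value exactly equal to the threshold is treated as black, which the text does not specify.

```python
def horizontal_projection(gray, threshold=200):
    """Return f(y): the number of black pixels in each row of a fragment.

    gray is a list of pixel rows, each row a list of gray values (0-255).
    """
    signal = []
    for row in gray:
        # Step 2.1: binarize, 1 = white (gray value above threshold), 0 = black.
        # A value equal to the threshold counts as black here (assumption).
        binary = [1 if value > threshold else 0 for value in row]
        # Step 2.2: f(y) = N - sum over x of I(x, y), with N = len(row).
        signal.append(len(row) - sum(binary))
    # Step 2.3: the result is an m-dimensional vector, m = number of rows.
    return signal


# Hypothetical 4 x 5 fragment: white background with a few black pixels.
fragment = [[255] * 5 for _ in range(4)]
fragment[1][1:4] = [0, 0, 0]   # three black pixels in row 1
fragment[2][2] = 0             # one black pixel in row 2
signal = horizontal_projection(fragment)
print(signal)  # → [0, 3, 1, 0]
```

Rows that cross a line of text give large values and rows in the gap between lines give zero, so fragments cut from the same row of the page tend to share a similar signal.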
3. Step 1.2 comprises:
3.1. Because the left part of the first fragment in each row is white, identify the first fragment of each row and its one-dimensional signal CVF_1 ... CVF_t; because the right part of the last fragment in each row is white, identify the last fragment of each row and its one-dimensional signal CVL_1 ... CVL_t, where t is the number of rows into which the document was shredded;
3.2. Use the one-dimensional signals CVF_1 ... CVF_t of the first fragments of the rows as the initial cluster center vectors C_1 ... C_t of the clustering algorithm;
3.3. Compute the distance from the one-dimensional vector d_i of each fragment to each cluster center C_1 ... C_t using the Euclidean distance:
$$dist_{ed}(d_i, C_j) = \lVert d_i - C_j \rVert_2$$
where dist_ed(d_i, C_j) is the distance between the one-dimensional vector d_i of fragment i and the cluster center C_j;
3.4. Assign fragment i to the class of the cluster center j nearest to it;
3.5. Recompute the cluster centers from the assignment in step 3.4:
$$C'_i = \frac{1}{|C_i|} \sum_{d \in C_i} d$$
where |C_i| is the number of fragments assigned to class i, C'_i is the updated cluster center C_i, and d is the one-dimensional vector of a fragment assigned to the class of cluster center C_i;
3.6. Repeat steps 3.3 to 3.5 until the fragments in each class no longer change;
3.7. Return the clustering result.
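The clustering loop of steps 3.2 to 3.7 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `seeded_kmeans` and `seed_indices`, the `max_iter` cap, and the toy signals are assumptions, and the seed fragments are taken as already identified by step 3.1. Keeping the old center when a cluster becomes empty is also an assumption, since the text does not cover that case.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (step 3.3)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def seeded_kmeans(signals, seed_indices, max_iter=100):
    """Cluster projection vectors into rows, starting from seed fragments.

    signals: list of equal-length one-dimensional vectors d_i.
    seed_indices: index of the first fragment of each row (from step 3.1).
    Returns a list holding the cluster label of every fragment.
    """
    # Step 3.2: the seed signals are the initial cluster centers.
    centers = [list(signals[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        # Steps 3.3-3.4: assign each fragment to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: euclidean(d, centers[j]))
            for d in signals
        ]
        if new_labels == labels:  # step 3.6: memberships stopped changing
            break
        labels = new_labels
        # Step 3.5: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [d for d, lab in zip(signals, labels) if lab == j]
            if members:  # an empty cluster keeps its old center (assumption)
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels  # step 3.7


# Hypothetical projection signals for six fragments from two rows.
signals = [
    [0, 5, 0, 5], [0, 5, 1, 5], [1, 5, 0, 4],   # row A
    [5, 0, 5, 0], [5, 1, 5, 0], [4, 0, 5, 0],   # row B
]
labels = seeded_kmeans(signals, seed_indices=[0, 3])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Seeding with one known fragment per row fixes the number of clusters at the number of rows and places every initial center inside a true row, which is the constraint the method relies on.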
4. Step 1.3 comprises:
4.1. Process each class (row) of the clustering result separately. Each fragment in the row is abstracted as a vertex of a graph, and the matching degree between fragment edge matrices is abstracted as the distance between vertices;
4.2. Compute the distance d_m(i, j) from vertex i to vertex j, that is, between the right edge of fragment i and the left edge of fragment j, using the formula below; computing the distance between every pair of vertices yields a complete weighted graph:
$$d_m(i, j) = (1 + p)\,\lVert x_i - x_j \rVert_2 = (1 + p)\sqrt{\sum_{u=1}^{m} |x_{iu} - x_{ju}|^2}$$
where p is a penalty coefficient defined as:
$$p = \frac{T}{\sum_{u=1}^{m} x_{iu} + \sum_{u=1}^{m} x_{ju}}$$
where x_i is the vector corresponding to the right edge of the image of fragment i; x_j is the vector corresponding to the left edge of the image of fragment j; x_iu is the gray value of vector x_i at position u, and x_ju is defined likewise; T is a constant that can be determined by experiment. Experiments showed that splicing precision is higher when this constant (written M in the embodiment) is set to the average number of longitudinal-edge pixels in the row's fragments whose gray value is 0 (that is, black pixels).
4.3. Compute the result with the ant colony method. Its parameters are set to α = 1, β = 5, ρ = 0.5 (where α is the information heuristic factor, β is the expectation heuristic factor, and ρ is the pheromone residual coefficient), and the number of ants is set to the number of fragments.
4.4. Reconstruct the row fragment from the result of the ant colony computation.
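Steps 4.2 to 4.4 can be sketched as a penalized distance function plus a basic Ant System search. Everything below is an illustration under assumptions and not the patented implementation. The ant colony variant builds an open path from a known first fragment rather than a closed tour, and it omits the merge and divide-and-conquer strategies mentioned in the abstract. The names `penalized_distance` and `ant_colony_order`, the values `T=3`, `n_iter`, `seed`, and `eps`, the zero-denominator guard, the pheromone deposit of 1/length, and the sample edge vectors (with 1 marking an ink pixel) are all hypothetical.

```python
import math
import random


def penalized_distance(x_i, x_j, T):
    """d_m(i, j) = (1 + p) * ||x_i - x_j||_2, p = T / (sum(x_i) + sum(x_j))."""
    total = sum(x_i) + sum(x_j)
    p = T / total if total > 0 else T  # guard for a zero sum (assumption)
    return (1 + p) * math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def ant_colony_order(dist, start=0, alpha=1.0, beta=5.0, rho=0.5,
                     n_iter=50, seed=0):
    """Return an ordering of all vertices, beginning at `start`."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each directed edge
    eps = 1e-9                           # avoids division by zero
    best_path, best_len = None, float("inf")
    for _ in range(n_iter):
        paths = []
        for _ant in range(n):            # number of ants = number of fragments
            path, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                cur = path[-1]
                cand = sorted(unvisited)
                # Transition weight: pheromone^alpha * (1 / distance)^beta.
                weights = [tau[cur][k] ** alpha
                           * (1.0 / (dist[cur][k] + eps)) ** beta
                           for k in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                path.append(nxt)
                unvisited.remove(nxt)
            length = sum(dist[a][b] for a, b in zip(path, path[1:]))
            paths.append((path, length))
            if length < best_len:
                best_path, best_len = path, length
        # Keep a fraction rho of the old pheromone, then deposit new pheromone.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= rho
        for path, length in paths:
            for a, b in zip(path, path[1:]):
                tau[a][b] += 1.0 / (length + eps)
    return best_path


# Hypothetical edge vectors for four fragments whose true order is 0, 2, 1, 3.
A = [1, 0, 1, 1, 0, 0]
B = [0, 1, 1, 0, 1, 0]
C = [1, 1, 0, 0, 0, 1]
blank = [0] * 6
left = [blank, B, A, C]    # left edges of fragments 0..3
right = [A, C, B, blank]   # right edges of fragments 0..3
dist = [[penalized_distance(right[i], left[j], T=3) for j in range(4)]
        for i in range(4)]
order = ant_colony_order(dist, start=0)
print(order)  # → [0, 2, 1, 3]
```

In this toy case every true neighbor pair has distance 0, so the heuristic term dominates and the ants settle on the correct order at once; with real edge data the pheromone update matters much more.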
5. Step 1.4 comprises:
5.1. If the top of the row fragment is white, there is no incomplete text line at the top of the row fragment; set the first component of the feature vector FC_i of row fragment i to a_1 = 0;
5.2. If the first text line at the top of the row fragment is complete, that is, the upper boundary of that text line coincides with the upper boundary of the row fragment, set a_1 = 0; otherwise set a_1 = l_1, where l_1 is the position within the fragment of the bottom of the incomplete text line at the top of the row fragment;
5.3. If the bottom of the row fragment is white, there is no incomplete text line at the bottom of the row fragment; set a_4 = 0;
5.4. If the last text line at the bottom of the row fragment is complete (that is, the upper boundary of that text line coincides with the lower boundary of the row fragment), set a_4 = 0; otherwise set a_4 = l_4, where l_4 is the position within the fragment of the top of the incomplete text line at the bottom of the row fragment;
5.5. If the row fragment contains any complete text line, set a_2 = l_2 and a_3 = l_3, where l_2 is the position within the fragment of the upper boundary of the complete text line closest to the bottom of the row fragment, and l_3 is the position within the fragment of the lower boundary of that line; otherwise set a_2 = 0, a_3 = 0 and jump to step 5.7. Also set l = l_2 - l_3 and l' = a_4 - a_3, where l is the height of a text line and l' is the height of the gap between text lines. If a row fragment contains no complete line, so that l and l' cannot be computed, use the average values from the other row fragments of the same document;
5.6. If a_2 < L - l - l', where L is the height of the fragment, a_2 and a_3 are corrected to a_2 = a_2 + l + l' and a_3 = a_3 + l + l'; if the corrected a_3 and the uncorrected a_4 satisfy the condition a_3 + l' ≤ L ∪ a_4 = 0, a_4 is corrected to a_4 = a_3 + l'; if the corrected a_2 and the uncorrected a_1 satisfy the condition (0 ≤ a_2 - N(l' + l) ≤ l) ∪ a_1 = 0, a_1 is corrected to a_1 = a_2 - R(l' + l), where R is a constant satisfying R ∈ {1, 2, 3}; feature vector extraction then ends;
5.7. If a_1 = 0 and a_4 = 0, the fragment contains no text information and is removed from the reconstruction problem; otherwise use a_1, a_4, l, and l' to correct a_2 and a_3, and finish extracting the feature vector FC_i = [a_1, a_2, a_3, a_4].
5.8. Because the top of the first row of the document is blank and contains no text, identify the first row fragment and its feature vector FC_1; set the feature vector of the current row to be spliced, FC_c, to the initial value FC_c = FC_1;
5.9. If the feature vector FC_i of some row fragment and the feature vector FC_c of the current row to be spliced satisfy either of two matching conditions, one relating component a_4 of FC_c to component a_1 of FC_i and the other relating component a_3 of FC_c to component a_2 of FC_i, row fragment i is considered adjacent to the current row to be spliced. Row fragment i is then spliced and FC_c = FC_i is set. Here δ is the random error tolerance used in these conditions and can be set to 3.
Brief description of the drawings
Fig. 1: Schematic of the horizontal projection process;
Fig. 2: Restoration result for test sample 1;
Fig. 3: Restoration result for test sample 6;
Embodiment
The method was tested on 10 data sets to evaluate its precision and degree of automation: 5 English documents set in 10-point Times New Roman with a fixed line spacing of 13 points, and 5 English documents set in bold 10-point Times New Roman with a fixed line spacing of 13 points. Each data set contains 11 × 17 fragments, that is, the document is cut into 11 rows and 17 columns, and each fragment image measures 180 × 72 pixels.
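A test set of this shape can be produced by cutting a page image on a regular grid. The sketch below only illustrates that setup: the function name `shred` and the blank sample page are hypothetical. Reading 180 × 72 as 180 pixel rows by 72 pixel columns is an assumption, chosen because each fragment is later turned into a 180-dimensional projection vector.

```python
def shred(page, n_rows, n_cols):
    """Cut a page (a list of pixel rows) into n_rows x n_cols equal fragments.

    Fragments are returned row by row, left to right.
    """
    height, width = len(page), len(page[0])
    if height % n_rows or width % n_cols:
        raise ValueError("page size must be divisible by the grid size")
    frag_h, frag_w = height // n_rows, width // n_cols
    fragments = []
    for r in range(n_rows):
        for c in range(n_cols):
            fragments.append([
                row[c * frag_w:(c + 1) * frag_w]
                for row in page[r * frag_h:(r + 1) * frag_h]
            ])
    return fragments


# Blank page sized for 11 x 17 fragments of 180 rows by 72 columns each.
page = [[255] * (17 * 72) for _ in range(11 * 180)]
fragments = shred(page, 11, 17)
print(len(fragments), len(fragments[0]), len(fragments[0][0]))  # → 187 180 72
```

In a real test the returned list would be shuffled before reconstruction, so that the original order is unknown to the method.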
Following step 1.1, the binarization threshold in step 2.1 is set to τ = 200 and each fragment is projected horizontally, which turns each fragment into a one-dimensional vector of dimension 180; this vector represents the fragment image in the subsequent computation.
Following step 3.1, the first fragment of each row and the corresponding one-dimensional vectors d_1, d_2, ..., d_m are found and used as the initial cluster centers C_1 ... C_m of the seed-point-constrained K-means algorithm, and clustering is completed according to steps 3.2 to 3.6.
To verify the precision of the method, it was compared with the basic K-means clustering algorithm and with a feature-vector clustering algorithm proposed in a 2014 paper. The comparison is given in the table below:
where FN and FP denote the numbers of false acceptances and false rejections in the classification process, respectively. The clustering precision is computed as follows:
where TP and TN denote the numbers of correct acceptances and correct rejections. The experiments show that the clustering precision of this method is clearly higher than that of the other two methods.
To verify the splicing precision of the method, the clustering errors it made were corrected manually (the errors are few and affect few rows, so correcting them takes very little time).
Following step 1.3, M in step 4.2 is set to the average number of longitudinal-edge pixels in the row's fragments whose gray value is 0 (that is, black pixels); the ant colony parameters in step 4.3 are α = 1, β = 0.5, ρ = 0.5; and the number of ants is set to 17. Once the parameters are set, the traveling salesman problem is solved by a program.
Following step 1.4, the row fragments are spliced and the splicing precision is computed. The precision formula is as follows:
The results are given in the table below:
An independent-samples t-test was carried out on the precision for the two font groups (test samples 1 to 5 are not bold, test samples 6 to 10 are bold). The font was found to have a significant effect on splicing precision (93.4% ± 2% vs 88.6% ± 3.4%, p = 0.0038). The splicing results for test sample 1 and test sample 6 are shown in the drawings.
Given the high clustering precision of the method, the tests also attempted to reconstruct the mixed fragments of two documents (that is, fragments that come from two different files). Two of the 10 documents were chosen arbitrarily and shredded by computer into the 11 × 17 format, giving 374 fragments in total, with the same fragment image size as above. Ten such experiments were carried out, with the following clustering results:
The results show that, in some cases, the method reaches the same precision on the two-document reconstruction problem as on the ordinary single-document problem, for example when test sample 1 is combined with test sample 6, and when test sample 2 is combined with test sample 9.

Claims (5)

1. A method for reconstructing horizontally and vertically shredded paper based on horizontal projection and seed-point-constrained K-means clustering, characterized by comprising the following steps:
1.1. Perform a horizontal projection of each fragment, converting each fragment into a one-dimensional signal;
1.2. Based on the one-dimensional signals of the fragments, cluster the fragments into rows using a seed-point-constrained K-means clustering algorithm;
1.3. Order the fragments within each row: convert the within-row splicing problem into a traveling salesman problem and solve it with the ant colony method;
1.4. Splice the row fragments by matching their feature vectors FC = [a_1, a_2, a_3, a_4].
2. The method according to claim 1, characterized in that step 1.1 comprises:
2.1. Binarize the image of each fragment: a pixel is set to 1 when its gray value is greater than the threshold and to 0 when its gray value is less than the threshold;
2.2. For each row of the image, count the black pixels from left to right. The process is expressed as:
$$f(y) = N - \sum_{x=1}^{N} I(x, y)$$
where f(y) is the number of black pixels in row y of the image; I(x, y) is the gray value of image I at point (x, y), which is 0 when the point is black and 1 when the point is white; and N is the horizontal resolution of the fragment image;
2.3. After step 2.2, fragment image i has been converted into an m-dimensional vector d_i, where m is the vertical resolution of the fragment image.
3. The method according to claim 1, characterized in that step 1.2 comprises:
3.1. Because the left part of the first fragment in each row is white, identify the first fragment of each row and its one-dimensional signal CVF_1 ... CVF_t; because the right part of the last fragment in each row is white, identify the last fragment of each row and its one-dimensional signal CVL_1 ... CVL_t, where t is the number of rows into which the document was shredded;
3.2. Use the one-dimensional signals CVF_1 ... CVF_t of the first fragments of the rows as the initial cluster center vectors C_1 ... C_t of the clustering algorithm;
3.3. Compute the distance from the one-dimensional vector d_i of each fragment to each cluster center C_1 ... C_t using the Euclidean distance:
$$dist_{ed}(d_i, C_j) = \lVert d_i - C_j \rVert_2$$
where dist_ed(d_i, C_j) is the distance between the one-dimensional vector d_i of fragment i and the cluster center C_j;
3.4. Assign fragment i to the class of the cluster center j nearest to it;
3.5. Recompute the cluster centers from the assignment in step 3.4:
$$C'_i = \frac{1}{|C_i|} \sum_{d \in C_i} d$$
where |C_i| is the number of fragments assigned to class i, C'_i is the updated cluster center C_i, and d is the one-dimensional vector of a fragment assigned to the class of cluster center C_i;
3.6. Repeat steps 3.3 to 3.5 until the fragments in each class no longer change;
3.7. Return the clustering result.
4. The method according to claim 1, characterized in that step 1.3 comprises:
4.1. Process each class (row) of the clustering result separately. Each fragment in the row is abstracted as a vertex of a graph, and the matching degree between fragment edge matrices is abstracted as the distance between vertices;
4.2. Compute the distance d_m(i, j) from vertex i to vertex j, that is, between the right edge of fragment i and the left edge of fragment j, using the formula below; computing the distance between every pair of vertices yields a complete weighted graph:
$$d_m(i, j) = (1 + p)\,\lVert x_i - x_j \rVert_2 = (1 + p)\sqrt{\sum_{u=1}^{m} |x_{iu} - x_{ju}|^2}$$
where p is a penalty coefficient defined as:
$$p = \frac{T}{\sum_{u=1}^{m} x_{iu} + \sum_{u=1}^{m} x_{ju}}$$
where x_i is the vector corresponding to the right edge of the image of fragment i; x_j is the vector corresponding to the left edge of the image of fragment j; x_iu is the gray value of vector x_i at position u, and x_ju is defined likewise; T is a constant determined by experiment. Experiments showed that splicing precision is higher when this constant (written M in the description) is set to the average number of longitudinal-edge pixels in the row's fragments whose gray value is 0 (that is, black pixels);
4.3. Compute the result with the ant colony method. Its parameters are set to α = 1, β = 5, ρ = 0.5 (where α is the information heuristic factor, β is the expectation heuristic factor, and ρ is the pheromone residual coefficient), and the number of ants is set to the number of fragments;
4.4. Reconstruct the row fragment from the result of the ant colony computation.
5. The method according to claim 1, characterized in that step 1.4 comprises:
5.1. If the top of the row fragment is white, there is no incomplete text line at the top of the row fragment; set the first component of the feature vector FC_i of row fragment i to a_1 = 0;
5.2. If the first text line at the top of the row fragment is complete, that is, the upper boundary of that text line coincides with the upper boundary of the row fragment, set a_1 = 0; otherwise set a_1 = l_1, where l_1 is the position within the fragment of the bottom of the incomplete text line at the top of the row fragment;
5.3. If the bottom of the row fragment is white, there is no incomplete text line at the bottom of the row fragment; set a_4 = 0;
5.4. If the last text line at the bottom of the row fragment is complete, that is, the upper boundary of that text line coincides with the lower boundary of the row fragment, set a_4 = 0; otherwise set a_4 = l_4, where l_4 is the position within the fragment of the top of the incomplete text line at the bottom of the row fragment;
5.5. If the row fragment contains any complete text line, set a_2 = l_2 and a_3 = l_3, where l_2 is the position within the fragment of the upper boundary of the complete text line closest to the bottom of the row fragment, and l_3 is the position within the fragment of the lower boundary of that line; otherwise set a_2 = 0, a_3 = 0 and jump to step 5.7. Also set l = l_2 - l_3 and l' = a_4 - a_3, where l is the height of a text line and l' is the height of the gap between text lines. If a row fragment contains no complete line, so that l and l' cannot be computed, use the average values from the other row fragments of the same document;
5.6. If a_3 < L - l - l', where L is the height of the fragment, a_2 and a_3 are corrected to a_2 = a_2 + l + l' and a_3 = a_3 + l + l'; if the corrected a_3 and the uncorrected a_4 satisfy the condition a_3 + l' ≤ L ∪ a_4 = 0, a_4 is corrected to a_4 = a_3 + l'; if the corrected a_2 and the uncorrected a_1 satisfy the condition (0 ≤ a_2 - N(l' + l) ≤ l) ∪ a_1 = 0, a_1 is corrected to a_1 = a_2 - R(l' + l), where R is a constant satisfying R ∈ {1, 2, 3}; feature vector extraction then ends;
5.7. If a_1 = 0 and a_4 = 0, the fragment contains no text information and is removed from the reconstruction problem; otherwise use a_1, a_4, l, and l' to correct a_2 and a_3, and finish extracting the feature vector FC_i = [a_1, a_2, a_3, a_4];
5.8. Because the top of the first row of the document is blank and contains no text, identify the first row fragment and its feature vector FC_1; set the feature vector of the current row to be spliced, FC_C, to the initial value FC_C = FC_1;
5.9. If the feature vector FC_i of some row fragment and the feature vector FC_C of the current row to be spliced satisfy either of two matching conditions, one relating component a_4 of FC_C to component a_1 of FC_i and the other relating component a_3 of FC_C to component a_2 of FC_i, row fragment i is considered adjacent to the current row to be spliced; row fragment i is then spliced and FC_C = FC_i is set, where δ is the random error tolerance used in these conditions and is set to 3.
CN201710450717.XA 2017-06-15 2017-06-15 Horizontal and vertical shredded paper sheet reconstruction method based on horizontal projection and K-means clustering Active CN107180412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710450717.XA CN107180412B (en) 2017-06-15 2017-06-15 Horizontal and vertical shredded paper sheet reconstruction method based on horizontal projection and K-means clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710450717.XA CN107180412B (en) 2017-06-15 2017-06-15 Horizontal and vertical shredded paper sheet reconstruction method based on horizontal projection and K-means clustering

Publications (2)

Publication Number Publication Date
CN107180412A true CN107180412A (en) 2017-09-19
CN107180412B CN107180412B (en) 2020-10-16

Family

ID=59835763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710450717.XA Active CN107180412B (en) 2017-06-15 2017-06-15 Horizontal and vertical shredded paper sheet reconstruction method based on horizontal projection and K-means clustering

Country Status (1)

Country Link
CN (1) CN107180412B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921793A (en) * 2018-07-15 2018-11-30 Jiangxi University of Science and Technology Paper fragment splicing and restoration method based on fuzzy C-means clustering

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103236050A (en) * 2013-05-06 2013-08-07 University of Electronic Science and Technology of China Auxiliary bank note and worn coin reestablishing method based on graph clustering
CN103996180A (en) * 2014-05-05 2014-08-20 Hohai University Paper-shredder broken-document restoration method based on English character characteristics
CN104143095A (en) * 2014-07-16 2014-11-12 Jinan University Fragment restoring method based on genetic algorithm and character identification technology
CN104182966A (en) * 2014-07-16 2014-12-03 Jiangsu University Automatic splicing method of regular shredded paper
CN104537368A (en) * 2015-01-07 2015-04-22 Beijing University of Technology Recovery and analysis method for English printed double-sided printing broken file
CN105701500A (en) * 2016-01-01 2016-06-22 China Three Gorges University Single-sided English paper scrap splicing identification method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
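XU H, ZHENG J, ZHUANG Z, et al.: "A solution to reconstruct cross-cut shredded text documents based on character recognition and genetic algorithm", 《Abstract and Applied Analysis, Hindawi》 *
刘啸泽; 李璞; 陈香: "Splicing and restoration of shredded paper", 《电子测试》 *
李君阳: "Research and implementation of a splicing algorithm for regular simulated shredded paper", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
毛俊诚; 丁根宏; 郭东威: "Fully automatic splicing of Chinese shredded paper based on a partheno-genetic algorithm", 《信息技术》 *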

Also Published As

Publication number Publication date
CN107180412B (en) 2020-10-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant