CN116187717A - File warehousing management method and system - Google Patents

File warehousing management method and system

Info

Publication number
CN116187717A
Authority
CN
China
Prior art keywords
image
archival
template
preprocessed
archive
Prior art date
Legal status
Granted
Application number
CN202310442882.6A
Other languages
Chinese (zh)
Other versions
CN116187717B (en)
Inventor
夏予柱
张文杰
胡入幻
Current Assignee
Sichuan Jintou Science And Technology Co ltd
Original Assignee
Sichuan Jintou Science And Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Jintou Science And Technology Co ltd filed Critical Sichuan Jintou Science And Technology Co ltd
Priority to CN202310442882.6A priority Critical patent/CN116187717B/en
Publication of CN116187717A publication Critical patent/CN116187717A/en
Application granted granted Critical
Publication of CN116187717B publication Critical patent/CN116187717B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06Q 10/06311 — Scheduling, planning or task assignment for a person or group
    • G06K 17/0025 — Arrangements for transferring data to distant stations, consisting of a wireless interrogation device in combination with a device for optically marking the record carrier
    • G06Q 10/06315 — Needs-based resource requirements planning or analysis
    • G06Q 10/0633 — Workflow analysis
    • G06Q 10/103 — Workflow collaboration or project management
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a file warehousing management method and system, belonging to the field of data processing. The method comprises the following steps: collecting an archival image; preprocessing the archival image to obtain a preprocessed archival image; performing layout recognition on the preprocessed archival image to determine the archive type; based on the archive type, performing target area positioning and character recognition on the preprocessed archival image to obtain archive information; and generating a warehousing label based on the archive information. The method automatically extracts and records the information of archives to be warehoused and thereby improves warehousing efficiency.

Description

File warehousing management method and system
Technical Field
The invention relates to the field of data processing, in particular to a file warehousing management method and system.
Background
The bank accounting warehouse manages a large amount of important archival information such as seal cards, collateral items, notes, bank instruments and the like, and the volume of information is huge. When daily archival information is put into storage, traditional manual management is used: archival images are shot with a high-speed document camera, the archive type and the related key information are entered manually, and the data are then sent to an RFID label printer to print labels. This approach has several drawbacks: the business operation has no system support and a low degree of electronification; it is not interfaced with the archive system, business flows are not unified, several posts may perform the packaging, and the warehouse manager cannot keep track of the packaged items; and the identification, counting and warehousing of archival objects are entirely manual, so warehousing efficiency is low.
Therefore, it is necessary to provide a file warehousing management method and system that automatically extract and record the information of archives to be warehoused, so as to improve warehousing efficiency.
Disclosure of Invention
One of the embodiments of the present disclosure provides a method for managing archives, where the method includes: collecting archival images; preprocessing the archival image to obtain a preprocessed archival image; performing layout identification on the preprocessed archive image to determine the type of the archive; based on the file type, carrying out target area positioning and character recognition on the preprocessed file image to obtain file information; and generating a warehouse entry label based on the archive information.
In some embodiments, the method further comprises: before preprocessing the archival image, performing blur analysis on the archival image and judging whether the archival image needs to be acquired again.
In some embodiments, the performing blur analysis on the archival image to judge whether to re-acquire the archival image includes: acquiring a region image to be analyzed from the archival image; determining the overall contrast of the region image to be analyzed; when the overall contrast of the region image to be analyzed is smaller than an overall contrast threshold, determining the blur degree of the region image to be analyzed based on the average blur degree; when the overall contrast of the region image to be analyzed is greater than or equal to the overall contrast threshold, determining the blur degree of the region image to be analyzed based on the edge blur distance; and judging, based on the blur degree of the region image to be analyzed, whether to acquire the archival image again.
In some embodiments, the determining the blur degree of the region image to be analyzed based on the edge blur distance includes: determining the sharp points of the region image to be analyzed; calculating the blur distance corresponding to each sharp point; and determining the blur degree of the region image to be analyzed based on the blur distance corresponding to each sharp point.
In some embodiments, the preprocessing the archival image to obtain a preprocessed archival image includes: performing deskewing, black-edge removal and white balancing on the archival image to obtain the preprocessed archival image.
In some embodiments, the performing layout recognition on the preprocessed archival image to determine the archive type includes: traversing all template images to perform template matching on the preprocessed archival image, and judging whether a template image matching the preprocessed archival image exists; if a matching template image exists, determining the archive type based on the matched template image; if no matching template image exists, rotating the preprocessed archival image to generate a rotated archival image, traversing all template images to perform template matching on the rotated archival image, and judging whether a template image matching the rotated archival image exists; and if a template image matching the rotated archival image exists, determining the archive type based on the matched template image.
In some embodiments, the traversing all template images to perform template matching on the preprocessed archival image and judging whether a matching template image exists includes: sorting the template images according to the length and width of the preprocessed archival image; according to the sorting result, performing frame line reference point positioning and template matching on all the template images in turn; judging, from the frame line reference point positioning result, whether a template image matching the preprocessed archival image exists; if it is judged from the frame line reference point positioning result that no matching template image exists, performing, according to the sorting result, the template recognition flow without frame line reference points on all the template images in turn; and judging, from the result of the template recognition flow without frame line reference points, whether a template image matching the preprocessed archival image exists.
In some embodiments, the generating a warehousing label based on the archive information includes: writing the archive information into the warehousing label corresponding to the archival image through an RFID desktop reader-writer or an RFID printer, and printing the warehousing label.
One of the embodiments of the present disclosure provides a file warehouse management system, the system including: the image acquisition module is used for acquiring archival images; the preprocessing module is used for preprocessing the archive image and acquiring a preprocessed archive image; the type determining module is used for carrying out layout identification on the preprocessed archival image and determining the type of the archives; the information identification module is used for carrying out target area positioning and character identification on the preprocessed archive image based on the archive type to obtain archive information; and the label generating module is used for generating a warehouse-in label based on the file information.
In some embodiments, the image acquisition module is further configured to: perform blur analysis on the archival image and judge whether the archival image needs to be acquired again.
In some embodiments, the file warehousing management method and system disclosed in the present specification have at least the following beneficial effects compared with the prior art:
1. based on artificial intelligence, data mining and other technologies, the automatic identification, classification and label management of important file information can be realized, so that the file storage is more intelligent, the time for manually arranging the files and occasional errors are reduced, the storage efficiency is improved, and the file can be quickly stored at any time through a system;
2. A warehousing label is generated based on the recognized archive information, which makes it convenient for a later management system to record and track the state of the label, and provides an effective way to manage archive entities and improve the efficiency of safe-operation services.
Drawings
The present specification will be further elucidated by way of example embodiments, which will be described in detail by means of the accompanying drawings. The embodiments are not limiting, in which like numerals represent like structures, wherein:
FIG. 1 is a flow chart of a method of archive management according to some embodiments of the present disclosure;
FIG. 2 is a schematic illustration of an image of a region to be analyzed in an archival image according to some embodiments of the present disclosure;
FIG. 3 is a flow diagram illustrating blur analysis of archival images according to some embodiments of the present disclosure;
FIG. 4 is a flow diagram of template recognition shown in accordance with some embodiments of the present description;
FIG. 5a is a schematic illustration of an upper left corner locating template according to some embodiments of the present description;
FIG. 5b is a schematic illustration of a lower left corner positioning template according to some embodiments of the present disclosure;
FIG. 5c is a schematic illustration of an upper right corner locating template according to some embodiments of the present disclosure;
FIG. 5d is a schematic illustration of a lower right corner locating template according to some embodiments of the present disclosure;
FIG. 6 is a schematic illustration of a hitching section shown in accordance with some embodiments of the present disclosure;
FIG. 7 is a block diagram of an archive management system according to some embodiments of the present disclosure;
FIG. 8 is a block diagram of an electronic device shown in accordance with some embodiments of the present description;
FIG. 9 is a schematic illustration of an image of an area to be analyzed according to some embodiments of the present disclosure;
fig. 10 is a schematic diagram of weighted regions shown in accordance with some embodiments of the present description.
In the figure: 610, first matching unit; 620, hitching area; 621, marking unit; 622, recognition unit; 1010, frame line reference point search area; 1020, weighted area.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
It will be appreciated that "system," "apparatus," "unit" and/or "module" as used herein is one method for distinguishing between different components, elements, parts, portions or assemblies at different levels. However, if other words can achieve the same purpose, the words can be replaced by other expressions.
As used in this specification and the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in order precisely. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
Step 110, an archival image is acquired.
In some embodiments, an image acquisition device may be used to acquire the archival image. The image acquisition device may be, for example, a video camera or a high-speed document camera.
Step 120, preprocessing the archival image to obtain a preprocessed archival image.
In some embodiments, preprocessing may be operations to improve the quality of archival images. For example, the preprocessing may include operations such as denoising.
In some embodiments, preprocessing may include deskewing (tilt correction), black-edge removal and white balancing of the archival image.
In some embodiments, deskewing the archival image may specifically include: converting the color archival image into a gray image; scaling the gray image to speed up tilt detection (scaling does not affect the tilt calculation); binarizing the scaled gray image; retrieving contours from the binary image (e.g. with the cvFindContours function, which returns the number of detected contours) to obtain a contour map; extracting straight lines from the contour map; calculating the slope of each straight line; and correcting the tilt of the archival image according to the slope.
In some embodiments, extracting the straight lines from the contour map may specifically include: setting contour pixels to 255 and all other pixels to 0; determining the size of the Hough transform accumulator from the image size and allocating memory; performing the Hough transform on the image and storing the result in the accumulator; setting an accumulator threshold and clearing the accumulator cells whose accumulated value is smaller than this threshold (such cells are considered not to correspond to a straight line in the image domain); finding the cell with the largest accumulated value, recording it and clearing its neighborhood; continuing to find and record maxima until all values in the accumulator are zero; and drawing the straight lines in the image domain according to the recorded cells.
In some embodiments, calculating the slope of each straight line may specifically include: converting the slopes of the L longest straight lines into angles, and taking the tilt of the longest straight line whose angle is smaller than 45 degrees as the tilt of the image, where L is the number of longest straight lines used to determine the tilt angle. For example only, L = 5.
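By way of illustration only, the deskew steps above can be sketched with OpenCV's Python interface; the modern findContours/HoughLinesP calls stand in for the legacy cvFindContours, and all scaling factors and thresholds below are assumptions except L = 5, which follows the text:

```python
import cv2
import numpy as np

def estimate_skew_angle(color_image, num_lines=5):
    """Sketch of the tilt-detection steps: gray -> scale -> binarize -> contours ->
    Hough lines -> angle of the longest near-horizontal line (|angle| < 45 degrees)."""
    gray = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)
    scaled = cv2.resize(gray, None, fx=0.25, fy=0.25)        # scaling speeds up detection
    _, binary = cv2.threshold(scaled, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    contour_map = np.zeros_like(binary)
    cv2.drawContours(contour_map, contours, -1, 255, 1)      # contour pixels set to 255
    lines = cv2.HoughLinesP(contour_map, 1, np.pi / 180, threshold=80,
                            minLineLength=contour_map.shape[1] // 4, maxLineGap=5)
    if lines is None:
        return 0.0
    # take the L longest lines and use the longest one whose angle is below 45 degrees
    segments = sorted(lines[:, 0, :],
                      key=lambda s: np.hypot(s[2] - s[0], s[3] - s[1]),
                      reverse=True)[:num_lines]
    for x1, y1, x2, y2 in segments:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) < 45:
            return float(angle)
    return 0.0
```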
In some embodiments, the tilt correction of the archival image based on the slope may specifically include:
assume that
Figure SMS_1
For rotating the front coordinates, the image tilt angle is clockwise +.>
Figure SMS_2
Degree, counter-clockwise the image is
Figure SMS_3
Correction of the degree, rotation of the compensated coordinates +.>
Figure SMS_4
The method comprises the following steps:
Figure SMS_5
similarly, if the image is tilted at a clockwise angle
Figure SMS_6
The degree is clockwise +.>
Figure SMS_7
Correction of the degree, rotation of the corrected coordinates +.>
Figure SMS_8
The method comprises the following steps:
Figure SMS_9
in some embodiments, the target of other areas after the tilt correction is automatically supplemented with black, and the redundant black edges are removed to obtain useful image information.
In some embodiments, a white balance operation may be performed on the archival image after the black edges are removed. The operation specifically includes: counting the number of occurrences of each pixel value separately on the R, G and B channels; setting the top R% and bottom R% of the values to 255 and 0 respectively; and mapping the remaining values onto (0, 255) so that the values of each channel are distributed more evenly and color balance is achieved. For example only, R = 1.
Here R represents the stretching strength: the larger R is, the larger the fully bright and fully dark regions in the image become, and the smaller R is, the smaller these regions become.
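A minimal sketch of this per-channel stretch, assuming a percentile-based implementation with R = 1; the function name and the use of numpy are illustrative rather than taken from the patent:

```python
import numpy as np

def white_balance_stretch(image, r_percent=1.0):
    """Clip the darkest/brightest R% of values to 0/255 per channel and
    linearly map the rest onto (0, 255)."""
    out = np.empty_like(image)
    for c in range(3):                                   # R, G, B channels
        channel = image[:, :, c].astype(np.float32)
        low = np.percentile(channel, r_percent)          # darkest R% -> 0
        high = np.percentile(channel, 100 - r_percent)   # brightest R% -> 255
        stretched = (channel - low) / max(high - low, 1e-6) * 255.0
        out[:, :, c] = np.clip(stretched, 0, 255).astype(np.uint8)
    return out
```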
In some embodiments, before the archival image is preprocessed, blur analysis is performed on the archival image to judge whether it needs to be acquired again: if the blur degree does not meet the requirement, the flow returns to step 110 to acquire the archival image again; if it does, the archival image can be preprocessed.
FIG. 3 is a schematic flow chart of blur analysis on an archival image according to some embodiments of the present disclosure. As shown in FIG. 3, in some embodiments, performing blur analysis on the archival image to judge whether to re-acquire it may specifically include:
converting the archival image into a gray image;
acquiring the region image to be analyzed from the archival image;
determining the overall contrast of the region image to be analyzed, where the clearer the region image to be analyzed is, the greater its overall contrast;
when the overall contrast of the region image to be analyzed is smaller than the overall contrast threshold, determining the blur degree of the region image to be analyzed based on the average blur degree;
when the overall contrast of the region image to be analyzed is greater than or equal to the overall contrast threshold, determining the blur degree of the region image to be analyzed based on the edge blur distance;
and judging, based on the blur degree of the region image to be analyzed, whether to acquire the archival image again.
In some embodiments, the region image to be analyzed may be a region of interest in the archival image. For example, FIG. 2 is a schematic diagram of the region image to be analyzed in an archival image according to some embodiments of the present disclosure. As shown in FIG. 2, the vertical line one quarter of the width from the left is taken as the left boundary, the vertical line one quarter of the width from the right as the right boundary, the horizontal line one quarter of the height from the top as the upper boundary, and the horizontal line one quarter of the height from the bottom as the lower boundary; the enclosed region A is the region image to be analyzed, i.e. the region of interest.
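Under the boundaries described above, the region of interest is simply the central half of the image in each direction, for example:

```python
def central_region_of_interest(gray_image):
    """Take the central half of the image in both directions as region A."""
    h, w = gray_image.shape[:2]
    return gray_image[h // 4: h - h // 4, w // 4: w - w // 4]
```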
By way of example only, let f(x, y) denote the value of the pixel at horizontal position x and vertical position y in the region image to be analyzed, i.e. x is the abscissa and y is the ordinate of a point in the region image to be analyzed, with 0 ≤ x < W and 0 ≤ y < H, where W is the width and H is the height of the region image to be analyzed.
In some embodiments, the overall contrast of the region image to be analyzed may be determined as C = Σ δ² · P_δ, where C is the overall contrast of the region image to be analyzed, δ denotes the gray level difference between adjacent pixels, P_δ is the distribution probability of each gray level difference over the pixels, and the sum runs over all adjacent pixel pairs.
The calculation steps are as follows:
(1) Calculate the 4-neighborhood gray level differences at each position:
In some embodiments, the adjacent pixels are the 4-neighborhood pixels, i.e. the pixels to the left, to the right, directly above and directly below. The gray level difference between the current pixel f(x, y) and the pixel f(x−1, y) on its left is the absolute difference |f(x, y) − f(x−1, y)|; the gray level difference with the pixel f(x+1, y) on its right is |f(x, y) − f(x+1, y)|; the gray level difference with the pixel f(x, y−1) directly above it is |f(x, y) − f(x, y−1)|; and the gray level difference with the pixel f(x, y+1) directly below it is |f(x, y) − f(x, y+1)|.
(2) Calculate the probability value of each gray level difference:
In some embodiments, the distribution probability of the gray level differences at each pixel is determined by the total number of gray level differences: the probability of each gray level difference equals 1 divided by the total number of gray level differences. For a W × H image the number of 4-neighborhood gray level differences is fixed: every pair of horizontally or vertically adjacent pixels contributes one difference at each of its two pixels, so the total number of gray level differences is 4WH − 2W − 2H (this holds whether W > H, W = H or W < H).
(3) Calculate the overall contrast from the gray level differences and their probabilities as defined above.
For example only, determining the overall contrast of the 4 × 4 region image to be analyzed shown in fig. 9 may include:
(1) Calculate the 4-neighborhood gray level differences at each position:
1) The pixel value at pixel [0,0] is 2; among its 4-neighborhood only the pixels below and to its right exist, so two gray level differences with its adjacent pixels are calculated;
2) The pixel value at pixel [1,0] is 3; among its 4-neighborhood only the left, right and lower pixels exist, so three gray level differences with its adjacent pixels are calculated;
3) By analogy, the gray level differences between the pixel at each position of each row and its 4-neighborhood pixels are calculated;
(2) Calculate the probability value of each gray level difference:
The first row contributes 2+3+3+2 gray level differences, the second row 3+4+4+3, the third row 3+4+4+3, and the fourth row 2+3+3+2, i.e. 48 gray level differences in total, so the probability value of each gray level difference is 1/48;
(3) Calculate the overall contrast by summing the squared gray level differences weighted by this probability.
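A small self-contained sketch of this computation, assuming absolute 4-neighborhood differences and the squared-difference weighting described above:

```python
import numpy as np

def overall_contrast(region):
    """4-neighborhood contrast: probability-weighted sum of squared gray level
    differences, with probability = 1 / (total number of differences)."""
    region = region.astype(np.int64)
    h, w = region.shape
    horiz = np.abs(region[:, 1:] - region[:, :-1]).ravel()   # horizontal neighbor pairs
    vert = np.abs(region[1:, :] - region[:-1, :]).ravel()    # vertical neighbor pairs
    diffs = np.concatenate([horiz, horiz, vert, vert])       # each pair counted from both pixels
    total = 4 * w * h - 2 * w - 2 * h                        # 48 for the 4 x 4 example above
    assert diffs.size == total
    return float(np.sum(diffs.astype(np.float64) ** 2)) / total
```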
In some embodiments, determining the blur degree of the region image to be analyzed based on the average blur degree may specifically include the following steps:
dividing the region image to be analyzed into M × N sub-regions, where M means the region image to be analyzed is divided horizontally into M sub-regions and N means it is divided vertically into N sub-regions;
determining the contrast of each sub-region;
and calculating the blur degree.
In some embodiments, the contrast of a sub-region may be determined from the gray values within that sub-region: for the sub-region at horizontal index i and vertical index j, let g_max(i, j) be the maximum gray value and g_min(i, j) the minimum gray value in that sub-region; the contrast C(i, j) of the sub-region is obtained from g_max(i, j) and g_min(i, j).
In some embodiments, the blur degree of the region image to be analyzed is then calculated from the contrasts C(i, j) of the M × N sub-regions (a lower contrast corresponds to a higher blur degree).
In some embodiments, determining the blur degree of the region image to be analyzed based on the edge blur distance may include the following steps:
calculating the sharp points;
and evaluating the blur degree.
In some embodiments, calculating the sharp points may include the following steps:
calculating the gradient value of each pixel of the current region image to be analyzed;
specifying an initial gradient threshold for sharp points;
calculating an empirical threshold;
and determining the sharp points.
In some embodiments, the Sobel operator may be used to calculate the gradient value of each pixel.
In some embodiments, the empirical threshold is obtained from the gradient values that exceed the initial gradient threshold: the number of gradient values greater than the initial gradient threshold and the sum of all such gradient values are counted, and the empirical threshold is their ratio, i.e. the mean of the gradient values that are greater than the initial gradient threshold.
In some embodiments, the gradient values that are smaller than the empirical threshold are set to zero and the corresponding points are regarded as non-sharp points; the remaining points are sharp points.
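A short sketch of the sharp-point selection, assuming a Sobel gradient magnitude and the mean-of-large-gradients empirical threshold described above; the initial threshold value is an assumption because the value used in the patent is not reproduced here:

```python
import cv2
import numpy as np

def sharp_points(region, initial_threshold=50.0):
    """Per-pixel Sobel gradient magnitude; the empirical threshold is the mean of the
    gradients above the initial threshold; gradients below it mark non-sharp points."""
    gx = cv2.Sobel(region, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(region, cv2.CV_32F, 0, 1, ksize=3)
    grad = np.sqrt(gx ** 2 + gy ** 2)
    big = grad[grad > initial_threshold]
    if big.size == 0:
        return np.zeros_like(grad, dtype=bool)        # no sharp points at all
    empirical = big.mean()
    return grad >= empirical                          # boolean mask of sharp points
```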
In some embodiments, determining the blur degree of the region image to be analyzed based on the edge blur distance may include the following steps:
traversing each sharp point of the region image to be analyzed from top to bottom and from left to right;
counting the total number cnt2 of sharp points;
for each sharp point, calculating its blur distance;
and calculating the blur degree of the whole image.
In some embodiments, the left blur distance is obtained by scanning leftwards from the sharp point. It is first judged whether an abrupt change occurs at the scanned point; there are two conditions under which no abrupt change occurs:
1. The gray level difference between adjacent pixels is smaller than the gray value mutation threshold, where t is a variable denoting the t-th pixel scanned leftwards from the current pixel and the mutation threshold is a preset value.
2. The direction of gray value change between adjacent pixels is the same, i.e. the gray values change monotonically during the scan.
When no abrupt change occurs, the point is recorded. Starting from this recorded point, the number of consecutive points whose gray level difference does not exceed a gray level difference threshold is counted; when this number reaches th4, the scan stops, and the left blur distance is obtained by adding 1 to the count of scanned points. Here s is a variable denoting the s-th pixel scanned leftwards from the point at which no abrupt change occurred, the gray level difference threshold is a preset value, and th4 is the threshold on the number of consecutive points whose gray level difference does not exceed the gray level difference threshold.
It will be appreciated that the edge blur distance can be used to calculate the image blur degree, but because the edges of an excessively blurred image are not obvious, using the edge blur distance method alone would yield a blur value that is too small for such images. Conversely, if only the average blur degree method is used, the blur degree of an image that is not excessively blurred cannot be distinguished accurately. Therefore, when the overall contrast of the region image to be analyzed is smaller than the overall contrast threshold, the blur degree is determined based on the average blur degree; and when the overall contrast of the region image to be analyzed is greater than or equal to the overall contrast threshold, the blur degree is determined based on the edge blur distance.
Similarly, the right blur distance is obtained by scanning rightwards from the sharp point.
It is first judged whether an abrupt change occurs at the scanned point; the two conditions under which no abrupt change occurs are:
1. The gray level difference between adjacent pixels is smaller than the set threshold;
2. The direction of gray value change between adjacent pixels is the same;
where p is a variable denoting the p-th pixel scanned rightwards from the current pixel.
in some embodiments, the left and right blur distances of the sharp point are added to obtain the blur distance of the sharp point.
In some embodiments, the blur distances of all sharp points in the whole image are added to obtain the blur distance sum, which is divided by the number of sharp points cnt2 to obtain the average blur distance; this value is the blur degree of the whole image.
In some embodiments, when the detected blur degree blu is greater than the blur threshold, or the number of sharp points is 0, the image is considered blurred and the archival image needs to be acquired again; otherwise the archival image is not blurred and does not need to be acquired again.
In some embodiments, the blur threshold may be appropriately adjusted according to the performance of the actual image acquisition device. For example, the better the performance of the image acquisition device, the smaller the blur threshold.
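A heavily simplified sketch of the blur-distance decision: it scans each sharp point's row to the left and right, stops at an abrupt jump or once the profile flattens, and compares the mean distance with a blur threshold. Every constant here is an assumption, and the monotonicity check of the original method is omitted:

```python
import numpy as np

def is_blurred(region, sharp_mask, mutation_threshold=40, plateau_threshold=2,
               plateau_count=3, blur_threshold=6.0):
    """Return True when the image should be re-acquired (blurred or no sharp points)."""
    region = region.astype(np.int32)
    ys, xs = np.nonzero(sharp_mask)
    if xs.size == 0:
        return True                          # no sharp points -> treat image as blurred
    distances = []
    for y, x in zip(ys, xs):
        total = 0
        for step in (-1, +1):                # left scan, then right scan
            prev, flat, cx = region[y, x], 0, x
            while 0 < cx + step < region.shape[1] - 1:
                cx += step
                diff = abs(int(region[y, cx]) - prev)
                if diff >= mutation_threshold:
                    break                    # abrupt change: stop this side
                flat = flat + 1 if diff <= plateau_threshold else 0
                if flat >= plateau_count:
                    break                    # profile has flattened out
                prev, total = region[y, cx], total + 1
        distances.append(total + 1)          # left + right blur distance of the sharp point
    return float(np.mean(distances)) > blur_threshold
```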
Step 130, performing layout recognition on the preprocessed archival image to determine the archive type.
In some embodiments, performing layout recognition on the preprocessed archival image to determine the archive type may include:
traversing all template images to perform template matching on the preprocessed archival image, and judging whether a template image matching the preprocessed archival image exists;
if a matching template image exists, determining the archive type based on the matched template image;
if no matching template image exists, rotating the preprocessed archival image to generate a rotated archival image, traversing all template images to perform template matching on the rotated archival image, and judging whether a template image matching the rotated archival image exists;
and if a template image matching the rotated archival image exists, determining the archive type based on the matched template image.
For example, the preprocessed archival image is sent to the template recognition flow, and all template images are traversed for template recognition; if template recognition succeeds, the layout type corresponding to the matched template image is returned. Otherwise, the preprocessed archival image is rotated by 180 degrees and sent to the template recognition flow again, traversing all template images; if template recognition succeeds, the layout type corresponding to the matched template image is returned. Otherwise, the preprocessed archival image is rotated by 90 degrees and then by 270 degrees relative to the original in turn; whenever template matching succeeds, layout recognition succeeds and the layout type corresponding to the matched template image is returned; otherwise layout recognition fails.
In some embodiments, the above process automatically corrects archival images that are placed horizontally, vertically or upside down, which improves the efficiency of archive processing.
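A compact sketch of the rotation loop; `recognise_once` and `templates` are placeholders for the template recognition flow of fig. 4 and the template library:

```python
import cv2

def recognise_layout(image, templates, recognise_once):
    """Try template recognition on the image as-is, then rotated by 180, 90 and 270
    degrees; return the layout type of the first successful match, or None."""
    rotations = [None, cv2.ROTATE_180, cv2.ROTATE_90_CLOCKWISE,
                 cv2.ROTATE_90_COUNTERCLOCKWISE]
    for rot in rotations:
        candidate = image if rot is None else cv2.rotate(image, rot)
        layout_type = recognise_once(candidate, templates)
        if layout_type is not None:
            return layout_type
    return None            # layout recognition failed
```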
FIG. 4 is a schematic diagram of a template recognition flow according to some embodiments of the present description. As shown in FIG. 4, in some embodiments, the template recognition flow is as follows:
step 1, sorting the template images according to their length-width matching degree with the archival image;
step 2, starting from the first template, performing template recognition;
step 3, if the absolute value of the length or width error between the archival image and the current template image exceeds the maximum allowed length-width error, the archival image and this template do not match; select the next template image and repeat step 3. Otherwise, go to step 4. The maximum allowed length-width error is a preset threshold used to speed up template recognition;
step 4, positioning the frame line reference point within the specified range of the current template; if the positioning of the frame line reference point fails, select the next template and repeat step 3; otherwise, go to step 5;
step 5, determining the absolute position of the matching area in the archival image according to the calculated reference point position of the archival image, the reference point position in the currently traversed template image, and the relative position of the first matching unit 610 in the template;
step 6, performing template matching between the matching area in the archival image and the matching area in the template image; if the matching value is larger than the template matching threshold, record the template matching value and go to step 7; otherwise, select the next template and repeat step 3;
step 7, judging whether all templates have been traversed; if so, compare all the matching values recorded in step 6 and record the template with the largest matching value, which is the matched template, and template recognition succeeds; if not all templates have been traversed, select the next template and repeat step 3;
step 8, if no successful matching value has been recorded after all templates have been traversed, template recognition fails;
step 9, when the positioning of the frame line reference point fails within the specified range of all templates, enter the template recognition flow without frame line reference points;
step 10, when the positioning of the frame line reference point fails, the absolute coordinates of the first matching unit 610 in the template, expanded by a certain margin, are used as the search area, and steps 6 to 8 are repeated.
In some embodiments, the frame line reference point in step 4 is determined by the frame lines in the image, and the frame line reference point can be used to locate recognition elements that keep a fixed distance from the frame lines. Since most archival images contain a rectangular box, this rectangle is called the positioning rectangle; it is one of the most prominent marks in the whole archival image. The top-left vertex of the positioning rectangle is called the upper-left positioning point, the top-right vertex the upper-right positioning point, the bottom-left vertex the lower-left positioning point, and the bottom-right vertex the lower-right positioning point. It is observed that the positions of the elements in the archival image relative to the positioning points are fixed, so the positioning points in the archival image can be determined first.
In some embodiments, the process of frame line reference point positioning may include the following steps:
determining a positioning template;
and performing frame line reference point positioning using a positioning algorithm.
In some embodiments, the positioning templates are matrices used to locate the vertices of the positioning rectangle by convolution. For this purpose an upper-left positioning template, a lower-left positioning template, an upper-right positioning template and a lower-right positioning template are constructed, as shown in figs. 5a to 5d.
In some embodiments, the element of the upper-left positioning template at the i-th row and j-th column is denoted T(i, j), where j is the column index (the j-th pixel horizontally) and i is the row index (the i-th pixel vertically), and the element values are defined by a formula over i, j and the template size n (n being the side length of the square positioning template). The upper-right, lower-left and lower-right positioning templates are defined by analogous formulas for the corresponding corner orientations.
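The patent's element formulas are not reproduced in this text; a hypothetical corner template of the kind figs. 5a to 5d suggest, with positive weights along the two frame lines that meet at the corner and negative weights elsewhere, might look like:

```python
import numpy as np

def upper_left_corner_template(n=51):
    """Hypothetical upper-left corner template: the convolution response peaks where a
    horizontal frame line runs to the right and a vertical frame line runs downwards."""
    t = -np.ones((n, n), dtype=np.float32)
    t[0, :] = 1.0       # horizontal frame line leaving the corner to the right
    t[:, 0] = 1.0       # vertical frame line leaving the corner downwards
    return t

def lower_right_corner_template(n=51):
    """The other three corner templates are mirror images of the first."""
    return upper_left_corner_template(n)[::-1, ::-1].copy()
```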
in some embodiments, the wire fiducial positioning is used to find wire fiducial points in a search area specified by an image block, and the wire fiducial point positioning using a positioning algorithm may include the steps of:
converting the collected color image of the front face of the file into a gray image;
determining a weighted area and cutting;
binarizing the cut weighted area image;
determining a selected positioning template;
performing convolution operation on the selected template and the image;
And determining the position of the positioning point of the upper right corner.
In some embodiments, as shown in FIG. 10, point B represents the frame line reference point to be located, the smaller rectangular box represents the frame line reference point search area 1010, and the middle rectangular box represents the weighted area 1020. The weighted area is cut out of the image, giving a clipped region of a certain width and height; the weighted area 1020 is clipped to the image boundaries to ensure that it does not go out of range.
By way of example only, since the frame line reference point to be located is the upper-right corner positioning point, the positioning template structure shown in fig. 5b is selected. In some embodiments, the size of the template is 51.
In some embodiments, performing the convolution operation between the selected template and the image may specifically include: moving the template window over the image from left to right and from top to bottom in turn, and calculating the weighted value at each position in the same way.
In some embodiments, determining the position of the upper-right corner positioning point may include: calculating the maximum of the weighted values and the position (abscissa and ordinate) of that maximum. If the maximum convolution value is greater than the threshold th8, the positioning of the frame line reference point succeeds, and the position in the clipped region corresponding to the maximum value is the position of the upper-right corner positioning point. Here th8 is the threshold on the weighted value used for reference point positioning.
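A sketch of the reference point positioning under stated assumptions; the binarization method, the normalization of the convolution response and the value of th8 are all assumptions:

```python
import cv2
import numpy as np

def locate_corner(gray, search_rect, template, th8=0.6):
    """Cut the weighted area, binarize it, correlate it with the selected corner
    template and accept the maximum response if it exceeds th8."""
    x, y, w, h = search_rect                             # weighted area, already clipped
    patch = gray[y:y + h, x:x + w]
    _, binary = cv2.threshold(patch, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    binary = binary.astype(np.float32) / 255.0
    response = cv2.filter2D(binary, -1, template)        # correlation with the template
    _, max_val, _, max_loc = cv2.minMaxLoc(response)
    if max_val / template.size < th8:                    # normalize by template area (assumption)
        return None                                      # positioning failed
    return (x + max_loc[0], y + max_loc[1])              # corner position in the full image
```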
In some embodiments, the template matching in step 6 may specifically include the following procedures:
cutting out a region of interest from the image to be matched;
converting the template image and the image to be matched into gray images;
matching is carried out according to a correlation coefficient matching method, and the position and matching degree of the best matching area are obtained;
mapping out the position of the matching unit in the image to be matched, comparing the matching degree with a set threshold value, and if the matching degree is larger than the threshold value th7, indicating that the template matching is successful, otherwise, indicating that the template matching is failed.
In some embodiments, in order to reduce the amount of computation, once the frame line reference point has been positioned successfully, the absolute position of the matching unit can be determined from the relative positions of the frame line reference point and the matching area. To guard against errors in positioning the frame line reference point, the search area derived from this absolute position is given a certain margin, i.e. it is expanded to the left, right, top and bottom by a number of pixels, and template matching is performed at every pixel position within it. If the positioning of the frame line reference point fails, the search area is instead derived from the absolute position of the matching area and likewise expanded to the left, right, top and bottom by a number of pixels before template matching is performed at every pixel position. These two expansion amounts are, respectively, the number of pixels by which the area is expanded for template matching when frame line reference point positioning succeeds and when it fails.
In some embodiments, matching according to the correlation coefficient matching method to obtain the position and matching degree of the best matching region may include: sliding the template image over the image to be matched from left to right and from top to bottom. Starting from the upper-left corner of the source image, the template is moved pixel by pixel; at each pixel position, an image patch of the same size as the template, with that pixel as its upper-left vertex, is taken from the image to be matched and a pixel-wise convolution with the template is computed. While the template slides over the image to be matched, the comparison result between the template and the currently intercepted patch is stored in a result matrix until all intercepted patches have been compared. For example, the correlation coefficient matching method in opencv uses a multiplication between the template and the image, so a larger value indicates a higher matching degree and 0 indicates the worst match; with T denoting the template image and I the image to be matched, the value computed at each position is the matching value. The position corresponding to the maximum value is the position of the best matching region, and the maximum value itself is the matching degree.
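A minimal sketch using OpenCV's built-in correlation-coefficient matching (cv2.matchTemplate with TM_CCOEFF_NORMED); the threshold value th7 = 0.8 is an assumption:

```python
import cv2

def match_unit(image_roi, template, th7=0.8):
    """Slide the template over a region of interest and return the best match position
    and matching degree, or (None, degree) when the degree does not exceed th7."""
    gray_roi = cv2.cvtColor(image_roi, cv2.COLOR_BGR2GRAY)
    gray_tpl = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    result = cv2.matchTemplate(gray_roi, gray_tpl, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val > th7:
        return max_loc, max_val        # top-left corner of the best match, matching degree
    return None, max_val               # template matching failed
```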
In some embodiments, after layout recognition is completed, the number of archives of each type can be counted automatically, avoiding the time and effort consumed by manual counting.
Step 140, performing target area positioning and character recognition on the preprocessed archival image based on the archive type to acquire archive information.
The key information areas to be recognized are divided into fixed units and floating units. The position of a fixed unit is determined directly from the position of the corresponding matching unit at the maximum template matching value calculated in step 130 and the relative position of the fixed area to be recognized. The position of a floating unit is determined by the located position of the marking unit 621 of the hitching area 620 and the relative position between the marking unit 621 and the recognition unit 622. Floating unit positioning is mainly aimed at the recognition unit 622 of the hitching area 620: the hitching area 620 comprises the marking unit 621 and the recognition unit 622, and the relative position of the recognition unit 622 to the marking unit 621 is fixed, but the relative position of the hitching information to the matching unit is not fixed and only lies within a certain range. FIG. 6 is a schematic diagram of a hitching area 620 according to some embodiments of the present disclosure. As shown in fig. 6, directly using the frame line reference point or the matching unit to locate the recognition unit 622 would be inaccurate; since fixed Chinese-character information is printed together with the hitching information, the recognition unit 622 can be located via these Chinese characters, i.e. via the marking unit 621. Both the first matching unit 610 and the marking unit 621 are referred to as matching units.
In some embodiments, target area positioning of the preprocessed archival image may specifically include the following steps (a minimal sketch of the final position calculation is given after this list):
(1) For each matching unit, if the matching unit is a marking unit 621, go to step (2); otherwise, judge the next matching unit;
(2) Acquire the position of the frame line reference point of the template corresponding to the current image (or, when there is no reference point or reference point positioning failed, the position of the first matching unit 610) as the reference position;
(3) Calculate the relative position of each marking unit 621 in the template with respect to the reference position or the first matching unit 610;
(4) Acquire the position of the first matching unit 610 in the current image;
(5) Calculate the search range of the marking unit 621 in the current image using the template recognition flow;
(6) Perform template matching with the image of the marking unit 621 within this range and return the maximum matching value;
(7) If the maximum value is larger than the set threshold, template matching succeeds; return the matching position and go to step (8). Otherwise, floating unit matching fails, and the absolute position of the floating unit in the template is used as the position of the floating unit in the current image;
(8) Calculate the relative position of the floating unit in the template with respect to the marking unit 621 in the template;
(9) Obtain the absolute position of the floating unit in the current image from this relative position and the absolute position of the marking unit 621 in the current image.
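A minimal sketch of steps (8) and (9), assuming positions are (x, y) pixel coordinates:

```python
def floating_unit_position(marker_pos_in_image, marker_pos_in_template,
                           floating_pos_in_template):
    """The floating unit keeps a fixed offset from the marking unit 621, so its absolute
    position in the current image is the marking unit's matched position plus the
    offset taken from the template."""
    dx = floating_pos_in_template[0] - marker_pos_in_template[0]
    dy = floating_pos_in_template[1] - marker_pos_in_template[1]
    return (marker_pos_in_image[0] + dx, marker_pos_in_image[1] + dy)
```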
In some embodiments, text recognition may be performed on the preprocessed archival image using a CRNN-based method.
Step 150, generating a warehousing label based on the archive information.
In some embodiments, the archive information is written into the warehousing label corresponding to the archival image through an RFID desktop reader-writer or an RFID printer, and the warehousing label is printed. RFID (Radio Frequency Identification) is a communication technology that identifies a specific object and reads and writes related data via radio signals, without requiring mechanical or optical contact between the identification system and the object. Using the archive type and archive information automatically recognized in the preceding steps, key information containing the archive type and archive details is generated automatically, written into the RFID chip, and an RFID label is printed and attached to the archive as its unique electronic identifier. The printed RFID labels support informationized management of the archives during inventory, and each archive can be read accurately in real time when taking stock.
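A hypothetical sketch of the key information assembled for a label; the field names are illustrative, and the actual write to the RFID chip goes through the reader-writer or printer SDK, which is vendor-specific and not shown:

```python
import json
import uuid

def build_label_payload(archive_type, archive_info):
    """Assemble the key information written to the RFID chip (hypothetical structure)."""
    return json.dumps({
        "label_id": uuid.uuid4().hex,        # unique electronic identifier of the archive
        "archive_type": archive_type,        # e.g. "seal card", "note"
        "archive_info": archive_info,        # key fields extracted by character recognition
    }, ensure_ascii=False)
```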
FIG. 7 is a block diagram of an archive management system according to some embodiments of the present disclosure, as shown in FIG. 7, an archive management system may include: the device comprises an image acquisition module, a preprocessing module, a type determining module, an information identifying module and a label generating module.
The image acquisition module may be used to acquire archival images.
The preprocessing module can be used for preprocessing the archival image and acquiring the preprocessed archival image.
The type determining module can be used for carrying out layout identification on the preprocessed archival image and determining the type of the archives.
The information identification module can be used for carrying out target area positioning and character identification on the preprocessed archive image based on the archive type to acquire archive information.
The tag generation module may be configured to generate a warehouse entry tag based on the archive information.
For further description of an archive management system, reference may be made to fig. 1 and its related description, and further description is omitted herein.
Fig. 8 is a block diagram of an electronic device, as shown in fig. 8, which is an example of a hardware device that may be applied to aspects of the present invention, according to some embodiments of the present description. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 8, the electronic device includes a computing unit that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) or a computer program loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device may also be stored. The computing unit, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
A plurality of components in the electronic device are connected to the I/O interface, including: an input unit, an output unit, a storage unit, and a communication unit. The input unit may be any type of device capable of inputting information to the electronic device; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit may be any type of device capable of presenting information, and may include, but is not limited to, a display, speakers, video/audio output terminals, a vibrator, and/or a printer. The storage unit may include, but is not limited to, magnetic disks and optical disks. The communication unit allows the electronic device to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital signal processors (DSPs), and any suitable processors, controllers, microcontrollers, and the like. The computing unit performs the various methods and processes described above. For example, in some embodiments, the archive management method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM and/or the communication unit. In some embodiments, the computing unit may be configured to perform the archive management method by any other suitable means (e.g., by means of firmware).
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly stated herein, various modifications, improvements, and adaptations of the present description may occur to those skilled in the art. Such modifications, improvements, and adaptations are suggested in this specification, and therefore remain within the spirit and scope of the exemplary embodiments of this specification.
Meanwhile, the specification uses specific words to describe the embodiments of the specification. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present description. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present description may be combined as suitable.
Furthermore, unless explicitly recited in the claims, the order of processing elements and sequences, the use of numbers or letters, or the use of other designations in this specification is not intended to limit the order of the processes and methods of this specification. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments; on the contrary, the claims are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments of this specification. For example, although the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing server or mobile device.
Likewise, it should be noted that, in order to simplify the presentation disclosed in this specification and thereby aid in understanding one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not intended to imply that the subject matter of this specification requires more features than are recited in the claims. Indeed, the claimed subject matter may lie in less than all features of a single embodiment disclosed above.
In some embodiments, numbers describing quantities of components and attributes are used; it should be understood that such numbers used in the description of embodiments are, in some examples, modified by the qualifiers "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows for a 20% variation. Accordingly, in some embodiments, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by individual embodiments. In some embodiments, numerical parameters should take into account the specified significant digits and employ a method of preserving the general number of digits. Although the numerical ranges and parameters used to confirm the breadth of the ranges in some embodiments of this specification are approximations, in specific embodiments such numerical values are set as precisely as practicable.
Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.

Claims (10)

1. An archive warehousing management method, characterized by comprising the following steps:
collecting an archival image;
preprocessing the archival image to obtain a preprocessed archival image;
performing layout identification on the preprocessed archival image to determine the archive type;
based on the archive type, performing target area positioning and character recognition on the preprocessed archival image to obtain archive information;
and generating a warehouse entry label based on the archive information.
2. The archive warehousing management method according to claim 1, further comprising:
before preprocessing the archival image, performing ambiguity analysis on the archival image and judging whether to re-acquire the archival image.
3. The archive warehousing management method according to claim 2, wherein the performing ambiguity analysis on the archival image and judging whether to re-acquire the archival image comprises:
acquiring an area image to be analyzed from the archival image;
determining the overall contrast of the area image to be analyzed;
determining the ambiguity of the area image to be analyzed based on the average ambiguity when the overall contrast of the area image to be analyzed is smaller than an overall contrast threshold;
determining the ambiguity of the area image to be analyzed based on the edge ambiguity distance when the overall contrast of the area image to be analyzed is greater than or equal to the overall contrast threshold;
and judging whether to re-acquire the archival image based on the ambiguity of the area image to be analyzed.
4. The archive warehousing management method according to claim 3, wherein the determining the ambiguity of the area image to be analyzed based on the edge ambiguity distance comprises:
determining sharp points of the area image to be analyzed;
calculating an ambiguity distance corresponding to each sharp point;
and determining the ambiguity of the area image to be analyzed based on the ambiguity distance corresponding to each sharp point.
5. The archive warehousing management method according to any one of claims 1-4, wherein the preprocessing the archival image to obtain a preprocessed archival image comprises:
performing deviation correction, black edge removal, and white balancing on the archival image to obtain the preprocessed archival image.
6. The archive warehousing management method according to any one of claims 1-4, wherein the performing layout identification on the preprocessed archival image to determine the archive type comprises:
traversing all template images to perform template matching on the preprocessed archival image, and judging whether a template image matching the preprocessed archival image exists;
if a template image matching the preprocessed archival image exists, determining the archive type based on the matched template image;
if no template image matching the preprocessed archival image exists, rotating the preprocessed archival image to generate a rotated archival image, traversing all the template images to perform template matching on the rotated archival image, and judging whether a template image matching the rotated archival image exists;
and if a template image matching the rotated archival image exists, determining the archive type based on the matched template image.
7. The archive warehousing management method according to claim 6, wherein the traversing all template images to perform template matching on the preprocessed archival image and judging whether a template image matching the preprocessed archival image exists comprises:
sorting the template images according to the length and width of the preprocessed archival image;
according to the sorting result, sequentially performing frame line reference point positioning on each template image and the preprocessed archival image;
judging whether a template image matching the preprocessed archival image exists according to the frame line reference point positioning result;
if it is judged from the frame line reference point positioning result that no template image matching the preprocessed archival image exists, sequentially performing a frame-line-reference-point-free template recognition process on each template image and the preprocessed archival image according to the sorting result;
and judging whether a template image matching the preprocessed archival image exists according to the result of the frame-line-reference-point-free template recognition process.
8. The archive warehousing management method according to any one of claims 1-4, wherein the generating a warehouse entry label based on the archive information comprises:
writing the archive information into the warehouse entry label corresponding to the archival image through an RFID desktop reader-writer or an RFID printer, and printing the warehouse entry label.
9. An archive warehousing management system, characterized by comprising:
an image acquisition module, configured to collect an archival image;
a preprocessing module, configured to preprocess the archival image to obtain a preprocessed archival image;
a type determining module, configured to perform layout identification on the preprocessed archival image to determine the archive type;
an information identification module, configured to perform target area positioning and character recognition on the preprocessed archival image based on the archive type to obtain archive information;
and a label generating module, configured to generate a warehouse entry label based on the archive information.
10. The archive warehousing management system according to claim 9, wherein the image acquisition module is further configured to:
perform ambiguity analysis on the archival image and judge whether to re-acquire the archival image.
CN202310442882.6A 2023-04-24 2023-04-24 File warehousing management method and system Active CN116187717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310442882.6A CN116187717B (en) 2023-04-24 2023-04-24 File warehousing management method and system

Publications (2)

Publication Number Publication Date
CN116187717A true CN116187717A (en) 2023-05-30
CN116187717B CN116187717B (en) 2023-07-11

Family

ID=86440648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310442882.6A Active CN116187717B (en) 2023-04-24 2023-04-24 File warehousing management method and system

Country Status (1)

Country Link
CN (1) CN116187717B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252385A (en) * 2023-10-17 2023-12-19 山东新潮信息技术有限公司 Asset omnibearing management and protection system and method thereof

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100142832A1 (en) * 2008-12-09 2010-06-10 Xerox Corporation Method and system for document image classification
CN102413356A (en) * 2011-12-30 2012-04-11 武汉烽火众智数字技术有限责任公司 Detecting system for video definition and detecting method thereof
CN111309618A (en) * 2020-02-24 2020-06-19 广州市百果园信息技术有限公司 Page element positioning method, page testing method and related device
CN113449698A (en) * 2021-08-30 2021-09-28 湖南文盾信息技术有限公司 Automatic paper document input method, system, device and storage medium
CN114445294A (en) * 2022-01-19 2022-05-06 北京翠鸟视觉科技有限公司 Image processing method, computer storage medium, and near-to-eye display device
CN115457577A (en) * 2022-08-31 2022-12-09 郑州大学 Text font standardization processing method and system based on personnel file image
CN115619656A (en) * 2022-09-19 2023-01-17 郑州大学 Digital file deviation rectifying method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
田军委 et al.: "Moving target recognition based on dynamic template matching" (基于动态模板匹配的运动目标识别), Machinery & Electronics (《机械与电子》), vol. 35, no. 1, pages 77-80 *
邵丽蓉: "Research on no-reference blurred image quality assessment algorithms" (无参考模糊图像质量评价算法研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》), no. 9, pages 138-677 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252385A (en) * 2023-10-17 2023-12-19 山东新潮信息技术有限公司 Asset omnibearing management and protection system and method thereof
CN117252385B (en) * 2023-10-17 2024-02-23 山东新潮信息技术有限公司 Asset omnibearing management and protection system and method thereof

Also Published As

Publication number Publication date
CN116187717B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN109492643B (en) Certificate identification method and device based on OCR, computer equipment and storage medium
CN107609549B (en) Text detection method for certificate image in natural scene
CN110211048B (en) Complex archive image tilt correction method based on convolutional neural network
CN113781402A (en) Method and device for detecting chip surface scratch defects and computer equipment
CN110569878A (en) Photograph background similarity clustering method based on convolutional neural network and computer
CN111259889A (en) Image text recognition method and device, computer equipment and computer storage medium
US20060285751A1 (en) Method, apparatus and storage medium for detecting cardio, thoracic and diaphragm borders
CN110766017B (en) Mobile terminal text recognition method and system based on deep learning
CN111091124B (en) Spine character recognition method
CN109389115B (en) Text recognition method, device, storage medium and computer equipment
US7596265B2 (en) Segmenting pixels in an image based on orientation-dependent adaptive thresholds
CN116187717B (en) File warehousing management method and system
CN113392856B (en) Image forgery detection device and method
CN111680690A (en) Character recognition method and device
CN111368632A (en) Signature identification method and device
CN110796145B (en) Multi-certificate segmentation association method and related equipment based on intelligent decision
CN110689003A (en) Low-illumination imaging license plate recognition method and system, computer equipment and storage medium
CN116758282A (en) Thyroid ultrasonic image nodule weak supervision segmentation system based on mixed labeling
CN111626145A (en) Simple and effective incomplete form identification and page-crossing splicing method
WO2022121021A1 (en) Identity card number detection method and apparatus, and readable storage medium and terminal
CN113780116A (en) Invoice classification method and device, computer equipment and storage medium
CN110766001B (en) Bank card number positioning and end-to-end identification method based on CNN and RNN
CN106056575B (en) A kind of image matching method based on like physical property proposed algorithm
CN111797832A (en) Automatic generation method and system of image interesting region and image processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant