CN110163211A - Image recognition method, apparatus, and storage medium - Google Patents

Image recognition method, apparatus, and storage medium

Info

Publication number
CN110163211A
CN110163211A
Authority
CN
China
Prior art keywords
image
sample
paper
anchor point
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811037416.5A
Other languages
Chinese (zh)
Other versions
CN110163211B (en)
Inventor
刘东泽
杨晨
李浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201811037416.5A
Publication of CN110163211A
Application granted
Publication of CN110163211B
Legal status: Active


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/20 — Image preprocessing
    • G06V 10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/20 — Image preprocessing
    • G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 — Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 — Character recognition
    • G06V 30/14 — Image acquisition
    • G06V 30/148 — Segmentation of character regions
    • G06V 30/153 — Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention disclose an image recognition method, apparatus, and storage medium. In the embodiments, an annotated sample test-paper image is acquired, the sample image including an annotated sample answer region and sample anchor points of the sample paper; the positional relationship between the sample answer region and the sample anchor points is obtained; an anchor-point recognition network model is trained on the sample test-paper image to obtain a trained anchor-point recognition network model; a test-paper image to be recognized is acquired, and the anchor-point positions of the paper are identified using the trained model; an answer-region image is extracted from the test-paper image according to the anchor-point positions and the positional relationship; and character recognition is performed on the answer-region image to obtain a recognition result. This scheme can improve the accuracy and reliability of character recognition.

Description

Image recognition method, apparatus, and storage medium
Technical field
The present invention relates to the field of communications technology, and in particular to an image recognition method, apparatus, and storage medium.
Background technique
As the performance of underlying computer hardware continues to improve and domain data accumulates, artificial intelligence (AI) technology has developed rapidly across many industries. The education sector has accumulated a large amount of test-question data, bringing with it a great demand for fast grading. Traditional grading schemes are generally based on students answering and teachers judging manually; for large-scale examinations, traditional grading is inefficient and requires manual review of every question.
Current automatic grading approaches are based primarily on optical character recognition (OCR): an exam answer sheet is photographed and recognized by OCR, i.e., character recognition is performed on an answer sheet of a specific format, and the recognition result is then compared with the standard answer to achieve automatic grading.
In researching and practicing the prior art, the inventors of the present invention found that existing schemes, which apply plain OCR for character recognition, cannot effectively recognize original test papers and essentially require auxiliary materials such as answer cards of a specific format. Moreover, existing character recognition methods cannot handle the variety of image capture conditions (e.g., background, lighting, angle, texture) and cannot adapt to various question types: only certain specific types, such as oral arithmetic questions, can be recognized. Character recognition in existing automatic grading schemes is therefore heavily constrained, and its accuracy and reliability are low.
Summary of the invention
Embodiments of the present invention provide an image recognition method, apparatus, and storage medium that can improve the accuracy and reliability of character recognition.
An embodiment of the present invention provides an image recognition method, comprising:
acquiring an annotated sample test-paper image, the sample image including an annotated sample answer region and sample anchor points of the sample paper;
obtaining the positional relationship between the sample answer region and the sample anchor points;
training an anchor-point recognition network model on the sample test-paper image to obtain a trained anchor-point recognition network model;
acquiring a test-paper image to be recognized, and identifying the anchor-point positions of the paper using the trained anchor-point recognition network model;
extracting an answer-region image from the test-paper image according to the anchor-point positions and the positional relationship; and
performing character recognition on the answer-region image to obtain a recognition result.
An embodiment of the present invention further provides an image recognition apparatus, comprising:
a sample acquisition unit, configured to acquire an annotated sample test-paper image, the sample image including an annotated sample answer region and sample anchor points of the sample paper;
a relationship acquisition unit, configured to obtain the positional relationship between the sample answer region and the sample anchor points;
a training unit, configured to train an anchor-point recognition network model on the sample test-paper image to obtain a trained anchor-point recognition network model;
an anchor-point recognition unit, configured to acquire a test-paper image to be recognized and identify the anchor-point positions of the paper using the trained anchor-point recognition network model;
a region extraction unit, configured to extract an answer-region image from the test-paper image according to the anchor-point positions and the positional relationship; and
a character recognition unit, configured to perform character recognition on the answer-region image to obtain a recognition result. In addition, an embodiment of the present invention further provides a storage medium storing a plurality of instructions suitable for being loaded by a processor to execute the steps of any image recognition method provided by the embodiments of the present invention.
In embodiments of the present invention, an annotated sample test-paper image is acquired, the image including an annotated sample answer region and sample anchor points of the sample paper; the positional relationship between the sample answer region and the sample anchor points is obtained; an anchor-point recognition network model is trained on the sample test-paper image to obtain a trained model; a test-paper image to be recognized is acquired, and the anchor-point positions of the paper are identified using the trained model; an answer-region image is extracted from the test-paper image according to the anchor-point positions and the positional relationship; and character recognition is performed on the answer-region image to obtain a recognition result. Because this scheme identifies the anchor-point positions of the paper from the test-paper image via a deep-learning-based anchor-point recognition network model, it can accurately identify the paper's anchor points in images captured under a wide variety of conditions (e.g., background, lighting, angle, texture); the scheme is thus suitable for various shooting scenarios and imposes no restrictions on how the paper image is captured. Furthermore, because the scheme performs effective character recognition directly on the original paper without any auxiliary materials, it imposes no restrictions on paper type or question type. The character recognition of this scheme is therefore far less constrained (no limits on shooting scenario, paper type, question type, etc.), improving the accuracy and reliability of character recognition.
Detailed description of the invention
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art may derive other drawings from them without creative effort.
Fig. 1a is a schematic diagram of a scenario of the image recognition method provided by an embodiment of the present invention;
Fig. 1b is a schematic flowchart of the image recognition method provided by an embodiment of the present invention;
Fig. 1c is a schematic diagram of test-paper image annotation provided by an embodiment of the present invention;
Fig. 1d is a schematic structural diagram of the anchor-point recognition network model provided by an embodiment of the present invention;
Fig. 1e is a schematic diagram of affine transformation provided by an embodiment of the present invention;
Fig. 1f is a schematic diagram of an answer-region image provided by an embodiment of the present invention;
Fig. 1g is a schematic structural diagram of the character recognition network model provided by an embodiment of the present invention;
Fig. 2a is another schematic flowchart of the image recognition method provided by an embodiment of the present invention;
Fig. 2b is a schematic diagram of images captured under different backgrounds, textures, and angles, provided by an embodiment of the present invention;
Fig. 2c is a schematic diagram of images captured under different lighting, provided by an embodiment of the present invention;
Fig. 2d is a schematic diagram of projection-based segmentation provided by an embodiment of the present invention;
Fig. 2e is a schematic framework diagram of image recognition provided by an embodiment of the present invention;
Fig. 3a is a first schematic structural diagram of the image recognition apparatus provided by an embodiment of the present invention;
Fig. 3b is a second schematic structural diagram of the image recognition apparatus provided by an embodiment of the present invention;
Fig. 3c is a third schematic structural diagram of the image recognition apparatus provided by an embodiment of the present invention;
Fig. 3d is a fourth schematic structural diagram of the image recognition apparatus provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the network device provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiments of the present invention provide an image recognition method, apparatus, and storage medium.
The image recognition apparatus may be integrated in a network device, such as a terminal or a server. For example, referring to Fig. 1a, the network device may acquire an annotated sample test-paper image that includes an annotated sample answer region and sample anchor points of the sample paper, for example by receiving the annotated image from an image capture device such as a mobile phone or camera. The network device may then obtain the positional relationship between the sample answer region and the sample anchor points; train an anchor-point recognition network model on the sample test-paper image to obtain a trained anchor-point recognition network model; acquire a test-paper image to be recognized and identify the anchor-point positions of the paper using the trained model; extract an answer-region image from the test-paper image according to the anchor-point positions and the positional relationship; and perform character recognition on the answer-region image to obtain a recognition result.
Each is described in detail below. Note that the numbering of the following embodiments does not imply a preferred order.
This embodiment is described from the perspective of the image recognition apparatus, which may be integrated in a network device such as a terminal or a server.
In one embodiment, an image recognition method is provided, which may be executed by a processor of the network device. As shown in Fig. 1b, the flow of the image recognition method may be as follows:
101. Acquire an annotated sample test-paper image, the sample image including an annotated sample answer region and sample anchor points of the sample paper.
Here, an answer region is an area of the paper where an examinee writes an answer, for example, the underlined blank of a fill-in-the-blank question or the option box of a multiple-choice question. In one embodiment, the answer region may also include answer information, etc.
An anchor point is a point used to locate the paper region and may be set according to actual needs; for example, it may be a vertex of the paper region. Referring to Fig. 1c, the anchor points may be the four vertices a, b, c, and d of the paper.
For example, referring to Fig. 1c, a test-paper image may be captured, and a user such as a teacher may then annotate on it the answer regions of the paper (e.g., the rectangular boxes in Fig. 1c) and the anchor points of the paper (e.g., the four circled vertices a, b, c, and d), thereby obtaining the annotated sample test-paper image.
In some embodiments, the annotation of the test-paper image may be performed by the network device. For example, the network device may capture a sample test-paper image (e.g., an image of a sample paper with answers) and then, according to the user's annotation operations, mark the answer region and the anchor points in the sample image to obtain the annotated sample test-paper image.
In some embodiments, the annotation may also be performed by another device. For example, a terminal captures a sample test-paper image (e.g., an image of a sample paper with answers), marks the answer region and the anchor points in it according to the user's annotation operations, and then sends the annotated image to the network device.
102. Obtain the positional relationship between the sample answer region and the sample anchor points.
Here, the positional relationship may be the position of the sample answer region relative to the sample anchor points in the sample test-paper image.
Specifically, the position information of the sample answer region (a position value such as a two-dimensional coordinate) and the position information of the sample anchor points (a position value such as a two-dimensional coordinate) may be obtained, and the positional relationship between the sample answer region and the sample anchor points computed from the two pieces of position information.
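As a minimal sketch of one way to record such a relationship, assuming the anchor points are the four paper corners a, b, c, d (ordered top-left, top-right, bottom-right, bottom-left) and the relationship is stored as offsets normalized by the paper's edge lengths (all names and the encoding are illustrative, not taken from the patent):

```python
import numpy as np

def answer_region_relation(anchors, answer_box):
    """anchors: (4, 2) array of corners a, b, c, d; answer_box: (x0, y0, x1, y1)."""
    origin = anchors[0]                               # top-left corner a
    width = np.linalg.norm(anchors[1] - anchors[0])   # length of edge a->b
    height = np.linalg.norm(anchors[3] - anchors[0])  # length of edge a->d
    x0, y0, x1, y1 = answer_box
    # answer-region corners as fractions of the paper's width/height
    return ((x0 - origin[0]) / width, (y0 - origin[1]) / height,
            (x1 - origin[0]) / width, (y1 - origin[1]) / height)
```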
103. Train an anchor-point recognition network model on the sample test-paper image to obtain a trained anchor-point recognition network model.
Steps 102 and 103 may be performed in various orders; their numbering imposes no ordering.
Here, the anchor-point recognition network model is a deep-learning neural network model for identifying the paper (e.g., the paper region) in an image; in other words, it is a neural-network-based model.
The neural network may be a convolutional neural network (CNN).
Taking a convolutional neural network (CNN) as an example, as shown in Fig. 1d, the structure may include at least five convolutional layers and one fully connected (FC) layer, as follows:
Convolutional layers: mainly used to extract features from the input image (e.g., a training sample or an image to be recognized), mapping the raw data into a hidden feature space. The convolution kernel sizes may be set according to the practical application; for example, from the first to the fifth convolutional layer they may be (64, 64), (32, 32), (16, 16), (8, 8), and (4, 4). Optionally, to reduce computational complexity and improve efficiency, the kernel sizes of the five convolutional layers may all be set to the same value.
Optionally, to avoid shifts in the data distribution between layers during training, thereby preventing vanishing or exploding gradients and speeding up training, normalization may be added to simplify the outputs or inputs. In embodiments of the present invention, the normalization may be batch normalization (BN), which normalizes the result after each convolution. For example, BN may be added to all convolutional layers; referring to Fig. 1d, BN may be added to each of the first through fifth convolutional layers.
Optionally, to improve the expressive power of the model, non-linearity may be introduced by adding an activation function. In embodiments of the present invention, the activation function is ReLU (rectified linear unit), and the padding mode is "same" (padding refers to the space between an element's border and its content); "same" padding can be understood simply as padding the edges with zeros, where the number of zeros added on the left (top) equals, or is one fewer than, the number added on the right (bottom).
Optionally, to further reduce computation, a down-sampling (pooling) operation may be performed in all convolutional layers or in any one or two of them. Down-sampling is essentially the same operation as convolution, except that the down-sampling kernel simply takes the maximum (max pooling) or average (average pooling) of the corresponding positions. For convenience, in the embodiments of the present invention down-sampling is performed in the first through fifth convolutional layers, and the down-sampling operation is described using max pooling as an example.
Note that, for convenience, in the embodiments of the present invention the activation layer, the normalization layer (e.g., BN layer), and the down-sampling layer (also called the pooling layer) are all counted as part of the convolutional layer. It should be understood that the structure may alternatively be regarded as comprising the convolutional layers, normalization layers, activation layers, down-sampling (pooling) layers, and the fully connected layer, and it may of course further include an input layer for input data and an output layer for output data, which are not described here.
Fully connected layer: maps the learned "distributed feature representation" to the sample label space, serving mainly as the "classifier" of the whole convolutional neural network. Each node of the fully connected layer is connected to all output nodes of the previous layer (e.g., the down-sampling layer of a convolutional layer); a node of the fully connected layer is called a neuron, and the number of neurons may be set according to the practical application. Similarly to the convolutional layers, non-linearity may optionally be added via an activation function, for example the sigmoid function.
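Before turning to training, the structure described above can be summarized in code. The following is a minimal PyTorch sketch of such a network (five convolutional layers with BN, ReLU, and max pooling, plus one fully connected layer); the channel widths, input size, and normalized-coordinate output are illustrative assumptions, not specified by the patent:

```python
import torch
import torch.nn as nn

class AnchorPointNet(nn.Module):
    def __init__(self, num_anchors=4):
        super().__init__()
        self.num_anchors = num_anchors
        layers, in_ch = [], 3
        for out_ch in (16, 32, 64, 128, 256):    # assumed channel counts
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding="same"),
                nn.BatchNorm2d(out_ch),           # batch normalization (BN)
                nn.ReLU(inplace=True),            # ReLU activation
                nn.MaxPool2d(2),                  # down-sampling (max pooling)
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        # the FC layer regresses (x, y) for each anchor point; sigmoid keeps the
        # outputs in [0, 1], i.e., coordinates normalized by the image size
        self.fc = nn.Linear(256 * 8 * 8, num_anchors * 2)  # assumes 256x256 input

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return torch.sigmoid(self.fc(x)).view(-1, self.num_anchors, 2)
```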
Based on the anchor-point recognition network structure introduced above, the step of "training the anchor-point recognition network model on the sample test-paper image to obtain the trained anchor-point recognition network model" may specifically include:
(1) Obtaining the true position values of the sample anchor points.
For example, the position values (e.g., two-dimensional coordinates) of the annotated sample anchor points in the sample test-paper image may be obtained; these are the true position values. For example, referring to Fig. 1c, after the annotated test-paper image is obtained, the position values (e.g., two-dimensional coordinates) of the annotated vertices a, b, c, and d in the image may be obtained.
(2) Obtaining predicted position values of the sample anchor points based on the sample image and the anchor-point recognition network model.
For example, the sample image may be input to the anchor-point recognition network model; the convolutional layers perform convolution on the sample image in sequence, and the fully connected layer then performs a fully connected operation on the result output by the last convolutional layer to obtain the predicted position values of the sample anchor points.
For example, taking an anchor-point recognition network model with 5 convolutional layers and 1 fully connected layer as an example, referring to Fig. 1d: the first convolutional layer (Conv1) performs convolution, batch normalization (BN), ReLU activation, and down-sampling on the input sample image and outputs the result to the second convolutional layer (Conv2); the second convolutional layer performs convolution, ReLU activation, and down-sampling on its input and outputs the result to the third convolutional layer (Conv3); the third convolutional layer performs convolution, BN, ReLU activation, and down-sampling and outputs the result to the fourth convolutional layer (Conv4); the fourth convolutional layer performs convolution, ReLU activation, and down-sampling and outputs the result to the fifth convolutional layer (Conv5); the fifth convolutional layer performs convolution, BN, ReLU activation, and down-sampling and outputs the result to the fully connected layer (FC); finally, the fully connected layer performs a fully connected operation on the convolution result to output the predicted positions of the anchor points, for example, of the vertices a, b, c, and d.
(3) Converging the predicted position values and true position values of the sample anchor points using a preset loss function to obtain the trained anchor-point recognition network model.
Here, the loss function may be set flexibly according to the practical application; for example, it may be based on the Euclidean distance between the predicted and true position values. Specifically, the loss function may require that the Euclidean distance between the predicted and true position values be less than a preset threshold.
Training proceeds by continually reducing the error between the predicted and true anchor-point positions and adjusting the weights to appropriate values, yielding the trained model. For example, when the Euclidean distance between the predicted and true position values exceeds the preset threshold, the weights are adjusted continually until the distance falls below the threshold, at which point the trained model is obtained.
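A minimal training-step sketch with the Euclidean-distance loss described above is shown below; the function and variable names are illustrative assumptions:

```python
import torch

def train_step(model, optimizer, images, true_points):
    """One gradient step; true_points holds annotated anchor coordinates, (N, 4, 2)."""
    optimizer.zero_grad()
    pred_points = model(images)                        # predicted positions, (N, 4, 2)
    # mean Euclidean distance between predicted and true anchor positions
    loss = torch.linalg.norm(pred_points - true_points, dim=-1).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```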
104. Acquire a test-paper image to be recognized, and identify the anchor-point positions of the paper using the trained anchor-point recognition network model.
For example, the network device may photograph an answered test paper to obtain the test-paper image. Alternatively, an image capture device may photograph the answered paper and then send the captured test-paper image to the network device.
Identifying the anchor-point positions with the trained anchor-point recognition network model may proceed as follows: the test-paper image is input to the trained model; the convolutional layers perform convolution on the image in sequence, and the fully connected layer then performs a fully connected operation on the result output by the last convolutional layer to obtain the predicted anchor-point position values.
Taking an anchor-point recognition network model with one fully connected layer and at least five convolutional layers as an example, the step of "identifying the anchor-point positions of the paper using the trained anchor-point recognition network model" may include:
performing convolution on the test-paper image in the at least five convolutional layers in sequence to obtain a convolution result; and
performing a fully connected operation on the convolution result in the fully connected layer to obtain the anchor-point positions.
For example, with 5 convolutional layers and 1 fully connected layer, referring to Fig. 1d, the test-paper image is processed through the five convolutional layers (convolution, batch normalization, ReLU activation, and down-sampling, as described above for training), and the fully connected layer finally performs a fully connected operation on the convolution result to output the predicted anchor-point position values.
105. Extract an answer-region image from the test-paper image according to the anchor-point positions and the positional relationship.
In the embodiment of the present invention, after the anchor-point positions of the paper region are identified by the anchor-point recognition network model, the answer-region image may be extracted (e.g., cropped) from the test-paper image based on the anchor-point positions and the positional relationship between the anchor points and the answer region.
Specifically, in one embodiment, the answer-region position may first be determined from the anchor-point positions and the positional relationship, and the answer-region image then extracted from that position. That is, the step of "extracting an answer-region image from the test-paper image according to the anchor-point positions and the positional relationship" may include:
determining the answer-region position according to the anchor-point positions and the positional relationship; and
extracting the answer-region image from the test-paper image according to the answer-region position.
For example, taking a rectangular answer region as an example: after the four vertex positions of the paper region are obtained, the position of the answer region in the test-paper image (e.g., its four vertices) may be determined from the vertex positions and the positional relationship between the vertices and the answer region (e.g., the answer region's four vertices); the answer-region image, for example a rectangular crop, is then cut from the test-paper image based on that position, as in the sketch following this example.
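A minimal sketch of applying the stored relationship at recognition time follows: the normalized offsets recorded from the sample paper (see the earlier `answer_region_relation` sketch) are mapped back to pixel coordinates using the detected anchor points, and the answer region is cropped. All names and the corner ordering are illustrative assumptions:

```python
import numpy as np

def extract_answer_region(image, anchors, rel):
    """anchors: detected corner points a, b, c, d as (4, 2); rel: normalized (x0, y0, x1, y1)."""
    origin = anchors[0]                               # top-left corner a
    width = np.linalg.norm(anchors[1] - anchors[0])   # length of edge a->b
    height = np.linalg.norm(anchors[3] - anchors[0])  # length of edge a->d
    x0, y0, x1, y1 = rel
    r0, r1 = int(y0 * height + origin[1]), int(y1 * height + origin[1])
    c0, c1 = int(x0 * width + origin[0]), int(x1 * width + origin[0])
    return image[r0:r1, c0:c1]                        # rectangular crop
```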
In one embodiment, to improve the accuracy of character recognition, an affine transformation may also be applied to the test-paper image so that the answer region can be extracted more accurately for character recognition. For example, the step of "extracting an answer-region image from the test-paper image according to the anchor-point positions and the positional relationship" may include:
applying an affine transformation to the test-paper image according to the anchor-point positions to obtain the transformed image and the transformed anchor-point positions; and
extracting the answer-region image from the transformed image according to the transformed anchor-point positions and the positional relationship.
An affine transformation, also called an affine map, is, in geometry, a linear transformation of one vector space followed by a translation into another vector space. Affine transformations include operations such as image rotation, translation, and scaling.
An affine transformation of an image is essentially a linear transformation of the two-dimensional positions of its pixels; any affine transformation can be expressed as multiplication by a matrix (the linear part) followed by addition of a vector (the translation).
Since the position of every point (pixel) of the image is transformed, the anchor-point positions are transformed as well; the post-transformation, i.e., new, positions of the anchor points can therefore be obtained after the image is transformed.
In one embodiment, the affine transformation may be applied to the paper region in the test-paper image, for example to map it to a rectangle; that is, the paper region is projected onto a rectangular area by the affine transformation. For example, Fig. 1e compares a test-paper image before and after the transformation: the left side is the image before the affine transformation and the right side is the image after it.
To perform the affine transformation of the image, the affine transformation matrix must be computed. The embodiment of the present invention may derive the matrix from the anchor-point positions; for example, the matrix may be computed from the current and new positions of the anchor points, and the transformation then applied using that matrix.
For example, the step of "applying an affine transformation to the test-paper image according to the anchor-point positions" may include:
obtaining new anchor-point positions;
computing the affine transformation matrix from the current and new anchor-point positions; and
applying the affine transformation to the pixel positions of the image using the matrix (see the sketch following this example).
Here, the new anchor-point positions are the positions the anchor points should occupy after the affine transformation; they may be preset.
For example, taking the four vertices of the paper region as the anchor points: the current positions of vertices a, b, c, and d are obtained by the anchor-point recognition network model, their new post-transformation positions are obtained, the affine transformation matrix is computed from the two sets of positions, and the matrix is then applied to every point of the image. Referring to Figs. 1f and 1c, the multiple-choice answer-region image can then be cropped from the transformed image.
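A minimal OpenCV sketch of this rectification step follows. Since an affine transform is fully determined by three point correspondences, three of the four detected corners are used; the output size and corner ordering are illustrative assumptions:

```python
import cv2
import numpy as np

def rectify_paper(image, corners, out_w=1000, out_h=1400):
    """corners: detected positions of vertices a, b, d (top-left, top-right,
    bottom-left) in the captured image, as a float32 (3, 2) array."""
    src = np.float32(corners[:3])
    dst = np.float32([[0, 0], [out_w, 0], [0, out_h]])  # new anchor positions
    M = cv2.getAffineTransform(src, dst)                # 2x3 affine matrix
    warped = cv2.warpAffine(image, M, (out_w, out_h))   # transform all pixels
    return warped, M   # M also maps any remaining point into the new frame
```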
In one embodiment, after the affine transformation, the position of the answer region in the transformed image may be obtained from the transformed anchor-point positions and the positional relationship, and the answer-region image extracted from the transformed image based on that position.
For example, taking a rectangular answer region as an example: after the four vertex positions of the paper region are obtained and the test-paper image is transformed, the position of the answer region in the transformed image (e.g., its four vertices) may be determined from the transformed vertex positions and the positional relationship between the vertices and the answer region (e.g., the answer region's four vertices); the answer-region image, for example a rectangular crop, is then cut from the transformed test-paper image based on that position.
106. Perform character recognition on the answer-region image to obtain a recognition result.
To improve the accuracy and reliability of character recognition, character images may be cut out of the answer-region image and character recognition then performed on the character images, where a character image is an image containing one or more characters (e.g., text, symbols, digits).
In one embodiment, to improve the efficiency and accuracy of cutting character images, the cutting may be projection-based; for example, horizontal and vertical projections are computed with OpenCV to cut character images out of the answer-region image.
In one embodiment, to improve the accuracy and reliability of character recognition, recognition may be performed with a character recognition network model, which is a neural-network-based character recognition model.
For example, the step of "performing character recognition on the answer-region image to obtain a recognition result" may include:
cutting character images out of the answer-region image by projection; and
performing character recognition on the character images using the trained character recognition network model to obtain the recognition result.
For example, character images may be cut from the answer-region image using horizontal and vertical projections. To improve cutting accuracy, the horizontal projection may first cut the region into several row sub-region images, and the vertical projection may then cut character images out of each sub-region image.
That is, the step of "cutting character images out of the answer-region image by projection" may include:
computing the horizontal projection of the region image to obtain a horizontal projection result;
cutting the region image according to the horizontal projection result to obtain several row sub-region images;
computing the vertical projection of each sub-region image to obtain a vertical projection result; and
cutting each sub-region image according to the vertical projection result to obtain the character images (see the sketch following the example below).
Here, the horizontal projection may be the projection of the two-dimensional image onto the y-axis, and the vertical projection its projection onto the x-axis.
For example, the horizontal projection of the answer-region image shown in Fig. 1f may be computed, and several row sub-region images cut out based on the result, for example a sub-region image containing "(c)." and sub-region images containing other characters. Vertical projection is then applied to each sub-region image; for example, projecting the sub-region image containing "(c)." yields character images containing "(", "c", ")", and ".".
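The following is a minimal OpenCV/NumPy sketch of this horizontal-then-vertical projection segmentation; the binarization method and gap threshold are illustrative assumptions:

```python
import cv2
import numpy as np

def split_by_projection(region_img, min_gap=2):
    gray = cv2.cvtColor(region_img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    def runs(profile):
        # contiguous index ranges where the projection profile is non-zero
        idx = np.flatnonzero(profile > 0)
        if idx.size == 0:
            return []
        breaks = np.flatnonzero(np.diff(idx) > min_gap)
        starts = np.r_[idx[0], idx[breaks + 1]]
        ends = np.r_[idx[breaks], idx[-1]]
        return list(zip(starts, ends + 1))

    rows = runs(binary.sum(axis=1))            # horizontal projection -> text rows
    chars = []
    for r0, r1 in rows:
        line = binary[r0:r1]
        for c0, c1 in runs(line.sum(axis=0)):  # vertical projection -> characters
            chars.append(region_img[r0:r1, c0:c1])
    return chars
```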
In one embodiment, to improve the efficiency and accuracy of character-image cutting, the row sub-region images may also be filtered after they are obtained, and the vertical projection then applied to the filtered images. For example, before the vertical projection of the sub-region images, the method of the embodiment of the present invention may further include: filtering the several row sub-region images according to a preset image filter condition to obtain filtered sub-region images.
In this case, the step of "computing the vertical projection of each sub-region image" may include computing the vertical projection of each filtered sub-region image, and the step of "cutting each sub-region image according to the vertical projection result" may include cutting each filtered sub-region image according to the vertical projection result.
Here, the preset image filter condition may be set according to actual needs and is used to filter out images irrelevant to the answer characters.
For example, the horizontal projection of the answer-region image shown in Fig. 1f may be computed and several row sub-region images cut out based on the result, e.g., a sub-region image containing "(c)." and sub-region images containing other characters. The sub-region images are then filtered, for example to remove images containing irregular characters; the filtering may finally leave the sub-region image containing "(c).".
Vertical projection is then applied to the filtered sub-region images; for example, projecting the sub-region image containing "(c)." yields character images containing "(", "c", ")", and ".".
In one embodiment, the character images may also be filtered after the vertical-projection cutting, for example to remove images of irregular characters or of incomplete characters; the filtering rules may be set according to actual needs.
In the embodiment of the present invention, after the character images are cut from the answer-region image by projection, character recognition may be performed using the trained character recognition network model. The character recognition network model is a deep-learning neural network model for recognizing characters, based on a neural network such as a convolutional neural network (CNN); for example, a model with a LeNet-like network structure may be used.
Taking a convolutional neural network (CNN) as an example, as shown in Fig. 1g, the structure may include at least seven convolutional layers and two fully connected (FC) layers. The convolutional and fully connected layers are as introduced above and are not described again here. Using two fully connected layers can improve the accuracy of character recognition.
Specifically, a character image may be input to the character recognition network model; the at least seven convolutional layers perform convolution on the character image in sequence, and the last two fully connected layers then perform character classification on the convolution result. For example, the step of "performing character recognition on the character images using the trained character recognition network model" may include:
performing convolution on the character image in the multiple convolutional layers in sequence to obtain a convolution result; and
performing character classification on the convolution result in the two fully connected layers in sequence.
Optionally, in some embodiments, at least one of batch normalization (BN), ReLU activation, and down-sampling may also be applied to the character image in the convolutional layers to improve recognition accuracy.
For example, taking a character recognition network model with 7 convolutional layers and 2 fully connected layers as an example, referring to Fig. 1g: the first convolutional layer (Conv1) performs convolution on the input character image and outputs the result to the second convolutional layer (Conv2); each subsequent convolutional layer (Conv3 through Conv7) in turn performs convolution on the output of the previous layer; finally, the output of the seventh convolutional layer is passed to the last two fully connected layers, which perform character classification to obtain the recognition result. A sketch of such a network follows.
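The following is a minimal PyTorch sketch of the LeNet-style character recognition network described above (seven convolutional layers, two fully connected layers). The channel widths, pooling placement, input size, and class count are illustrative assumptions:

```python
import torch
import torch.nn as nn

class CharRecognitionNet(nn.Module):
    def __init__(self, num_classes=10000):       # ~10,000+ Chinese chars, symbols, digits
        super().__init__()
        chs = (32, 32, 64, 64, 128, 128, 256)    # assumed channel counts, 7 conv layers
        layers, in_ch = [], 1                     # grayscale character image
        for i, out_ch in enumerate(chs):
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            if i % 2 == 1:
                layers.append(nn.MaxPool2d(2))    # occasional down-sampling
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        # two fully connected layers perform the final character classification
        self.classifier = nn.Sequential(
            nn.Linear(256 * 8 * 8, 1024),         # assumes 64x64 input (3 pools -> 8x8)
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))
```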
For example, the answer character finally recognized by the above character recognition network model is "c".
To balance performance and accuracy, the embodiment of the present invention may use a LeNet-like network model for character recognition. It can classify more than 10,000 characters, including common Chinese characters, symbols, and digits, and the use of two fully connected layers helps guarantee accuracy.
The image recognition method provided by the embodiment of the present invention can be applied to automatic grading scenarios. For example, the answer characters in a test paper are recognized by the image recognition method of the embodiment of the present invention, the recognized answer characters are then compared with the standard answer characters, and a corresponding score is given based on the comparison, thereby achieving automatic grading.
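As a minimal sketch of this comparison step, assuming recognized answers and the answer key are keyed by question identifier (the names and scoring rule are illustrative assumptions):

```python
def grade(recognized_answers, answer_key, points_per_question=2):
    """Compare recognized answer characters against the standard answers."""
    score = 0
    for qid, correct in answer_key.items():
        if recognized_answers.get(qid, "").strip().lower() == correct.lower():
            score += points_per_question
    return score

# e.g., grade({"q3": "c"}, {"q3": "c"}) -> 2
```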
In summary, the embodiment of the present invention acquires an annotated sample test-paper image that includes an annotated sample answer region and sample anchor points of the sample paper; obtains the positional relationship between the sample answer region and the sample anchor points; trains an anchor-point recognition network model on the sample test-paper image to obtain a trained model; acquires a test-paper image to be recognized and identifies the anchor-point positions of the paper using the trained model; extracts an answer-region image from the test-paper image according to the anchor-point positions and the positional relationship; and performs character recognition on the answer-region image to obtain a recognition result. The scheme identifies the anchor-point positions of the paper with a deep-learning-based anchor-point recognition network model, then extracts the answer region based on those positions and performs character recognition on the answer-region image. Because the anchor-point positions are identified from the test-paper image by a deep-learning model, they can be identified accurately in paper images captured under a wide variety of conditions (e.g., background, lighting, angle, texture); the scheme is suitable for various shooting scenarios and imposes no restrictions on how the paper image is captured. Moreover, the scheme performs effective character recognition directly on the original paper without any auxiliary materials, and thus imposes no restrictions on paper type or question type. The character recognition of the scheme is therefore far less constrained (no limits on shooting scenario, paper type, question type, etc.), improving the accuracy and reliability of character recognition.
The embodiment of the present invention also uses projection to cut character images, which improves the precision, reliability, robustness, and efficiency of character-image cutting.
In addition, the embodiment of the present invention performs character recognition on the character images with a deep-learning network, which can further improve recognition accuracy.
The method described in the above embodiment is further illustrated below with an example.
In this embodiment, the image recognition apparatus is described as being integrated in a network device.
The image recognition flow of the network device, shown in Fig. 2a, proceeds as follows:
201. The network device acquires an annotated sample test-paper image.
The annotated sample test-paper image includes an annotated sample answer region and sample anchor points of the sample paper.
Here, the sample test-paper image is an image of a sample test paper, for example, an image of a mathematics exam paper or of a Chinese exam paper.
An answer region is an area of the paper where an examinee writes an answer, for example, the underlined blank of a fill-in-the-blank question or the option box of a multiple-choice question. In one embodiment, the answer region may also include answer information, etc.
An anchor point is a point used to locate the paper region and may be set according to actual needs; for example, it may be a vertex of the paper region. Referring to Fig. 1c, the anchor points may be the four vertices a, b, c, and d of the paper.
The network device may acquire the annotated sample test-paper image in several ways. For example, the network device may photograph the sample paper to obtain the sample test-paper image, and then mark the sample answer region and the paper's anchor points in the image according to the user's annotation operations.
Alternatively, the network device may receive the annotated sample test-paper image from an image capture device. The image capture device, e.g., a terminal, may photograph the sample paper to obtain the sample test-paper image, mark the sample answer region and the paper's anchor points in it according to the user's annotation operations, and then send the annotated image to the network device.
In practice, to achieve automatic grading using the embodiment of the present invention, a teacher may select one answered paper from the batch of papers as the sample paper, photograph it, and annotate the image on an annotation platform or device, for example by marking the answer region and the paper's anchor points.
202. The network device obtains the positional relationship between the sample answer region and the sample anchor points, and trains the anchor-point recognition network model on the sample test-paper image to obtain the trained anchor-point recognition network model.
Here, the positional relationship may be the position of the sample answer region relative to the sample anchor points (e.g., the paper vertices) in the sample test-paper image.
Specifically, the network device may obtain the position information of the sample answer region (a position value such as a two-dimensional coordinate) and the position information of the sample anchor points (a position value such as a two-dimensional coordinate), and then compute the positional relationship between the sample answer region and the sample anchor points from the two pieces of position information.
The positional relationship may include a position mapping between the anchor points and the answer region; it may, for example, be a function.
Here, the anchor-point recognition network model is a deep-learning neural network model for identifying the paper (e.g., the paper region) in an image; in other words, it is a neural-network-based model.
The neural network may be a convolutional neural network (CNN). Taking a CNN as an example, as shown in Fig. 1d, the structure may include at least five convolutional layers and one fully connected (FC) layer; the network structure is as described in the above embodiment.
Based on the anchor-point recognition network structure introduced above, the training process of the anchor-point recognition network model may be as follows:
(1) Obtain the true position values of the sample anchor points. For example, the position values (e.g., two-dimensional coordinates) of the annotated sample anchor points in the sample test-paper image may be obtained; these are the true position values. Referring to Fig. 1c, after the annotated test-paper image is obtained, the position values of the annotated vertices a, b, c, and d in the image may be obtained.
(2) Obtain predicted position values of the sample anchor points based on the sample image and the anchor-point recognition network model. For example, the sample image may be input to the anchor-point recognition network model; the convolutional layers process the sample image in sequence (convolution, batch normalization, ReLU activation, and down-sampling, as described above with reference to Fig. 1d), and the fully connected layer then performs a fully connected operation on the result output by the last convolutional layer to obtain the predicted position values of the sample anchor points, for example, of the vertices a, b, c, and d.
(3) Converge the predicted and true position values of the sample anchor points using the preset loss function to obtain the trained anchor-point recognition network model. The loss function may be set flexibly according to the practical application; for example, it may be based on the Euclidean distance between the predicted and true position values, specifically requiring that this distance be less than a preset threshold. Training proceeds by continually reducing the error between the predicted and true anchor-point positions and adjusting the weights to appropriate values; for example, when the Euclidean distance between the predicted and true position values exceeds the preset threshold, the weights are adjusted continually until the distance falls below the threshold, at which point the trained model is obtained.
203, the network equipment acquires paper image to be identified.
Such as, wherein paper image is identical two papers with examination question in primary examination with sample paper image, than It such as, can be the identical paper of two examination questions in the examination of primary Chinese language.
Since the embodiment of the present invention identifies anchor point using the network model of deep learning, in the embodiment of the present invention Paper image can be the paper image shot under various scenes;For example, under various shooting angle, light, background, texture The paper image of shooting.That is, the embodiment of the present invention can support the paper image shot under various scenes, than Such as, the shooting under any angle, light, background, texture is supported to obtain image.With reference to Fig. 2 b, in a variety of backgrounds, multiple angles The paper image of lower shooting;It is the paper image shot under different light with reference to Fig. 2 c.
In one embodiment, the network equipment can directly acquire paper image to be identified, can also be adopted by other images Collection equipment acquisition is sent to the network equipment.
204. The network device recognizes the anchor point positions of the paper using the trained anchor point recognition network model.

For example, the network device can apply convolution to the paper image in at least five convolutional layers in sequence to obtain a convolution result, and then perform a fully connected operation on the convolution result in the fully connected layer to obtain the anchor point positions.

For example, again taking an anchor point recognition network model with five convolutional layers and one fully connected layer as an example, referring to Fig. 1d: the first convolutional layer applies convolution, batch normalization (BN), ReLU activation, and down-sampling to the input paper image and outputs the result to the second convolutional layer; the second convolutional layer applies convolution, ReLU activation, and down-sampling to the output of the previous layer and passes the result to the third convolutional layer; the third convolutional layer applies convolution, batch normalization (BN), ReLU activation, and down-sampling and passes the result to the fourth convolutional layer; the fourth convolutional layer applies convolution, ReLU activation, and down-sampling and passes the result to the fifth convolutional layer; the fifth convolutional layer applies convolution, batch normalization (BN), ReLU activation, and down-sampling and passes the result to the fully connected layer; finally, the fully connected layer performs a fully connected operation on the convolution result output by the previous layer and outputs the predicted position values of the anchor points.
205. The network device performs an affine transformation on the paper image according to the anchor point positions, to obtain the transformed image and the transformed anchor point positions.

An affine transformation, also known as an affine map, is, in geometry, a linear transformation of one vector space followed by a translation, mapping it to another vector space. Affine transformations include operations such as image rotation, translation, and scaling.

The affine transformation of an image is essentially a linear transformation of the two-dimensional positions of its pixels: an arbitrary affine transformation can be expressed as multiplication by a matrix (the linear part) followed by the addition of a vector (the translation), that is, a pixel position p is mapped to Mp + t.

For example, the network device can apply the affine transformation to the paper region in the paper image so as to form a rectangular paper image; that is, the paper region is projected onto a rectangular area by the affine transformation. For example, Fig. 1e compares a paper image before and after the affine transformation: the left side of Fig. 1e is the paper image before the transformation and the right side is the paper image after it.

To apply the affine transformation to the image, the affine transformation matrix must be obtained. The embodiment of the present invention can obtain the affine transformation matrix based on the anchor point positions; for example, the matrix can be derived from the current positions of the anchor points and their new positions, and the affine transformation is then performed based on that matrix.

For example, the network device can obtain new anchor point positions; obtain the affine transformation matrix from the original anchor point positions and the new anchor point positions; and apply the affine transformation to the pixel positions of the image according to the matrix. The new anchor point positions are the positions the anchor points should occupy after the affine transformation; these positions can be preset.
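A minimal sketch of this step using OpenCV, assuming the paper image has been loaded and three of the predicted anchor points are mapped to preset new positions; all concrete coordinates and the file path are illustrative assumptions:

```python
import cv2
import numpy as np

paper_img = cv2.imread("paper.jpg")          # captured paper image (assumed path)

# Predicted anchor points (e.g. vertices a, b, d) and their preset new
# positions; three point pairs determine an affine matrix, and the
# coordinates below are made up for illustration.
src = np.float32([[412, 103], [1570, 188], [350, 2001]])
dst = np.float32([[0, 0], [1200, 0], [0, 1600]])

M = cv2.getAffineTransform(src, dst)         # 2x3 affine transformation matrix
rectified = cv2.warpAffine(paper_img, M, (1200, 1600))

# Any anchor point can be mapped with the same matrix, which yields the
# transformed anchor point positions used in step 206:
c_after = M @ np.array([1510.0, 2080.0, 1.0])
```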
206. The network device extracts an answer region image from the transformed image according to the transformed anchor point positions and the positional relationship.

For example, the network device can obtain the position of the answer region in the transformed image based on the transformed anchor point positions and the positional relationship, and then extract the answer region image from the transformed image based on that position.

For example, taking a rectangular answer region as an example: after the four vertex positions of the paper region are obtained and the affine transformation is applied to the paper image, the position of the answer region in the transformed paper image (such as the positions of the four vertices of the answer region) can be determined from the transformed vertex positions and the positional relationship between the vertices and the answer region (such as the four vertices of the answer region); the answer region image, for example a rectangular one, is then cropped out of the transformed paper image based on that position.
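Continuing the OpenCV sketch above, cropping the rectangular answer region from the rectified image is then a simple array slice; the corner coordinates are illustrative assumptions standing in for the values derived from the labeled positional relationship:

```python
# (x0, y0) and (x1, y1): answer region corners in the rectified image,
# computed from the transformed anchor points and the positional relationship.
x0, y0, x1, y1 = 80, 400, 1120, 700
answer_img = rectified[y0:y1, x0:x1]         # cropped answer region image
```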
207. The network device cuts character images out of the answer region image using projection.

To improve the accuracy of character cutting, the network device can first cut out several row sub-region images using horizontal projection, and then cut character images out of each sub-region image using vertical projection.

Specifically, referring to Fig. 2d, the network device performs horizontal projection on the region image to obtain a horizontal projection result; cuts the region image according to the horizontal projection result to obtain several row sub-region images; filters the row sub-region images; performs vertical projection on the filtered sub-region images to obtain a vertical projection result; and cuts the filtered sub-region images according to the vertical projection result to obtain character images.

For example, horizontal projection can be applied to the answer region image shown in Fig. 1f, and several row sub-region images can be cut out based on the horizontal projection result, for example, a sub-region image containing "(c).", sub-region images containing other characters, and so on. The sub-region images are then filtered, for example to filter out images containing invalid characters; after filtering, the sub-region image containing "(c)." can finally be obtained. Vertical projection is then applied to the filtered sub-region images; for example, projecting the sub-region image containing "(c)." yields the character images "(", "c", ")", and ".".
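A minimal sketch of this two-pass projection cut, assuming OpenCV with Otsu binarization; the gap threshold is an illustrative assumption, and the row-filtering step described above is reduced to a placeholder check:

```python
import cv2
import numpy as np

def projection_cut(region_img, min_gap=2):
    """Cut character images out of an answer region image by horizontal
    projection (rows) followed by vertical projection (characters)."""
    gray = cv2.cvtColor(region_img, cv2.COLOR_BGR2GRAY)
    binary = cv2.threshold(gray, 0, 1,
                           cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

    def runs(profile):
        # Contiguous spans where the projection profile is non-zero.
        idx = np.flatnonzero(profile > 0)
        if idx.size == 0:
            return []
        breaks = np.where(np.diff(idx) > min_gap)[0]
        starts = np.r_[idx[0], idx[breaks + 1]]
        ends = np.r_[idx[breaks], idx[-1]]
        return list(zip(starts, ends + 1))

    chars = []
    for r0, r1 in runs(binary.sum(axis=1)):        # horizontal projection -> rows
        row = binary[r0:r1]
        if row.shape[0] < 3:                       # placeholder row filter
            continue                               # (drop invalid rows)
        for c0, c1 in runs(row.sum(axis=0)):       # vertical projection -> chars
            chars.append(region_img[r0:r1, c0:c1])
    return chars

char_imgs = projection_cut(answer_img)             # answer_img: cropped answer region
```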
208. The network device performs character recognition on the character images using the trained character recognition network model, to obtain a recognition result.

The character recognition network model is a deep-learning neural network model for recognizing characters, based on a neural network such as a convolutional neural network (CNN, Convolutional Neural Network). For example, a model with a structure similar to the LeNet network can be used.

As shown in Fig. 1g, the model may include at least seven convolutional layers (Convolution) and two fully connected layers (FC, Fully Connected Layers). For an introduction to convolutional layers and fully connected layers, refer to the description above, which is not repeated here. By using two fully connected layers, the embodiment of the present invention can improve the accuracy of character recognition.

Specifically, the network device can input a character image into the character recognition network model, apply convolution to the character image in the at least seven convolutional layers in sequence, and finally perform character classification on the convolution result in the last two fully connected layers.

To balance performance and accuracy, the embodiment of the present invention performs character recognition with a network model similar to the LeNet structure. The model can classify more than 10000 characters, including common Chinese characters, symbols, and digits, and the two fully connected layers help guarantee accuracy.
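A minimal sketch of such a LeNet-style classifier, assuming PyTorch; the channel counts, input resolution, hidden width, and exact class count are illustrative assumptions, since the text fixes only the seven-convolution, two-fully-connected structure and the 10000+ class scale:

```python
import torch
import torch.nn as nn

class CharNet(nn.Module):
    """Seven convolutional layers followed by two fully connected layers,
    classifying a character image into one of 10000+ classes."""
    def __init__(self, num_classes=10000):
        super().__init__()
        chans = [1, 32, 32, 64, 64, 128, 128, 256]
        layers = []
        for i in range(7):
            layers += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1),
                       nn.ReLU(inplace=True)]
            if i % 2 == 1:                 # down-sample after every other conv
                layers.append(nn.MaxPool2d(2))
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(   # the two fully connected layers
            nn.Linear(256 * 8 * 8, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):                  # x: (N, 1, 64, 64) character image
        return self.classifier(self.features(x).flatten(1))
```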
Based on the foregoing description, and referring to Fig. 2e, the embodiment of the present invention further provides a character recognition framework, which may include a vertex locating module, a slant rectification module, a character segmentation module, a projection module, a character recognition module, an external service encapsulation module, and so on.

The vertex locating module recognizes the anchor point positions of the paper based on the anchor point recognition network model, for example, recognizing the vertex positions with the network model shown in Fig. 1d.

The slant rectification module applies the affine transformation to the paper image, so that paper images taken at various angles can be rectified, which facilitates subsequent recognition.

The character segmentation module crops the answer region image out of the paper image (for the specific cropping method, refer to the description of the above embodiments) and cuts character images out of the answer region using projection.

The projection module projects the cropped answer region image, for example by horizontal projection and vertical projection, so as to cut out character images.

The character recognition module performs character recognition on the character images using the trained character recognition network model to obtain recognition results, for example, using a network model similar to the LeNet structure, which can classify more than 10000 characters such as common Chinese characters, symbols, and digits.

The external service encapsulation module provides an interface for external services, through which outside callers can invoke the image recognition method provided by the embodiment of the present invention.

Fig. 2e can further include data-handling stages such as a data platform (providing data such as image data), data labeling (labeling answer regions, anchor points, and the like), data preprocessing (such as image size and color adjustment), and data standard construction.

In addition, it can include an algorithm platform, a CNN core framework, extensive parameter tuning and optimization, the search for better frameworks, and so on. Through the foregoing series of operations, the required character recognition network model and anchor point recognition network model can be constructed.
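The following sketch strings the Fig. 2e modules together in the order of steps 204 to 208; all function names are hypothetical stand-ins for the modules described above, not names given in the patent:

```python
# Hypothetical end-to-end pipeline over the Fig. 2e modules.
anchors = vertex_locator(paper_img)                            # step 204: anchor points
rectified, new_anchors = slant_rectify(paper_img, anchors)     # step 205: affine rectify
answer_img = segment_answer(rectified, new_anchors, relation)  # step 206: crop region
char_imgs = projection_cut(answer_img)                         # step 207: projection cut
result = [char_net_predict(img) for img in char_imgs]          # step 208: recognition
```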
It can be seen from the above that the method of the embodiment of the present invention can recognize the anchor point positions of the paper based on a deep-learning anchor point recognition network model, then extract the answer region based on the anchor point positions, and perform character recognition on the answer region image. Since the scheme recognizes the anchor point positions of the paper from the paper image with an anchor point recognition network model based on deep learning, it can effectively and accurately recognize the anchor point positions of paper images obtained under all kinds of shooting scenes (such as various backgrounds, lighting, angles, and textures); the scheme is therefore suitable for various shooting scenes and imposes no restrictions on the shooting scene of the paper image. Moreover, the scheme can perform effective character recognition directly on an original paper without any auxiliary aids, and thus imposes no restrictions on the paper type, question type, and so on. It can be seen that the character recognition of the scheme is subject to fewer limitations (for example, no limitations on the shooting scene, paper type, or question type), which improves the accuracy and credibility of character recognition.

In addition, the embodiment of the present invention can also apply an affine transformation to the image, so that images with various degrees of slant can be rectified, which facilitates character recognition and improves its accuracy and efficiency.

In addition, the embodiment of the present invention can also cut character images using projection, which can improve the accuracy, reliability, robustness, and efficiency of character image cutting.

In addition, the embodiment of the present invention can also perform character recognition on the character images based on a deep-learning network, which can further improve the accuracy of character recognition.
To better implement the above method, the embodiment of the present invention further provides an image recognition apparatus. The image recognition apparatus can be integrated in a network device such as a terminal or a server; the terminal may include a mobile phone, a tablet computer, a laptop, a PC, or other such devices.

For example, as shown in Fig. 3a, the image recognition apparatus may include a sample collection unit 301, a relation acquisition unit 302, a training unit 303, an anchor point recognition unit 304, a region extraction unit 305, and a character recognition unit 306, as follows:

The sample collection unit 301 is configured to collect a labeled sample paper image, the sample paper image including a labeled sample answer region and sample anchor points of the sample paper;

The relation acquisition unit 302 is configured to obtain the positional relationship between the sample answer region and the sample anchor points;

The training unit 303 is configured to train the anchor point recognition network model according to the sample paper image, to obtain the trained anchor point recognition network model;

The anchor point recognition unit 304 is configured to acquire a paper image to be recognized and recognize the anchor point positions of the paper using the trained anchor point recognition network model;

The region extraction unit 305 is configured to extract an answer region image from the paper image according to the anchor point positions and the positional relationship;

The character recognition unit 306 is configured to perform character recognition on the answer region image to obtain a recognition result.
In one embodiment, referring to Fig. 3b, the training unit 303 may include:

a position acquisition subunit 3031, configured to obtain the ground-truth position values of the sample anchor points;

a predicted value acquisition subunit 3032, configured to obtain the predicted position values of the sample anchor points based on the sample image and the anchor point recognition network model; and

a convergence subunit 3033, configured to converge the predicted position values and the ground-truth position values of the sample anchor points using a preset loss function, to obtain the trained anchor point recognition network model.
In one embodiment, the region extraction unit 305 can be specifically configured to:

determine the answer region position according to the anchor point positions and the positional relationship; and

extract the answer region image from the paper image according to the answer region position.
In one embodiment, referring to Fig. 3c, the region extraction unit 305 may include:

an affine transformation subunit 3051, configured to perform an affine transformation on the paper image according to the anchor point positions, to obtain the transformed image and the transformed anchor point positions; and

a region extraction subunit 3052, configured to extract the answer region image from the transformed image according to the transformed anchor point positions and the positional relationship.

The affine transformation subunit 3051 can be specifically configured to:

obtain new anchor point positions;

obtain the affine transformation matrix according to the anchor point positions and the new anchor point positions; and

perform affine transformation processing on the pixel positions of the image according to the affine transformation matrix.
In one embodiment, referring to Fig. 3d, the character recognition unit 306 includes:

a cutting subunit 3061, configured to cut character images out of the answer region image using projection; and

a character recognition subunit 3062, configured to perform character recognition on the character images using the trained character recognition network model, to obtain a recognition result.

In one embodiment, the cutting subunit 3061 can be specifically configured to:

perform horizontal projection on the region image to obtain a horizontal projection result;

cut the region image according to the horizontal projection result, to obtain several row sub-region images;

perform vertical projection on the sub-region images to obtain a vertical projection result; and

cut the sub-region images according to the vertical projection result, to obtain character images.
In one embodiment, the cutting subunit 3061 is configured to:

perform horizontal projection on the region image to obtain a horizontal projection result;

cut the region image according to the horizontal projection result, to obtain several row sub-region images;

filter the several row sub-region images according to a preset image filter condition, to obtain filtered sub-region images;

perform vertical projection on the filtered sub-region images; and

cut the filtered sub-region images according to the vertical projection result, to obtain character images.
In one embodiment, the anchor point recognition network model includes one fully connected layer and at least five convolutional layers. The anchor point recognition unit 304 is configured to acquire a paper image to be recognized, apply convolution to the paper image in the at least five convolutional layers in sequence to obtain a convolution result, and perform a fully connected operation on the convolution result in the fully connected layer to obtain the anchor point positions.

In one embodiment, the character recognition network model includes multiple convolutional layers and two fully connected layers. The character recognition subunit 3062 can be specifically configured to:

apply convolution to the character image in the multiple convolutional layers in sequence to obtain a convolution result, and perform character classification on the convolution result in the two fully connected layers in sequence.
In specific implementation, each of the above units can be implemented as an independent entity, or combined arbitrarily and implemented as the same entity or several entities; for the specific implementation of each unit, refer to the foregoing method embodiments, which are not repeated here.
It can be seen from the above that, in the image recognition apparatus of this embodiment, the sample collection unit 301 collects a labeled sample paper image, the sample paper image including a labeled sample answer region and sample anchor points of the sample paper; the relation acquisition unit 302 obtains the positional relationship between the sample answer region and the sample anchor points; the training unit 303 trains the anchor point recognition network model according to the sample paper image, to obtain the trained anchor point recognition network model; the anchor point recognition unit 304 acquires a paper image to be recognized and recognizes the anchor point positions of the paper using the trained anchor point recognition network model; the region extraction unit 305 extracts an answer region image from the paper image according to the anchor point positions and the positional relationship; and the character recognition unit 306 performs character recognition on the answer region image to obtain a recognition result. Since the scheme can recognize the anchor point positions of the paper from the paper image with an anchor point recognition network model based on deep learning, it can effectively and accurately recognize the anchor point positions of paper images obtained under all kinds of shooting scenes (such as various backgrounds, lighting, angles, and textures); the scheme is therefore suitable for various shooting scenes and imposes no restrictions on the shooting scene of the paper image. Moreover, since the scheme can also perform effective character recognition directly on an original paper without any auxiliary aids, it imposes no restrictions on the paper type, question type, and so on. It can be seen that the character recognition of the scheme is subject to fewer limitations (for example, no limitations on the shooting scene, paper type, or question type), which improves the accuracy and credibility of character recognition.

The embodiment of the present invention also cuts character images using projection, which can improve the accuracy, reliability, robustness, and efficiency of character image cutting.
The embodiment of the present invention further provides a network device, which can be a server, a terminal, or another such device. Fig. 4 illustrates a schematic structural diagram of the network device involved in the embodiment of the present invention. Specifically:

The network device may include a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, a power supply 403, an input unit 404, and other components. Those skilled in the art can understand that the network device structure shown in Fig. 4 does not constitute a limitation on the network device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently. Specifically:

The processor 401 is the control center of the network device. It connects the various parts of the whole network device through various interfaces and lines, and executes the various functions of the network device and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the network device as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 can integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 401.

The memory 402 can be configured to store software programs and modules; the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area can store the operating system, application programs required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area can store data created according to the use of the network device, and the like. In addition, the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another solid-state storage device. Correspondingly, the memory 402 may further include a memory controller to provide the processor 401 with access to the memory 402.

The network device further includes the power supply 403 that supplies power to the various components. Preferably, the power supply 403 can be logically connected to the processor 401 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The power supply 403 may further include one or more direct-current or alternating-current power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components.

The network device may further include an input unit 404, which can be configured to receive input digit or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.

Although not shown, the network device may further include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 401 in the network device loads the executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and runs the application programs stored in the memory 402, thereby implementing various functions as follows:
collect a labeled sample paper image, the sample paper image including a labeled sample answer region and sample anchor points of the sample paper; obtain the positional relationship between the sample answer region and the sample anchor points; train the anchor point recognition network model according to the sample paper image, to obtain the trained anchor point recognition network model; acquire a paper image to be recognized, and recognize the anchor point positions of the paper using the trained anchor point recognition network model; extract an answer region image from the paper image according to the anchor point positions and the positional relationship; and perform character recognition on the answer region image to obtain a recognition result.

For example, specifically, the ground-truth position values of the sample anchor points can be obtained; the predicted position values of the sample anchor points are obtained based on the sample image and the anchor point recognition network model; and the predicted position values and the ground-truth position values of the sample anchor points are converged using a preset loss function, to obtain the trained anchor point recognition network model.

As another example, an affine transformation is performed on the paper image according to the anchor point positions, to obtain the transformed image and the transformed anchor point positions; the answer region image is then extracted from the transformed image according to the transformed anchor point positions and the positional relationship.

As another example, character images are cut out of the answer region image using projection, and character recognition is performed on the character images using the trained character recognition network model to obtain a recognition result.

The structures of the anchor point recognition network model and the character recognition network model can be found in the foregoing embodiments and are not described here.

For the specific implementation of each of the above operations, refer to the foregoing embodiments, which are not repeated here.

It can be seen from the above that the network device of this embodiment can collect a labeled sample paper image, the sample paper image including a labeled sample answer region and sample anchor points of the sample paper; obtain the positional relationship between the sample answer region and the sample anchor points; train the anchor point recognition network model according to the sample paper image, to obtain the trained anchor point recognition network model; acquire a paper image to be recognized, and recognize the anchor point positions of the paper using the trained anchor point recognition network model; extract an answer region image from the paper image according to the anchor point positions and the positional relationship; and perform character recognition on the answer region image to obtain a recognition result. Since the scheme can recognize the anchor point positions of the paper from the paper image with an anchor point recognition network model based on deep learning, it can effectively and accurately recognize the anchor point positions of paper images obtained under all kinds of shooting scenes (such as various backgrounds, lighting, angles, and textures); the scheme is therefore suitable for various shooting scenes and imposes no restrictions on the shooting scene of the paper image. Moreover, since the scheme can also perform effective character recognition directly on an original paper without any auxiliary aids, it imposes no restrictions on the paper type, question type, and so on. It can be seen that the character recognition of the scheme is subject to fewer limitations (for example, no limitations on the shooting scene, paper type, or question type), which improves the accuracy and credibility of character recognition.
Those of ordinary skill in the art can understand that all or some of the steps of the various methods in the above embodiments can be completed by instructions, or completed by instructions controlling related hardware; the instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.

To this end, the embodiment of the present invention provides a storage medium storing a plurality of instructions that can be loaded by a processor to execute the steps of any of the image recognition methods provided by the embodiments of the present invention. For example, the instructions can execute the following steps:

collect a labeled sample paper image, the sample paper image including a labeled sample answer region and sample anchor points of the sample paper; obtain the positional relationship between the sample answer region and the sample anchor points; train the anchor point recognition network model according to the sample paper image, to obtain the trained anchor point recognition network model; acquire a paper image to be recognized, and recognize the anchor point positions of the paper using the trained anchor point recognition network model; extract an answer region image from the paper image according to the anchor point positions and the positional relationship; and perform character recognition on the answer region image to obtain a recognition result.

For example, specifically, the ground-truth position values of the sample anchor points can be obtained; the predicted position values of the sample anchor points are obtained based on the sample image and the anchor point recognition network model; and the predicted position values and the ground-truth position values of the sample anchor points are converged using a preset loss function, to obtain the trained anchor point recognition network model.

As another example, an affine transformation is performed on the paper image according to the anchor point positions, to obtain the transformed image and the transformed anchor point positions; the answer region image is then extracted from the transformed image according to the transformed anchor point positions and the positional relationship.

As another example, character images are cut out of the answer region image using projection, and character recognition is performed on the character images using the trained character recognition network model to obtain a recognition result.

The structures of the anchor point recognition network model and the character recognition network model can be found in the foregoing embodiments and are not described here.

The storage medium may include a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, or the like.

Since the instructions stored in the storage medium can execute the steps of any image recognition method provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any image recognition method provided by the embodiments of the present invention; see the foregoing embodiments for details, which are not repeated here.
The image recognition method, apparatus, and storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core ideas. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application according to the ideas of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (15)

1. An image recognition method, characterized by comprising:

collecting a labeled sample paper image, the sample paper image including a labeled sample answer region and sample anchor points of the sample paper;

obtaining a positional relationship between the sample answer region and the sample anchor points;

training an anchor point recognition network model according to the sample paper image, to obtain a trained anchor point recognition network model;

acquiring a paper image to be recognized, and recognizing anchor point positions of the paper using the trained anchor point recognition network model;

extracting an answer region image from the paper image according to the anchor point positions and the positional relationship; and

performing character recognition on the answer region image to obtain a recognition result.
2. The image recognition method according to claim 1, characterized in that training the anchor point recognition network model according to the sample paper image, to obtain the trained anchor point recognition network model, comprises:

obtaining ground-truth position values of the sample anchor points;

obtaining predicted position values of the sample anchor points based on the sample image and the anchor point recognition network model; and

converging the predicted position values and the ground-truth position values of the sample anchor points using a preset loss function, to obtain the trained anchor point recognition network model.

3. The image recognition method according to claim 1, characterized in that extracting the answer region image from the paper image according to the anchor point positions and the positional relationship comprises:

determining an answer region position according to the anchor point positions and the positional relationship; and

extracting the answer region image from the paper image according to the answer region position.

4. The image recognition method according to claim 1, characterized in that extracting the answer region image from the paper image according to the anchor point positions and the positional relationship comprises:

performing an affine transformation on the paper image according to the anchor point positions, to obtain a transformed image and transformed anchor point positions; and

extracting the answer region image from the transformed image according to the transformed anchor point positions and the positional relationship.
5. The image recognition method according to claim 1, characterized in that performing character recognition on the answer region image to obtain the recognition result comprises:

cutting character images out of the answer region image using projection; and

performing character recognition on the character images using a trained character recognition network model, to obtain the recognition result.

6. The image recognition method according to claim 5, characterized in that cutting character images out of the answer region image using projection comprises:

performing horizontal projection on the region image to obtain a horizontal projection result;

cutting the region image according to the horizontal projection result, to obtain several row sub-region images;

performing vertical projection on the sub-region images to obtain a vertical projection result; and

cutting the sub-region images according to the vertical projection result, to obtain character images.

7. The image recognition method according to claim 6, characterized in that before performing vertical projection on the sub-region images, the method further comprises:

filtering the several row sub-region images according to a preset image filter condition, to obtain filtered sub-region images;

wherein performing vertical projection on the sub-region images comprises: performing vertical projection on the filtered sub-region images; and

cutting the sub-region images according to the vertical projection result comprises: cutting the filtered sub-region images according to the vertical projection result.
8. The image recognition method according to claim 1, characterized in that the anchor point recognition network model includes one fully connected layer and at least five convolutional layers;

and recognizing the anchor point positions of the paper using the trained anchor point recognition network model comprises:

performing convolution on the paper image in the at least five convolutional layers in sequence, to obtain a convolution result; and

performing a fully connected operation on the convolution result in the fully connected layer, to obtain the anchor point positions.

9. The image recognition method according to claim 5, characterized in that the character recognition network model includes multiple convolutional layers and two fully connected layers;

and performing character recognition on the character images using the trained character recognition network model comprises:

performing convolution on the character images in the multiple convolutional layers in sequence, to obtain a convolution result; and

performing character classification on the convolution result in the two fully connected layers in sequence.

10. The image recognition method according to claim 1, characterized in that performing the affine transformation on the paper image according to the anchor point positions comprises:

obtaining new anchor point positions;

obtaining an affine transformation matrix according to the anchor point positions and the new anchor point positions; and

performing affine transformation processing on pixel positions of the image according to the affine transformation matrix.
11. An image recognition apparatus, characterized by comprising:

a sample collection unit, configured to collect a labeled sample paper image, the sample paper image including a labeled sample answer region and sample anchor points of the sample paper;

a relation acquisition unit, configured to obtain a positional relationship between the sample answer region and the sample anchor points;

a training unit, configured to train an anchor point recognition network model according to the sample paper image, to obtain a trained anchor point recognition network model;

an anchor point recognition unit, configured to acquire a paper image to be recognized and recognize anchor point positions of the paper using the trained anchor point recognition network model;

a region extraction unit, configured to extract an answer region image from the paper image according to the anchor point positions and the positional relationship; and

a character recognition unit, configured to perform character recognition on the answer region image to obtain a recognition result.

12. The image recognition apparatus according to claim 11, characterized in that the training unit comprises:

a position acquisition subunit, configured to obtain ground-truth position values of the sample anchor points;

a predicted value acquisition subunit, configured to obtain predicted position values of the sample anchor points based on the sample image and the anchor point recognition network model; and

a convergence subunit, configured to converge the predicted position values and the ground-truth position values of the sample anchor points using a preset loss function, to obtain the trained anchor point recognition network model.

13. The image recognition apparatus according to claim 11, characterized in that the region extraction unit comprises:

an affine transformation subunit, configured to perform an affine transformation on the paper image according to the anchor point positions, to obtain a transformed image and transformed anchor point positions; and

a region extraction subunit, configured to extract the answer region image from the transformed image according to the transformed anchor point positions and the positional relationship.

14. The image recognition apparatus according to claim 11, characterized in that the character recognition unit comprises:

a cutting subunit, configured to cut character images out of the answer region image using projection; and

a character recognition subunit, configured to perform character recognition on the character images using a trained character recognition network model, to obtain a recognition result.

15. A storage medium, characterized in that the storage medium stores a plurality of instructions adapted to be loaded by a processor to perform the steps of the image recognition method according to any one of claims 1 to 10.
CN201811037416.5A 2018-09-06 2018-09-06 Image recognition method, device and storage medium Active CN110163211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811037416.5A CN110163211B (en) 2018-09-06 2018-09-06 Image recognition method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811037416.5A CN110163211B (en) 2018-09-06 2018-09-06 Image recognition method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110163211A true CN110163211A (en) 2019-08-23
CN110163211B CN110163211B (en) 2023-02-28

Family

ID=67645115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811037416.5A Active CN110163211B (en) 2018-09-06 2018-09-06 Image recognition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110163211B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN202939608U (en) * 2012-03-07 2013-05-15 爱意福瑞(北京)科技有限公司 Test paper inspection system
CN103310082A (en) * 2012-03-07 2013-09-18 爱意福瑞(北京)科技有限公司 Paper inspection method and device
CN103778816A (en) * 2014-01-27 2014-05-07 上海五和文化传播有限公司 Test paper answer extraction system
US20170262738A1 (en) * 2014-09-16 2017-09-14 Iflytek Co., Ltd. Intelligent scoring method and system for text objective question
CN104794948A (en) * 2015-04-20 2015-07-22 西安青柠电子信息技术有限公司 Automatic scoring system and application method thereof
CN104820835A (en) * 2015-04-29 2015-08-05 岭南师范学院 Automatic marking method for examination papers
CN107133616A (en) * 2017-04-02 2017-09-05 南京汇川图像视觉技术有限公司 Segmentation-free character locating and recognition method based on deep learning
CN107729865A (en) * 2017-10-31 2018-02-23 中国科学技术大学 Offline handwritten mathematical formula recognition method and system
CN108171297A (en) * 2018-01-24 2018-06-15 谢德刚 Answer sheet recognition method and device
CN108388895A (en) * 2018-03-04 2018-08-10 南京理工大学 Automatic test paper answer sheet processing method based on machine learning
CN108388892A (en) * 2018-05-04 2018-08-10 苏州大学 OpenCV-based automatic test paper processing system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fei Yin et al., "ICDAR 2013 Chinese Handwriting Recognition Competition", 2013 12th International Conference on Document Analysis and Recognition *
Zhu Ran, "Research and Design of a Machine-Vision-Based Electronic Homework Correction ***" (基于机器视觉的电子作业批改***的研究与设计), China Masters' Theses Full-text Database (Electronic Journals) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705534A (en) * 2019-09-17 2020-01-17 浙江工业大学 Wrong problem book generation method suitable for electronic typoscope
CN110705534B (en) * 2019-09-17 2022-06-14 浙江工业大学 Wrong problem book generation method suitable for electronic typoscope
WO2021073150A1 (en) * 2019-10-16 2021-04-22 平安科技(深圳)有限公司 Data detection method and apparatus, and computer device and storage medium
CN113255641A (en) * 2020-12-31 2021-08-13 深圳怡化电脑股份有限公司 Image identification method and device, electronic equipment and storage medium
CN112949621A (en) * 2021-03-16 2021-06-11 新东方教育科技集团有限公司 Method and device for marking test paper answering area, storage medium and electronic equipment
CN113628196A (en) * 2021-08-16 2021-11-09 广东艾檬电子科技有限公司 Image content extraction method, device, terminal and storage medium
CN113657354A (en) * 2021-10-19 2021-11-16 深圳市菁优智慧教育股份有限公司 Answer sheet identification method and system based on deep learning
CN117765560A (en) * 2024-01-08 2024-03-26 北京和气聚力教育科技有限公司 Answer sheet identification method, system, computer equipment and readable storage medium

Also Published As

Publication number Publication date
CN110163211B (en) 2023-02-28

Similar Documents

Publication Publication Date Title
CN110163211A (en) A kind of image-recognizing method, device and storage medium
CN108334848B (en) Tiny face recognition method based on generation countermeasure network
CN110678875B (en) System and method for guiding a user to take a self-photograph
EP3916627A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN105518744B (en) Pedestrian recognition methods and equipment again
CN112330526B (en) Training method of face conversion model, storage medium and terminal equipment
CN108229519B (en) Image classification method, device and system
CN109325933A (en) A kind of reproduction image-recognizing method and device
CN108229397A (en) Method for text detection in image based on Faster R-CNN
CN101098241A (en) Method and system for implementing virtual image
CN110490238A (en) A kind of image processing method, device and storage medium
CN109299663A (en) Hand-written script recognition methods, system and terminal device
CN111405360B (en) Video processing method and device, electronic equipment and storage medium
CN111008971B (en) Aesthetic quality evaluation method of group photo image and real-time shooting guidance system
CN110263768A (en) A kind of face identification method based on depth residual error network
CN106485186A (en) Image characteristic extracting method, device, terminal device and system
CN111080746B (en) Image processing method, device, electronic equipment and storage medium
CN109886153A (en) A kind of real-time face detection method based on depth convolutional neural networks
CN113297956B (en) Gesture recognition method and system based on vision
CN109670517A (en) Object detection method, device, electronic equipment and target detection model
CN110163567A (en) Classroom roll calling system based on multitask concatenated convolutional neural network
WO2024001095A1 (en) Facial expression recognition method, terminal device and storage medium
CN111860091A (en) Face image evaluation method and system, server and computer readable storage medium
CN109360179A (en) A kind of image interfusion method, device and readable storage medium storing program for executing
CN112528909A (en) Living body detection method, living body detection device, electronic apparatus, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant