CN115273102A - Method, device, equipment and medium for grading handwritten text neatness - Google Patents

Method, device, equipment and medium for grading handwritten text neatness Download PDF

Info

Publication number
CN115273102A
CN115273102A (application number CN202210891472.5A)
Authority
CN
China
Prior art keywords
handwritten text
line
score
image
handwritten
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210891472.5A
Other languages
Chinese (zh)
Inventor
王翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xingtong Technology Co ltd
Original Assignee
Shenzhen Xingtong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xingtong Technology Co ltd filed Critical Shenzhen Xingtong Technology Co ltd
Priority to CN202210891472.5A priority Critical patent/CN115273102A/en
Publication of CN115273102A publication Critical patent/CN115273102A/en
Pending legal-status Critical Current

Classifications

    • G06V 30/22: Character recognition characterised by the type of writing
    • G06N 3/08: Learning methods (computing arrangements based on neural networks)
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 30/1475: Inclination or skew detection or correction of characters or of the image to be recognised
    • G06V 30/15: Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G06V 30/158: Segmentation of character regions using character size, text spacings or pitch estimation
    • G06V 30/414: Extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Character Discrimination (AREA)

Abstract

The disclosure relates to a method, an apparatus, a device and a medium for scoring the neatness of handwritten text. The method comprises: acquiring an image to be scored, wherein the image to be scored contains handwriting paper on which at least one line of handwritten text is written; detecting the image to be scored based on a pre-trained detection model and generating a detection result, wherein the detection result comprises the vertex positions of the handwriting paper and the position of each line of handwritten text; determining a first score of the handwritten text in the image to be scored according to the vertex positions of the handwriting paper and the position of each line of handwritten text; recognizing each line of handwritten text in the image to be scored through a pre-trained recognition model based on the position of each line of handwritten text, and generating a second score according to the recognition result of each line; and generating a third score according to the detection result and the per-line recognition results based on a pre-trained scoring model. By combining the obtained scores, the neatness of the handwritten text in the image to be scored can be scored accurately.

Description

Method, device, equipment and medium for grading handwritten text neatness
Technical Field
The present disclosure relates to the field of education, and in particular to a method, an apparatus, a device and a medium for scoring the neatness of handwritten text.
Background
With the rapid development of artificial intelligence technology, AI-based intelligent question judging is gradually being applied in the field of education. For example, after writing an answer, a user photographs the writing paper with a terminal device and uploads the resulting image to a photo-based question-checking application; the application then feeds the image into a question-judging model, which judges and corrects the question content in the image and outputs a judgment result.
However, for any written content, the neatness of the writing deserves attention in addition to its correctness, since writing neatness shapes the writer's writing habits and attitude. It is therefore important to evaluate the neatness of the text in an image accurately.
Disclosure of Invention
In order to solve the above technical problem, the present disclosure provides a method, an apparatus, a device and a medium for scoring the neatness of handwritten text, which can score the neatness of handwritten text in an image rapidly and accurately.
According to an aspect of the present disclosure, there is provided a method for scoring a handwritten text regularity, including:
acquiring an image to be evaluated, wherein the image to be evaluated comprises handwriting paper written with at least one line of handwriting text;
detecting the image to be scored based on a pre-trained detection model and generating a detection result, wherein the detection result comprises the vertex position of the handwriting paper and the position of each line of handwriting text in the at least one line of handwriting text;
determining a first score of the handwritten text in the image to be scored according to the vertex position of the handwritten paper and the position of each line of handwritten text in the at least one line of handwritten text;
identifying each line of handwritten text in the image to be scored through a pre-trained identification model based on the position of each line of handwritten text in the at least one line of handwritten text, and generating a second score according to the identification result of each line of handwritten text;
generating a third score according to the detection result and the recognition result of each line of handwritten text based on a pre-trained scoring model;
and obtaining a neatness score of the handwritten text according to the first score, the second score and the third score.
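The disclosure does not fix how the three scores are fused into the final neatness score. A minimal sketch assuming a simple weighted average; the weights and the `neatness_score` helper are illustrative assumptions, not the patent's method:

```python
def neatness_score(first: float, second: float, third: float,
                   weights: tuple = (0.5, 0.25, 0.25)) -> float:
    """Fuse the layout-based first score, the recognition-based second
    score and the model-predicted third score into one neatness score.
    The weights here are illustrative assumptions."""
    w1, w2, w3 = weights
    return w1 * first + w2 * second + w3 * third

# On a 5-point scale: layout 4, per-line recognition 5, scoring model 3
overall = neatness_score(4.0, 5.0, 3.0)  # -> 4.0
```

In practice the weights would be tuned (or learned) so that the fused score tracks human neatness judgments.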
According to another aspect of the present disclosure, there is provided a handwritten text neatness scoring apparatus including:
an acquisition unit, used for acquiring an image to be scored, wherein the image to be scored comprises handwriting paper written with at least one line of handwritten text;
the detection unit is used for detecting the image to be scored based on a pre-trained detection model and generating a detection result, wherein the detection result comprises the vertex position of the handwriting paper and the position of each line of handwriting text in the at least one line of handwriting text;
the first scoring unit is used for determining a first score of the handwritten text in the image to be scored according to the vertex position of the handwritten paper and the position of each line of handwritten text in the at least one line of handwritten text;
the second scoring unit is used for recognizing each line of handwritten text in the image to be scored through a pre-trained recognition model based on the position of each line of handwritten text in the at least one line of handwritten text and generating a second score according to the recognition result of each line of handwritten text;
the third scoring unit is used for generating a third score according to the detection result and the recognition result of each line of handwritten text based on a pre-trained scoring model;
and the fourth scoring unit is used for obtaining a neatness score of the handwritten text according to the first score, the second score and the third score.
According to another aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory storing a program, wherein the program includes instructions that, when executed by the processor, cause the processor to perform the above method for scoring handwritten text neatness.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above method for scoring handwritten text neatness.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the above method for scoring handwritten text neatness.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
the method provided by the present disclosure comprises: acquiring an image to be evaluated, wherein the image to be evaluated comprises handwriting paper written with at least one line of handwriting text; detecting the image to be scored based on a pre-trained detection model and generating a detection result, wherein the detection result comprises the vertex position of the handwriting paper and the position of each line of the handwriting text; determining a first score of the handwritten text in the image to be scored according to the vertex position of the handwritten paper and the position of each line of handwritten text; identifying each line of handwritten text in the image to be scored through a pre-trained identification model based on the position of each line of handwritten text, and generating a second score according to the identification result of each line of handwritten text; generating a third score according to the detection result and the recognition result of each line of handwritten text based on a pre-trained scoring model; and according to the obtained scores, the work degree of the handwritten text in the image to be scored can be scored accurately.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it will be apparent to those skilled in the art that other drawings can be derived from these drawings without inventive effort.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present disclosure;
FIG. 2 is a flow chart of a model training method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a writing sample image provided by an embodiment of the present disclosure;
FIG. 4 is a diagram of a network structure of a model provided by an embodiment of the present disclosure;
FIG. 5 is a flow chart of a model training method provided by an embodiment of the present disclosure;
fig. 6 is a flowchart of a method for scoring handwritten text neatness according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a handwritten text neatness scoring device according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure can be more clearly understood, embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein is intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
At present, AI-based intelligent judging is well suited to question types that can be corrected by logic, such as primary-school arithmetic, where it has achieved good correction results. Question types that cannot be corrected purely by logic, such as multiple-choice, word problems, drawing and matching questions, can also be corrected by comparison against a question bank or by natural language processing, likewise with good results; fairly complete automatic correction is therefore already achievable, which effectively reduces the correction burden on parents and teachers. However, the question-bank approach requires building a dedicated question bank, which is costly, while the natural-language-processing approach is difficult to implement. Secondly, for any piece of homework, judging right and wrong is the main concern, so that students master the knowledge firmly; yet the neatness of the writing also deserves attention, as it is particularly important for cultivating the writer's habits and learning attitude. Accurately judging the neatness of the user's handwritten text and giving corresponding suggestions can therefore further help the user build good writing habits.
In view of the above technical problems, embodiments of the present disclosure provide a method for scoring a handwritten text neatness, which is specifically described in detail by one or more embodiments below.
Specifically, the method for scoring handwritten text neatness may be executed by a terminal or a server. The terminal or the server scores the neatness of the handwritten text in the image to be scored through a pre-trained detection model, recognition model, classification model and scoring model. The executor of the training method for these models and the executor of the scoring method may be the same or different.
For example, in an application scenario, as shown in fig. 1, fig. 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure, and in fig. 1, the server 12 trains a detection model, a recognition model, a classification model, and a scoring model. The terminal 11 obtains the trained detection model, recognition model, classification model and scoring model from the server 12, and the terminal 11 scores the neatness of the handwritten text in the image to be scored through the trained detection model, recognition model, classification model and scoring model. The image to be scored may be obtained by photographing by the terminal 11. Alternatively, the image to be evaluated is acquired by the terminal 11 from another device. Still alternatively, the image to be scored may be an image obtained by processing a preset image by the terminal 11, where the preset image may be obtained by shooting by the terminal 11, or the preset image may be obtained by the terminal 11 from another device. Here, the other devices are not particularly limited.
In another application scenario, the server 12 trains the detection model, recognition model, classification model and scoring model. Further, the server 12 scores the neatness of the handwritten text in the image to be scored through the trained models. The manner in which the server 12 obtains the image to be scored may be similar to that of the terminal 11, and is not repeated here.
In yet another application scenario, the terminal 11 trains the detection model, recognition model, classification model and scoring model. Further, the terminal 11 scores the neatness of the handwritten text in the image to be scored through the trained models.
It can be understood that the model training method and the method for scoring handwritten text neatness provided by the embodiments of the present disclosure are not limited to the above scenarios. Since the trained models are applied in the method for scoring handwritten text neatness, the model training method is introduced first.
Taking the server 12 as an example to train the detection model, the recognition model, the classification model and the scoring model, a model training method, that is, a training process of the detection model, the recognition model, the classification model and the scoring model, is introduced below. It is understood that the model training method is also applicable to the scenario in which the terminal 11 trains the recognition model.
Fig. 2 is a flowchart of a model training method provided in an embodiment of the present disclosure, where the detection model, the recognition model, the classification model, and the scoring model are obtained by the following training methods, and specifically include the following steps S210 to S220 shown in fig. 2:
s210, constructing a training data set, wherein the training data set comprises a handwriting sample image and annotation information corresponding to the handwriting sample image.
The method provided by the disclosure scores the neatness of handwritten text from several angles, such as whether the handwriting as a whole is neat and square, whether a single line of text is bent or slanted, and whether the writing has been scratched out, thereby providing an objective, standardised neatness scoring system. Specifically, a large number of handwritten text images are collected: a user writes text on a piece of handwriting paper and photographs it to generate a handwritten text image. The handwriting paper may be, for example, homework paper or answer paper, and the handwritten text image may be, for example, an image of completed homework. After a large number of handwriting sample images are collected, each is annotated to obtain the corresponding annotation information, and a training data set is then formed from the handwriting sample images and their annotations.
Optionally, the training data set comprises a plurality of training data subsets.
Optionally, the constructing the training data in S210 specifically includes the following steps S211 to S215:
s211, collecting a large number of handwriting sample images, wherein the handwriting sample images comprise a plurality of lines of handwriting texts.
S212, grading and labeling the whole handwriting regularity of the handwritten text in the handwriting sample image to generate first labeling information, and forming a first training data subset by the first labeling information.
And S213, marking the positions of four vertexes of the writing paper in the handwriting sample image, generating second marking information, and forming a second training data subset by the second marking information.
S214, labeling each line of handwritten text included in the handwritten sample image by using an angled rectangular frame to generate third labeling information, and forming a third training data subset by the third labeling information.
S215, determining, in the first training data subset, target handwriting sample images whose neatness score is greater than a preset threshold; cutting each target handwriting sample image according to its third annotation information to generate a plurality of cut images, each containing one line of handwritten text; annotating the handwritten text in the cut images to generate fourth annotation information; and forming a fourth training data subset from the fourth annotation information.
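Steps S211 to S215 can be summarised as a small data-preparation sketch. The record fields, names and the score threshold below are assumptions for illustration, not the patent's data format:

```python
from dataclasses import dataclass

@dataclass
class HandwritingSample:
    image_id: str
    neatness_score: int   # S212: overall neatness score, e.g. on a 1-5 scale
    paper_vertices: list  # S213: four (x, y) vertices of the handwriting paper
    line_boxes: list      # S214: one angled rectangle per line of handwritten text
    line_texts: list      # S215: character sequence per cropped single-line image

def build_subsets(samples, threshold=4):
    """Split the annotations into the first to fourth training data subsets.
    Only samples scored above the threshold contribute to the fourth subset,
    mirroring S215."""
    first = [(s.image_id, s.neatness_score) for s in samples]
    second = [(s.image_id, s.paper_vertices) for s in samples]
    third = [(s.image_id, s.line_boxes) for s in samples]
    fourth = [(s.image_id, s.line_texts) for s in samples
              if s.neatness_score > threshold]
    return first, second, third, fourth
```

The original images themselves form the fifth subset, so only the four annotation-derived subsets are built here.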
It can be understood that a plurality of handwriting sample images are collected; each contains at least part of a piece of handwriting paper, the paper contains at least one line of handwritten text, and each line consists of at least one character. The following description takes 100 handwriting sample images as an example. The overall neatness of the handwritten text in each of the 100 collected images is scored and annotated to generate the first annotation information. For example, a 5-point scale may be used, with discrete score values such as 1, 2, 3, 4 and 5: several score values are preset, and the overall neatness of each image is scored within this preset range. An image whose handwriting is fairly neat overall, with no smearing, scratching-out or slanting, may be annotated as 5; otherwise it may be annotated as 1. Each handwriting sample image thus has corresponding first annotation information, and the first training data subset is formed from the 100 groups of first annotation information.
The positions of the four vertices of the handwriting paper are annotated in each of the 100 collected images; the position of each vertex can be understood as the coordinates of a pixel in the image. This yields the second annotation information: each image has corresponding second annotation information containing the position coordinates of the four vertices. If the handwriting paper occupies the whole image, the four vertices of the image are the four vertices of the paper; alternatively, the four vertices of the paper may lie inside the image, in which case they are positions of pixels within the image. The second training data subset is then formed from the 100 groups of second annotation information. Next, each line of handwritten text in each image is annotated with an angled rectangular frame to generate the third annotation information; that is, each line in the image is framed with an angled rectangle. Each image has a corresponding group of third annotation information containing the position coordinates of several angled rectangular frames, and the 100 groups of third annotation information form the third training data subset.
After the first and third training data subsets are formed, target handwriting sample images whose neatness score is greater than a preset threshold are determined in the first training data subset. For example, the preset threshold may be 4 or 5 points, so that images scored above 4 points are taken as target images. Each target image is then cut according to its corresponding third annotation information to generate a plurality of cut images; that is, the target image is cut along the framed position of each line of handwritten text, yielding several cut images that each contain one line of text. The handwritten text in these cut images is annotated to generate the fourth annotation information: each character in each single-line cut image is labelled, and the resulting character sequences serve as the fourth annotation information of the target image. The fourth training data subset is then formed from the 100 sets of fourth annotation information. Finally, the 100 original handwritten text images are taken as a fifth training data subset, and the training data set is formed from the first to fifth training data subsets.
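Cutting a target image along an angled rectangular frame amounts to locating the frame's four corners and warping them upright. A sketch of the corner computation, assuming a (cx, cy, w, h, angle) box format; this format and the helper name are assumptions, not the patent's representation:

```python
import math

def angled_box_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of an angled rectangle (centre (cx, cy),
    size w x h, rotated by angle_deg), clockwise from the top-left.
    Cropping a line of text then warps these corners to an upright box."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                           (w / 2, h / 2), (-w / 2, h / 2))]

# An unrotated 40x10 line box centred at (50, 20)
corners = angled_box_corners(50, 20, 40, 10, 0)
```

With a nonzero angle, the returned corners feed directly into a perspective or affine warp (e.g. OpenCV's `getPerspectiveTransform`) to produce the horizontal single-line cut image.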
Illustratively, referring to fig. 3, fig. 3 is a schematic diagram of handwriting sample images provided by an embodiment of the present disclosure. In sample image 310 the handwriting paper occupies the entire image, so the four vertices of the paper are the four vertices of image 310. In sample image 320 the paper occupies only part of the image; the four vertices of the paper are shown as 321 to 324 in image 320, where vertices 323 and 324 are intersections of the paper with the edge of the image.
S220, training the detection model, the recognition model, the classification model and the grading model by using the training data set.
Understandably, on the basis of the above S210, after the training data set is constructed, the pre-constructed detection model, the pre-constructed recognition model, the pre-constructed classification model and the pre-constructed scoring model are respectively trained by using the training data set.
Illustratively, referring to fig. 4, fig. 4 is a network structure diagram of a model provided by an embodiment of the present disclosure. Fig. 4 shows the data flow between the models in use, i.e. after model training is completed. It includes a detection model 410, a recognition model 420, a classification model 430 and a scoring model 440; the inputs of the detection model 410, the recognition model 420 and the classification model 430 are all handwritten images, and the input of the scoring model 440 is the output of the detection model 410 and the recognition model 420.
The detection model 410 is built on CenterNet and comprises a first feature extraction layer 411, a first convolution layer 412 and a second convolution layer 413. The output of the first feature extraction layer 411 serves as the input of both the first convolution layer 412 and the second convolution layer 413, which are connected in parallel, and their outputs form the output of the detection model 410. The first feature extraction layer 411 uses a residual network as its backbone, which may be ResNet18. The residual network consists of 4 convolution blocks ("blocks"), each containing several convolution layers; feature information of the handwriting sample image is extracted by adjusting the stride of the convolution operations in the different blocks. This feature information is a group of multi-channel feature maps, denoted the first feature map.
The first convolution layer 412 consists of 2 convolution layers and 3 deconvolution layers connected in sequence. It takes the output of the first feature extraction layer 411 as input and outputs a set of four-channel feature maps; each channel is a vertex score map reflecting the position of one vertex of the handwriting paper in the image, and the four channels represent the positions of the upper, lower, left and right vertices of the paper.
The second convolution layer 413 consists of 6 convolution layers and 3 deconvolution layers connected in sequence. The output of each deconvolution layer serves both as the input of the next deconvolution layer and, after passing through one convolution layer, as one of the outputs of the detection model 410. The 3 deconvolution layers therefore yield 3 sets of feature maps, which can be regarded as the outputs of 3 branches; the feature maps output by the second convolution layer 413 are referred to as the third feature map. The input of the first deconvolution layer is the output of the last of the 6 convolution layers; after one convolution layer, the first deconvolution layer outputs a set of single-channel feature maps (the output of the first branch), which is a centre-point score map of single-line handwritten text. The input of the second deconvolution layer is the output of the first deconvolution layer; after one convolution layer, it outputs a set of two-channel feature maps (the output of the second branch), the output of
the second branch represents the height and the width of the single-line handwritten text, the input of a third deconvolution layer in the 3 deconvolution layers is the output of the second deconvolution layer, the output of the third deconvolution layer passes through one convolution layer and then outputs a group of feature maps of one channel (the output of the third branch), and the output of the third branch represents the inclination angle of the single-line handwritten text; it is understood that the first convolution layer 412 outputs the positions of the four vertices of the handwriting paper corresponding to the second training data subset, and the second convolution layer 413 outputs the position of the rectangular frame with the tilt angle of each single-line handwritten text in the handwriting sample image corresponding to the third training data subset. The recognition model 420 is constructed based on a Convolutional Recurrent Neural Network (CRNN), the recognition model 420 includes a second feature extraction layer 421 and a two-layer bidirectional Long-Short Term Memory network (LSTM) 422, the input of the recognition module 420 is a plurality of images including a single-line handwritten text obtained by cutting the handwritten sample image, the recognition model 420 is used for recognizing texts in the images including the single-line handwritten text, the recognition result of each image including the single-line handwritten text and the recognition confidence of the image are output, corresponding to the fourth training data subset, the recognition confidence is used for calculating a score, and the recognition result refers to the recognition result of each character in the image. 
The classification model 430 is constructed based on a residual error network, and the input of the classification model 430 is a handwritten sample image and the output is an overall finishing score of the handwritten sample image, which corresponds to the first training data subset. The scoring model 440 is constructed based on the encoder portion of a attention machine (fransformer) model, unlike a Transformer encoder, where the input to the scoring model 440 is not a single character but all characters of a complete one-line handwritten text, i.e. the input to the encoder in the scoring model 440 is the recognition result output by the recognition model 430 during use, the input to the scoring model during training is a fourth subset of training data, and the input to the position encoder in the scoring model 440 is, in use, the position of the one-line handwritten text output by the detection model 410, and the input to the third subset of training data during training is an output obtained from the position of the one-line handwritten text and the recognition result of the current line handwritten text, and then the output outputs the score of the handwritten sample image through two fully-connected layers, where the number of nodes of the second fully-connected layer of the two fully-connected layers is 5, and the overall degree of completion of the handwritten sample image is scored in 5 in correspondence to the first set of training data.
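As a rough illustration of the tensor shapes implied by the two parallel detection heads described above, the following pure-Python sketch enumerates the detection model's output maps for a given input size. The output stride of 4 and the helper name `detection_head_shapes` are assumptions for illustration only; the patent does not specify the stride:

```python
def detection_head_shapes(h, w, stride=4):
    """Output map shapes of the detection model for an (h, w) input image.

    stride is the assumed downsampling factor of the backbone plus head;
    each entry is (channels, map_height, map_width).
    """
    hs, ws = h // stride, w // stride
    return {
        "vertex_scores": (4, hs, ws),  # first convolution layer 412: one score map per paper vertex
        "line_center":   (1, hs, ws),  # second convolution layer 413, branch 1: center-point score map
        "line_size":     (2, hs, ws),  # branch 2: height and width of each single-line text
        "line_angle":    (1, hs, ws),  # branch 3: inclination angle of each single-line text
    }
```

For example, under the stride-4 assumption a 512x512 input would yield 128x128 output maps in every branch.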
It can be understood that the convolution layers and deconvolution layers involved in the multiple models constructed above differ in their network parameters, and that these layers also differ from model to model.
The present disclosure provides a model training method, which comprises labeling a large number of acquired writing sample images from different aspects to generate a plurality of training data subsets, and training the pre-constructed detection model, classification model, recognition model and scoring model with the plurality of training data subsets. By obtaining labeling information of the writing sample images from different angles and then training a plurality of different models with the different labeling information, the method ensures fairness and objectivity in scoring writing neatness to the greatest extent and makes the scoring standard explicit; by subsequently combining the respective functions of the scoring model, classification model, detection model and recognition model with the added geometric constraints, an objective judgment of writing neatness can be realized quickly and accurately.
On the basis of the foregoing embodiment, fig. 5 is a flowchart of a model training method provided in the embodiment of the present disclosure, and optionally, the training data set is used to train the detection model, the recognition model, the classification model, and the scoring model, which specifically includes the following steps S510 to S540 shown in fig. 5:
S510, taking the handwriting sample image as the input of the detection model, and training the detection model based on the second training data subset, the third training data subset and the output of the detection model.
Optionally, the detection model includes a first feature extraction layer, a first convolution layer, and a second convolution layer.
Optionally, the step S510 of taking the handwriting sample image as an input of the detection model, and training the detection model based on the second training data subset, the third training data subset, and an output of the detection model specifically includes the following steps S511 to S514:
S511, extracting features of the handwriting sample image by using the first feature extraction layer to obtain a group of multi-channel first feature maps.
S512, convolving the first feature maps with the first convolution layer to obtain a set of four-channel second feature maps, wherein the second feature map of each channel represents a vertex score map of one of the four vertices of the writing paper in the handwriting sample image, and the vertex score maps are used for determining the positions of the vertices.
S513, convolving the first feature maps with the second convolution layer to obtain multiple groups of third feature maps, wherein the multiple deconvolution layers included in the second convolution layer are connected in sequence and each deconvolution layer outputs one group of third feature maps; the first group of third feature maps represents the center-point score map of each line of handwritten text, the second group represents the width and height of each line of handwritten text, and the third group represents the inclination angle of each line of handwritten text.
S514, training the detection model based on the second training data subset, the third training data subset, the second feature maps and the third feature maps.
Understandably, after the training data set is constructed, the handwriting sample image is used as the input of the pre-constructed detection model. The first feature extraction layer in the detection model extracts feature information of the handwriting sample image to obtain the first feature maps, which are input into the first convolution layer and the second convolution layer respectively. The second convolution layer detects the position of each line of handwritten text in the handwriting sample image according to the first feature maps and outputs the predicted positions of the handwritten text (multiple groups of third feature maps); the first convolution layer detects the positions of the four vertices of the handwriting paper in the image according to the first feature maps and outputs the predicted positions of the four vertices of the handwriting paper (the second feature maps). A focal loss (Focal Loss) function is then used to calculate a first loss value from the accurate positions of the four vertices of the handwriting paper pre-labeled in the second training data subset and the predicted vertex positions output by the first convolution layer. The angled rectangular frame in the third labeling information of the third training data subset comprises the center point position, the width and height, and the inclination angle of each line of handwritten text, and the output of the second convolution layer in the detection model comprises three branches, each outputting a group of feature maps. Specifically, a second loss value is calculated from the center point position of each line of handwritten text in the third training data subset and the predicted first group of third feature maps using the focal loss function; a third loss value is calculated from the width and height of each line of handwritten text in the third training data subset and the predicted second group of third feature maps using a regression loss function (L1 Loss); and a fourth loss value is calculated from the inclination angle of each line of handwritten text in the third training data subset and the predicted third group of third feature maps using the regression loss function. The network parameters of the detection model are updated according to the sum of the first loss value, the second loss value, the third loss value and the fourth loss value, completing the training of the detection model.
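The four loss terms above can be sketched in plain Python on flattened maps. The focal loss here follows the common CenterNet-style penalty-reduced form, which is an assumption on our part (the patent only names Focal Loss), and all function names are hypothetical:

```python
import math

def focal_loss(pred, gt, alpha=2.0, beta=4.0):
    # CenterNet-style focal loss over flattened score maps: gt == 1.0 marks a
    # true point; other positions hold Gaussian-smoothed ground truth in [0, 1).
    loss, num_pos = 0.0, 0
    for p, y in zip(pred, gt):
        if y == 1.0:
            loss -= (1.0 - p) ** alpha * math.log(p)
            num_pos += 1
        else:
            loss -= (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return loss / max(num_pos, 1)

def l1_loss(pred, gt):
    # Mean absolute error, used for the width/height and angle regression branches
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)

def detection_loss(vertex, center, size, angle):
    # Sum of the first to fourth loss values; each argument is a (pred, gt) pair
    return (focal_loss(*vertex) + focal_loss(*center)
            + l1_loss(*size) + l1_loss(*angle))
```

A perfect prediction drives every term to zero, and the detection model's parameters would be updated against the summed value, as described in the text.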
S520, taking the handwriting sample image as the input of the classification model, and training the classification model based on the first training data subset and the output of the classification model.
Understandably, after the training data set is obtained, the handwriting sample images in the training data set are input into the pre-constructed classification model, and the class output by the classification model is the overall neatness score of the handwritten text in the handwriting sample image. On a 5-point scale, the classes output by the classification model are the 5 integers from 1 to 5; for example, an output of 1 indicates that the overall neatness score of the handwritten text in the handwriting sample image is 1. A multi-class cross-entropy loss function is then adopted to calculate a loss value from the class output by the classification model and the score labeled for the handwriting sample image in the first training data subset, and the network parameters of the classification model are updated according to the loss value, completing the training of the classification model.
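The multi-class cross-entropy used here can be written out for a single sample as follows. This is a generic pure-Python sketch, not code from the patent, and the label is taken as a 0-based class index:

```python
import math

def cross_entropy(logits, label):
    # Multi-class cross-entropy for one sample: -log softmax(logits)[label],
    # computed with the log-sum-exp trick for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

With 5 equally likely classes the loss is log 5 (about 1.609), and it approaches 0 as the logit of the labeled class dominates the others.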
S530, taking a plurality of cutting images corresponding to the handwriting sample image as input of the recognition model, and training the recognition model based on the fourth training data subset and output of the recognition model.
Understandably, after the training data set is obtained, the handwriting sample image is cut according to the angled rectangular frames labeling each line of handwritten text in the writing sample image in the third training data subset to obtain a plurality of cut images, wherein each cut image comprises one line of handwritten text, and each handwriting sample image can be cut according to the number of lines of handwritten text it includes. After the plurality of cut images corresponding to the handwriting sample image are obtained, they are used as the input of the recognition model; the recognition model recognizes the handwritten text in the cut images and generates recognition results, each cut image corresponding to one recognition result. A CTC loss function is then adopted to calculate a loss value from the character sequences labeled in the fourth training data subset and the recognition results of the cut images output by the recognition model, and the network parameters of the recognition model are updated according to the loss value to obtain the trained recognition model.
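The patent trains the recognition model with a CTC loss; as a complementary sketch, the following shows the standard greedy CTC decoding step that turns per-timestep model outputs into a character sequence by collapsing consecutive repeats and removing blanks. This is a generic illustration of CTC decoding, not the patent's own code:

```python
def ctc_greedy_decode(frames, blank=0):
    # frames: per-timestep class scores; pick the best class per frame,
    # then collapse consecutive repeats and drop the blank symbol.
    best = [max(range(len(f)), key=f.__getitem__) for f in frames]
    out, prev = [], blank
    for c in best:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return out
```

For example, the per-frame argmax sequence [blank, 1, 1, blank, 2] decodes to the label sequence [1, 2].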
And S540, taking the fourth training data subset and the third training data subset as the input of the scoring model, and training the scoring model based on the first training data subset and the output of the scoring model.
Understandably, the fourth training data subset and the third training data subset in the training data set are used as the input of the pre-constructed scoring model, which can be understood as a neatness-detail scoring model. The input of the scoring model is the accurate text recognition results labeled in the fourth training data subset and the positions of the handwritten text in the writing sample image labeled in the third training data subset, and the output is a predicted overall neatness score for the writing sample image. A multi-class loss function is then adopted to calculate a loss value based on the neatness score labeled for the writing sample image in the first training data subset and the predicted neatness score output by the scoring model, and the network parameters of the scoring model are updated according to the loss value to obtain the trained scoring model.
The embodiment of the disclosure provides a model training method in which different training data subsets are constructed and used, independently or in combination, to train models built for different aspects of the task, providing a way to score handwritten text neatness based on objective standards.
On the basis of the foregoing embodiment, fig. 6 is a flowchart of a scoring method for the completeness of a handwritten text according to an embodiment of the present disclosure, which specifically includes the following steps S610 to S660 shown in fig. 6:
S610, obtaining an image to be scored, wherein the image to be scored comprises handwriting paper written with at least one line of handwritten text.
Understandably, after the model training is completed, the image to be scored is obtained; specifically, a plurality of images to be scored may be obtained and scored for neatness in sequence. The image to be scored comprises at least part of a piece of handwriting paper, the part of handwriting paper comprises at least one line of handwritten text, and each line of handwritten text comprises at least one handwritten character.
S620, detecting the image to be scored based on a pre-trained detection model and generating a detection result, wherein the detection result comprises the vertex position of the handwriting paper and the position of each line of handwriting text in the at least one line of handwriting text.
Understandably, on the basis of S610, feature information of the image to be scored is extracted by the first feature extraction layer in the pre-trained detection model to obtain a group of multi-channel feature maps; the first convolution layer and the second convolution layer in the detection model then perform detection on the multi-channel feature maps simultaneously to generate the detection result. The first convolution layer outputs the positions of the four vertices of the handwriting paper in the image to be scored, and the second convolution layer outputs the position of each line of handwritten text included in the image to be scored, where the position of each line of handwritten text can be understood as the position of an angled rectangular frame that frames that line of handwritten text.
S630, determining a first score of the handwritten text in the image to be scored according to the vertex position of the handwritten paper and the position of each line of handwritten text in the at least one line of handwritten text.
Optionally, the step S630 specifically includes the following steps S631 to S633:
S631, performing affine transformation according to the vertex position of the handwriting paper and a preset vertex position to obtain a homography matrix.
S632, transforming the position of each line of the handwritten text based on the homography matrix, and determining the inclination value of each line of the handwritten text according to the transformed position of each line of the handwritten text.
And S633, determining a first score of the handwritten text in the image to be scored according to the inclination value of each line of the handwritten text.
It can be understood that, on the basis of S620, after the detection model outputs the four vertex positions of the handwriting paper and the position of each line of handwritten text, affine transformation is performed according to the four vertex positions of the handwriting paper and preset vertex positions to obtain a homography matrix. The preset vertex positions may be four preset marking points which form an axis-aligned rectangle relative to the image to be scored; they are used to judge whether the handwriting paper is tilted in the image, since the paper may be tilted when it is photographed to generate the image to be scored. The position of each line of handwritten text is then transformed based on the homography matrix, and the inclination value of each line of handwritten text is determined from its transformed position, that is, whether the text line is tilted is judged. The inclination value indicates the degree to which the handwritten text is tilted on the writing paper, i.e. whether the characters written by the user slant. The first score of the handwritten text in the image to be scored is determined according to the inclination value of each line of handwritten text; the first score rates the tilt of the handwritten text from a geometric perspective. For example, suppose the image to be scored comprises 10 lines of handwritten text, each with an inclination value, and an inclination threshold is preset. If 4 of the 10 inclination values are greater than or equal to the preset threshold, the first score of the image to be scored is 3: the remaining 6 inclination values are less than the threshold, giving a proportion of 0.6, and on a 5-point scale the product of 0.6 and 5 is 3.
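The geometric first score described above can be sketched in plain Python. `apply_homography` maps a point through the 3x3 matrix obtained from the four paper vertices, `line_tilt_deg` measures one line's tilt after rectification from two of its box corners, and `first_score` scales the fraction of sufficiently straight lines to the full score. All function names, and the corner-based tilt measure, are illustrative assumptions rather than the patent's exact computation:

```python
import math

def apply_homography(H, pt):
    # Map a 2-D point through a 3x3 homography matrix (row-major nested lists)
    x, y = pt
    d = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / d,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / d)

def line_tilt_deg(H, left_pt, right_pt):
    # Tilt of one text line, in degrees, after rectifying the page with H
    (x1, y1), (x2, y2) = apply_homography(H, left_pt), apply_homography(H, right_pt)
    return abs(math.degrees(math.atan2(y2 - y1, x2 - x1)))

def first_score(tilt_values, tilt_threshold, full_score=5):
    # Fraction of lines whose tilt stays below the threshold, scaled to the full score
    ok = sum(1 for t in tilt_values if t < tilt_threshold)
    return full_score * ok / len(tilt_values)
```

With the example from the text, 6 of 10 inclination values below the threshold give 0.6 x 5 = 3.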
And S640, recognizing each line of handwritten text in the image to be scored through a pre-trained recognition model based on the position of each line of handwritten text in the at least one line of handwritten text, and generating a second score according to the recognition result of each line of handwritten text.
Optionally, the step S640 specifically includes the following steps S641 to S643:
S641, cutting the image to be scored according to the position of each line of handwritten text in the at least one line of handwritten text to obtain at least one handwritten text image, wherein each handwritten text image comprises one line of handwritten text.
S642, identifying the handwritten text in each handwritten text image based on a pre-trained identification model to obtain the corresponding confidence of each handwritten text image.
And S643, generating a second score according to the corresponding confidence of each handwritten text image.
Understandably, on the basis of S630, the image to be scored is cut according to the position of each line of handwritten text in the at least one line of handwritten text, and a corresponding number of handwritten text images can be obtained according to the number of lines of handwritten text included, where each handwritten text image comprises one line of handwritten text. The handwritten text in each handwritten text image is then recognized based on the pre-trained recognition model to obtain the recognition result and the confidence corresponding to that handwritten text image; the recognition result is the character recognition result in the handwritten text image, and the confidence ranges from 0 to 1. A second score is then generated according to the confidence corresponding to each handwritten text image. Continuing the above example, 10 handwritten text images are obtained by cutting the image to be scored, and 10 confidences are obtained through the recognition model; the 10 confidences are averaged and then multiplied by 5 to obtain the second score, where 5 corresponds to the 5-point scale. It can be understood that each score is calculated on the same scale.
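The second score is then simply the mean recognition confidence scaled to the scoring range; a minimal sketch (the function name is assumed):

```python
def second_score(confidences, full_score=5):
    # confidences: one recognition confidence in [0, 1] per single-line text image
    return full_score * sum(confidences) / len(confidences)
```

For instance, ten lines each recognized with confidence 0.8 give a second score of 4 on the 5-point scale.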
And S650, generating a third score according to the detection result and the recognition result of each line of the handwritten text based on a pre-trained scoring model.
It can be understood that, on the basis of S640 and based on the pre-trained scoring model, a third score is generated according to the position of each line of handwritten text in the detection result output by the detection model and the recognition result of each line of handwritten text output by the recognition model. The third score rates the overall neatness of the image to be scored in terms of writing details, which may include the accuracy of the writing, that is, whether miswritten characters exist; the third score is also on a 5-point scale.
And S660, obtaining the finishing degree score of the handwritten text according to the first score, the second score and the third score.
Optionally, the images to be scored are classified based on a classification model trained in advance, and a fourth score is generated according to the classification result.
It can be understood that, before the finishing degree score of the image to be scored is determined, the image to be scored may also be input into the pre-trained classification model, which rates the neatness of the image to be scored from an overall perspective and outputs a fourth score, also on a 5-point scale.
Optionally, the step S660 specifically includes: and carrying out weighted average according to the first score, the second score, the third score and the fourth score to obtain a finishing degree score of the handwritten text.
Understandably, on the basis of S650, the first score, the second score, the third score and the fourth score are weighted and averaged to obtain the finishing degree score of the handwritten text, and the weights can be set as required. It can be understood that which of the four scores participate in the weighted-average calculation can also be chosen freely; that is, which of the 4 scores are used for calculating the finishing degree of the image to be scored can be selected as needed. For example, the finishing degree of the image to be scored may be calculated based on the first score, the second score and the third score, or based on the first score, the second score and the fourth score, which is not limited herein.
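The final aggregation can be sketched as a weighted average over whichever sub-scores are selected; the weights here are illustrative, since the patent leaves them to be set as required:

```python
def finishing_score(scores, weights):
    # Weighted average of the selected sub-scores; scores and weights align by index
    assert len(scores) == len(weights) and weights, "one weight per selected score"
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```

With equal weights over all four scores this reduces to a plain average; dropping a score amounts to passing shorter lists.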
It can be understood that the first score is calculated from whether each single line of handwritten text is tilted in the image, and a first threshold may be preset so that a tilt prompt can be issued to the user according to the tilt of each line. For example, if there are 10 lines of handwritten text in the image to be scored and the inclination value of the 2nd line is greater than the first threshold, the 2nd line was tilted during writing, so information that the 2nd line of handwritten text is tilted can be sent to the user, prompting the user to avoid tilting in subsequent writing. The second score rates the handwriting itself: if the handwritten text is written neatly, the confidence output by the recognition model is higher, whereas illegible handwriting yields a lower confidence, and the user can be reminded whether the handwriting is illegible according to the second score and a preset second threshold. The third score rates the handwritten text on details, further judging whether the written characters contain scratch-outs and whether they are clear, and a prompt about writing details can be sent to the user based on the third score and a preset third threshold. The fourth score is based on the overall neatness of the image and reflects the overall writing condition, for example whether the written text as a whole looks clean and tidy, and the user can be reminded accordingly based on a preset fourth threshold and the fourth score.
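The per-line tilt prompt described above can be sketched as follows; the message wording and the function name are illustrative assumptions:

```python
def tilt_prompts(tilt_values, first_threshold):
    # One prompt per line (1-indexed) whose tilt value exceeds the preset first threshold
    return [f"line {i} of the handwritten text is tilted"
            for i, t in enumerate(tilt_values, 1) if t > first_threshold]
```

Analogous threshold checks against the second, third and fourth scores would drive the legibility, detail and overall-tidiness reminders.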
The embodiment of the disclosure provides a method for scoring the neatness of handwritten text, which scores the handwritten text in an image based on aspects such as whether the handwriting is neat and well-formed, whether each line of handwritten text is bent or tilted, and whether the handwritten text contains scratch-outs.
Fig. 7 is a schematic structural diagram of a handwritten text finishing degree scoring device according to an embodiment of the present disclosure. The scoring device for the handwritten text completeness provided in the embodiment of the present disclosure may execute the processing procedure provided in the scoring method for the handwritten text completeness, as shown in fig. 7, the scoring device 700 for the handwritten text completeness includes:
an obtaining unit 710, configured to obtain an image to be scored, where the image to be scored includes a piece of handwriting paper on which at least one line of handwriting text is written;
the detection unit 720 is configured to detect the image to be scored based on a pre-trained detection model and generate a detection result, where the detection result includes a vertex position of the handwriting paper and a position of each line of handwritten text in the at least one line of handwritten text;
the first scoring unit 730 is configured to determine a first score of the handwritten text in the image to be scored according to the vertex position of the handwriting paper and the position of each line of the handwritten text in the at least one line of the handwritten text;
the second scoring unit 740 is configured to recognize each line of handwritten text in the image to be scored through a pre-trained recognition model based on a position of each line of handwritten text in the at least one line of handwritten text, and generate a second score according to a recognition result of each line of handwritten text;
a third scoring unit 750, configured to generate a third score according to the detection result and the recognition result of each line of handwritten text based on a pre-trained scoring model;
a fourth scoring unit 760, configured to obtain a finishing score of the handwritten text according to the first score, the second score, and the third score.
Optionally, the first scoring unit 730 determines a first score of the handwritten text in the image to be scored according to the vertex position of the handwritten paper and the position of each line of handwritten text in the at least one line of handwritten text, and is specifically configured to:
performing affine transformation according to the vertex position of the handwriting paper and a preset vertex position to obtain a homography matrix;
transforming the position of each line of handwritten text based on the homography matrix, and determining the inclination value of each line of handwritten text aiming at the transformed position of each line of handwritten text;
and determining a first score of the handwritten text in the image to be scored according to the inclination value of each line of the handwritten text.
Optionally, the second scoring unit 740 recognizes each line of handwritten text in the image to be scored through a pre-trained recognition model based on the position of each line of handwritten text in the at least one line of handwritten text, and generates a second score according to the recognition result of each line of handwritten text, and is specifically configured to:
cutting the image to be scored according to the position of each line of handwritten text in the at least one line of handwritten text to obtain at least one handwritten text image, wherein each handwritten text image comprises a line of handwritten text;
identifying the handwritten text in each handwritten text image based on a pre-trained identification model to obtain a corresponding confidence coefficient of each handwritten text image;
and generating a second score according to the corresponding confidence coefficient of each handwritten text image.
Optionally, the apparatus 700 further includes a fifth scoring unit, and the fifth scoring unit is specifically configured to:
and classifying the images to be scored based on a pre-trained classification model, and generating a fourth score according to a classification result.
Optionally, in obtaining the neatness score of the handwritten text according to the first score, the second score, and the third score, the fourth scoring unit 760 is specifically configured to:
compute a weighted average of the first score, the second score, the third score, and the fourth score to obtain the neatness score of the handwritten text.
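The weighted average can be illustrated as below; the weight values are made-up placeholders, since the embodiment leaves them unspecified.

```python
def neatness_score(s1, s2, s3, s4, weights=(0.3, 0.2, 0.3, 0.2)):
    """Weighted average of the four component scores.
    The weights are illustrative and must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-6
    return sum(w * s for w, s in zip(weights, (s1, s2, s3, s4)))

# Example: inclination score 80, recognition score 87,
# scoring-model score 75, classification score 90.
total = neatness_score(80, 87, 75, 90)
```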
Optionally, the apparatus 700 further includes a training unit, where the training unit is configured to obtain the detection model, the recognition model, the classification model, and the scoring model through the following training method:
constructing a training data set, wherein the training data set comprises handwritten sample images and annotation information corresponding to the handwritten sample images;
and training the detection model, the recognition model, the classification model, and the scoring model using the training data set.
Optionally, the training data set in the training unit includes a plurality of training data subsets.
Optionally, in constructing the training data set, the training unit is specifically configured to:
collect a plurality of handwritten sample images, each handwritten sample image comprising a plurality of lines of handwritten text;
annotate the overall writing neatness of the handwritten text in each handwritten sample image with a grade to generate first annotation information, the first annotation information forming a first training data subset;
annotate the positions of the four vertices of the writing paper in each handwritten sample image to generate second annotation information, the second annotation information forming a second training data subset;
annotate each line of handwritten text included in each handwritten sample image with an angled rectangular box to generate third annotation information, the third annotation information forming a third training data subset;
and determine, in the first training data subset, target handwritten sample images whose writing neatness scores are greater than a preset threshold, crop each target handwritten sample image according to its third annotation information to generate a plurality of cropped images, each cropped image comprising one line of handwritten text, and annotate the handwritten text in the cropped images to generate fourth annotation information, the fourth annotation information forming a fourth training data subset.
Optionally, in training the detection model, the recognition model, the classification model, and the scoring model using the training data set, the training unit is specifically configured to:
take each handwritten sample image as an input of the detection model, and train the detection model based on the second training data subset, the third training data subset, and the output of the detection model;
take each handwritten sample image as an input of the classification model, and train the classification model based on the first training data subset and the output of the classification model;
take the plurality of cropped images corresponding to the handwritten sample images as inputs of the recognition model, and train the recognition model based on the fourth training data subset and the output of the recognition model;
and take the fourth training data subset and the third training data subset as inputs of the scoring model, and train the scoring model based on the first training data subset and the output of the scoring model.
Optionally, the detection model in the training unit includes a first feature extraction layer, a first convolution layer, and a second convolution layer.
Optionally, in taking the handwritten sample image as an input of the detection model and training the detection model based on the second training data subset, the third training data subset, and the output of the detection model, the training unit is specifically configured to:
extract features of the handwritten sample image using the first feature extraction layer to obtain a set of multi-channel first feature maps;
convolve the first feature maps using the first convolution layer to obtain a set of four-channel second feature maps, wherein the second feature map of each channel represents a vertex score map of one of the four vertices of the writing paper in the handwritten sample image, and the vertex score maps are used for determining the vertex positions;
convolve the first feature maps using the second convolution layer to obtain multiple sets of third feature maps, wherein a plurality of deconvolution layers included in the second convolution layer are connected in sequence and each deconvolution layer outputs one set of third feature maps; a first set of the third feature maps represents a center-point score map of each line of handwritten text, a second set represents the width and height of each line of handwritten text, and a third set represents the inclination angle of each line of handwritten text;
and train the detection model based on the second training data subset, the third training data subset, the second feature maps, and the third feature maps.
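Once the detection head produces the four-channel vertex score maps, each vertex position can be read off as the per-channel peak. Below is a numpy sketch with synthetic score maps; the argmax decoding is an assumption, as the patent only states that the score maps are used for determining the vertex positions.

```python
import numpy as np

def vertices_from_score_maps(score_maps):
    """Recover the four sheet corners from a (4, H, W) stack of vertex
    score maps: each channel's peak location is one corner position."""
    corners = []
    for ch in score_maps:
        y, x = np.unravel_index(np.argmax(ch), ch.shape)
        corners.append((int(x), int(y)))
    return corners

# Synthetic score maps with one hot peak per channel (TL, TR, BR, BL).
maps = np.zeros((4, 64, 48), dtype=np.float32)
peaks = [(5, 4), (44, 6), (43, 60), (3, 58)]  # (x, y) per corner
for ch, (x, y) in zip(maps, peaks):
    ch[y, x] = 1.0

corners = vertices_from_score_maps(maps)
```

In practice the score maps would come out of the first convolution layer described above, typically after a sigmoid, with soft Gaussian peaks rather than one-hot spikes; argmax decoding works the same way.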
The device provided by this embodiment has the same implementation principle and technical effects as the foregoing method embodiments. For brevity, where this device embodiment does not mention a detail, reference may be made to the corresponding content in the method embodiments.
An exemplary embodiment of the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor; when executed by the at least one processor, the computer program causes the electronic device to perform a method according to an embodiment of the present disclosure.
An exemplary embodiment of the present disclosure also provides a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to perform a method according to an embodiment of the present disclosure.
Referring to fig. 8, a block diagram of an electronic device 800 will now be described. The electronic device 800 may be a server or a client of the present disclosure, and is an example of a hardware device that may be applied to aspects of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random-access memory (RAM) 803. The RAM 803 may also store various programs and data required for the operation of the electronic device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to one another by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including an input unit 806, an output unit 807, the storage unit 808, and a communication unit 809. The input unit 806 may be any type of device capable of inputting information to the electronic device 800; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 807 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 808 may include, but is not limited to, a magnetic disk or an optical disc. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices over a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and/or chipsets, such as Bluetooth™ devices, Wi-Fi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 801 executes the respective methods and processes described above. For example, in some embodiments, the method for scoring handwritten text neatness may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. In some embodiments, the computing unit 801 may be configured to perform the method for scoring handwritten text neatness in any other suitable manner (e.g., by means of firmware).
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The foregoing descriptions are merely exemplary embodiments of the present disclosure, presented to enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method for scoring the neatness of handwritten text, characterized by comprising the following steps:
acquiring an image to be scored, wherein the image to be scored comprises handwriting paper written with at least one line of handwritten text;
detecting the image to be scored based on a pre-trained detection model and generating a detection result, wherein the detection result comprises the vertex positions of the handwriting paper and the position of each line of handwritten text in the at least one line of handwritten text;
determining a first score of the handwritten text in the image to be scored according to the vertex positions of the handwriting paper and the position of each line of handwritten text in the at least one line of handwritten text;
recognizing each line of handwritten text in the image to be scored through a pre-trained recognition model based on the position of each line of handwritten text in the at least one line of handwritten text, and generating a second score according to the recognition result of each line of handwritten text;
generating a third score according to the detection result and the recognition result of each line of handwritten text based on a pre-trained scoring model;
and obtaining a neatness score of the handwritten text according to the first score, the second score, and the third score.
2. The method according to claim 1, wherein the determining a first score of the handwritten text in the image to be scored according to the vertex positions of the handwriting paper and the position of each line of handwritten text in the at least one line of handwritten text comprises:
computing a homography matrix from the correspondence between the vertex positions of the handwriting paper and preset vertex positions;
transforming the position of each line of handwritten text based on the homography matrix, and determining an inclination value of each line of handwritten text from its transformed position;
and determining the first score of the handwritten text in the image to be scored according to the inclination values of the lines of handwritten text.
3. The method according to claim 1, wherein the recognizing each line of handwritten text in the image to be scored through a pre-trained recognition model based on the position of each line of handwritten text in the at least one line of handwritten text, and generating a second score according to the recognition result of each line of handwritten text comprises:
cropping the image to be scored according to the position of each line of handwritten text in the at least one line of handwritten text to obtain at least one handwritten text image, wherein each handwritten text image comprises one line of handwritten text;
recognizing the handwritten text in each handwritten text image based on the pre-trained recognition model to obtain a confidence corresponding to each handwritten text image;
and generating the second score according to the confidence corresponding to each handwritten text image.
4. The method according to claim 1, further comprising:
classifying the image to be scored based on a pre-trained classification model, and generating a fourth score according to the classification result;
wherein the obtaining a neatness score of the handwritten text according to the first score, the second score, and the third score comprises:
computing a weighted average of the first score, the second score, the third score, and the fourth score to obtain the neatness score of the handwritten text.
5. The method according to claim 4, wherein the detection model, the recognition model, the classification model, and the scoring model are obtained through the following training method:
constructing a training data set, wherein the training data set comprises handwritten sample images and annotation information corresponding to the handwritten sample images;
and training the detection model, the recognition model, the classification model, and the scoring model using the training data set.
6. The method according to claim 5, wherein the training data set comprises a plurality of training data subsets, and the constructing a training data set comprises:
collecting a plurality of handwritten sample images, each handwritten sample image comprising a plurality of lines of handwritten text;
annotating the overall writing neatness of the handwritten text in each handwritten sample image with a grade to generate first annotation information, the first annotation information forming a first training data subset;
annotating the positions of the four vertices of the writing paper in each handwritten sample image to generate second annotation information, the second annotation information forming a second training data subset;
annotating each line of handwritten text included in each handwritten sample image with an angled rectangular box to generate third annotation information, the third annotation information forming a third training data subset;
and determining, in the first training data subset, target handwritten sample images whose writing neatness scores are greater than a preset threshold, cropping each target handwritten sample image according to its third annotation information to generate a plurality of cropped images, each cropped image comprising one line of handwritten text, and annotating the handwritten text in the cropped images to generate fourth annotation information, the fourth annotation information forming a fourth training data subset.
7. The method according to claim 6, wherein the training the detection model, the recognition model, the classification model, and the scoring model using the training data set comprises:
taking each handwritten sample image as an input of the detection model, and training the detection model based on the second training data subset, the third training data subset, and the output of the detection model;
taking each handwritten sample image as an input of the classification model, and training the classification model based on the first training data subset and the output of the classification model;
taking the plurality of cropped images corresponding to the handwritten sample images as inputs of the recognition model, and training the recognition model based on the fourth training data subset and the output of the recognition model;
and taking the fourth training data subset and the third training data subset as inputs of the scoring model, and training the scoring model based on the first training data subset and the output of the scoring model.
8. The method according to claim 7, wherein the detection model comprises a first feature extraction layer, a first convolution layer, and a second convolution layer, and the taking the handwritten sample image as an input of the detection model and training the detection model based on the second training data subset, the third training data subset, and the output of the detection model comprises:
extracting features of the handwritten sample image using the first feature extraction layer to obtain a set of multi-channel first feature maps;
convolving the first feature maps using the first convolution layer to obtain a set of four-channel second feature maps, wherein the second feature map of each channel represents a vertex score map of one of the four vertices of the writing paper in the handwritten sample image, and the vertex score maps are used for determining the vertex positions;
convolving the first feature maps using the second convolution layer to obtain multiple sets of third feature maps, wherein a plurality of deconvolution layers included in the second convolution layer are connected in sequence and each deconvolution layer outputs one set of third feature maps; a first set of the third feature maps represents a center-point score map of each line of handwritten text, a second set represents the width and height of each line of handwritten text, and a third set represents the inclination angle of each line of handwritten text;
and training the detection model based on the second training data subset, the third training data subset, the second feature maps, and the third feature maps.
9. A device for scoring the neatness of handwritten text, characterized by comprising:
an acquisition unit, configured to acquire an image to be scored, wherein the image to be scored comprises handwriting paper written with at least one line of handwritten text;
a detection unit, configured to detect the image to be scored based on a pre-trained detection model and generate a detection result, wherein the detection result comprises the vertex positions of the handwriting paper and the position of each line of handwritten text in the at least one line of handwritten text;
a first scoring unit, configured to determine a first score of the handwritten text in the image to be scored according to the vertex positions of the handwriting paper and the position of each line of handwritten text in the at least one line of handwritten text;
a second scoring unit, configured to recognize each line of handwritten text in the image to be scored through a pre-trained recognition model based on the position of each line of handwritten text in the at least one line of handwritten text, and generate a second score according to the recognition result of each line of handwritten text;
a third scoring unit, configured to generate a third score according to the detection result and the recognition result of each line of handwritten text based on a pre-trained scoring model;
and a fourth scoring unit, configured to obtain a neatness score of the handwritten text according to the first score, the second score, and the third score.
10. An electronic device, characterized in that the electronic device comprises:
a processor; and
a memory for storing a program,
wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the method for scoring handwritten text neatness according to any one of claims 1 to 8.
11. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for scoring handwritten text neatness according to any one of claims 1 to 8.
CN202210891472.5A 2022-07-27 2022-07-27 Method, device, equipment and medium for grading handwritten text neatness Pending CN115273102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210891472.5A CN115273102A (en) 2022-07-27 2022-07-27 Method, device, equipment and medium for grading handwritten text neatness


Publications (1)

Publication Number Publication Date
CN115273102A true CN115273102A (en) 2022-11-01

Family

ID=83771056



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination