CN112699871A - Method, system, device and computer readable storage medium for field content identification - Google Patents


Info

Publication number
CN112699871A
Authority
CN
China
Prior art keywords
content
field
target
field content
digital
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011555047.6A
Other languages
Chinese (zh)
Other versions
CN112699871B (en)
Inventor
王燕玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd
Priority to CN202011555047.6A
Publication of CN112699871A
Application granted
Publication of CN112699871B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a field content identification method, which comprises: obtaining a target image and a service scene corresponding to the target image; calling a target recognition model based on the service scene to recognize the target image and obtain a plurality of first field contents; performing auxiliary correction on the first field contents according to their field content types to obtain target field contents; when the field content type is the digital content type, extracting first digital field content from the first field contents; acquiring a first image associated with the target image; acquiring, from the first image, second digital field content associated with the first digital field content; and performing auxiliary correction on the first digital field content through the association relationship between the first digital field content and the second digital field content, to generate the corrected target digital field content. In the invention, the optimal recognition model (namely the target recognition model) is called according to the service, and auxiliary correction is performed on the recognized result, thereby effectively improving the recognition rate for characters and numbers in the image.

Description

Method, system, device and computer readable storage medium for field content identification
Technical Field
Embodiments of the present invention relate to the technical field of image recognition, and in particular to a field content identification method, a field content identification system, a computer device and a computer-readable storage medium.
Background
OCR (Optical Character Recognition) refers to a process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using a character recognition method. In identity card OCR, for example, the identity card is photographed with a mobile phone or another terminal device equipped with a camera, OCR is performed on the photograph, and the identity card information is extracted.
At present, character recognition and error correction are not intelligent or efficient enough. An OCR model is usually combined with a manual review process in which recognition results are entered and reviewed by hand, so error correction accuracy is low and a large amount of human resources is consumed.
Disclosure of Invention
In view of this, embodiments of the present invention provide a field content identification method, system, computer device, and computer readable storage medium, which are used to solve the problems that current character identification and error correction are not intelligent and efficient enough, and the identification and error correction accuracy is low.
The embodiment of the invention solves the technical problems through the following technical scheme:
A field content identification method comprises the following steps:
receiving an identification request sent by a client, wherein the identification request carries a target image and a service scene corresponding to the target image;
calling a target recognition model among a plurality of candidate recognition models based on the service scene;
recognizing the target image based on the target recognition model to obtain a plurality of first field contents;
performing auxiliary correction on the plurality of first field contents according to their field content types to obtain target field contents; wherein,
when the service scene is a first service scene and the field content type of the first field content is a digital content type, extracting first digital field content from the first field content based on the first service scene;
acquiring a first image associated with the target image;
obtaining, from the first image, second digital field content associated with the first digital field content;
performing auxiliary correction on the first digital field content through the association relationship between the first digital field content and the second digital field content, and generating the corrected target digital field content; and
feeding back the target digital field content to the client.
Optionally, the step of invoking a target recognition model of the multiple candidate recognition models based on the service scenario further includes:
determining a plurality of field content areas to be identified based on the service scene;
calculating the comprehensive weight of each candidate recognition model according to the preset weight of each candidate recognition model for each field content area to be identified; and
determining the candidate recognition model with the highest comprehensive weight as the target recognition model, and calling the target recognition model; wherein,
when a plurality of target recognition models are determined according to the service scene, acquiring the number of times each of the plurality of target recognition models has been called within a preset time;
and comparing the numbers of calls of the target recognition models, and determining the target recognition model with the fewest calls as the target recognition model matched with the service scene.
Optionally, the step of identifying the target image based on the target identification model to obtain a plurality of first field contents further includes:
dividing the target image to obtain a plurality of image blocks;
determining a plurality of field content areas to be identified on a plurality of image blocks based on the format of the target image;
extracting, by the target recognition model, a plurality of convolution features of the plurality of field content areas to be identified;
inputting the plurality of convolution features into a classifier of the target recognition model for recognition;
and generating a plurality of first field contents corresponding to the plurality of field content areas to be identified.
Optionally, after the step of obtaining the first digital field content from the plurality of first field contents based on the first service scenario, the method further includes:
judging whether a first designated position in the first digital field content includes a character;
when the first designated position includes a character and the character is inconsistent with a preset character, replacing the character with the preset character, the preset character being a specific character preset for the first designated position; and
when the first designated position does not include any character, filling the preset character into the first designated position.
Optionally, the service scenario is a second service scenario;
the step of assisting in correcting the plurality of first field contents to obtain the target field contents according to the field content types of the plurality of first field contents comprises:
acquiring first text field contents from the plurality of first field contents based on the second service scene;
acquiring a second image associated with the target image;
acquiring second text field content associated with the first text field content from the second image;
extracting key text content from the second text field content according to the association relationship between the first text field content and the second text field content;
and performing auxiliary correction on the first text field content through the key text content, to obtain the corrected target text field content.
Optionally, after the step of obtaining the first text field content from the plurality of first field contents based on the second service scenario, the method further includes:
judging whether a second designated position in the first text field content includes text content;
when the second designated position includes text content and the text content is inconsistent with preset text content, replacing the text content with the preset text content, the preset text content being specific text content associated with the second designated position;
and when the second designated position does not include text content, filling the preset text content into the second designated position.
Optionally, the method further comprises: storing the received identification request and the corresponding target field content in a blockchain.
In order to achieve the above object, an embodiment of the present invention further provides a field content identification system, including:
a receiving module, configured to receive an identification request sent by a client, wherein the identification request carries a target image and a service scene corresponding to the target image;
a calling module, configured to call a target recognition model among a plurality of candidate recognition models based on the service scene;
a recognition module, configured to recognize the target image based on the target recognition model to obtain a plurality of first field contents;
an auxiliary correction module, configured to perform auxiliary correction on the plurality of first field contents according to their field content types to obtain target field contents; wherein,
an extracting module, configured to extract first digital field content from the first field content based on the service scene when the field content type of the first field content is a digital content type;
a first acquisition module, configured to acquire a first image associated with the target image;
a second acquisition module, configured to obtain, from the first image, second digital field content associated with the first digital field content;
a generating module, configured to perform auxiliary correction on the first digital field content according to the association relationship between the first digital field content and the second digital field content, and to generate the corrected target digital field content;
and a feedback module, configured to feed back the target digital field content to the client.
In order to achieve the above object, an embodiment of the present invention further provides a computer device, where the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the field content identification method as described above when executing the computer program.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, where the computer program is executable by at least one processor, so as to cause the at least one processor to execute the steps of the field content identification method as described above.
According to the field content identification method, the field content identification system, the computer equipment and the computer readable storage medium, an optimal identification model (namely a target identification model) is called according to the service, an auxiliary correction strategy is determined based on a plurality of first field content types and service scenes, and the identified result is subjected to auxiliary correction through the auxiliary correction strategy, so that characters can be intelligently and efficiently identified and corrected, and the accuracy of identification and correction of field contents in the image is effectively improved.
The invention is described in detail below with reference to the drawings and specific examples, but the invention is not limited thereto.
Drawings
FIG. 1 is a flowchart illustrating steps of a field content identification method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of calling a target recognition model in the field content identification method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating steps of obtaining a plurality of first field contents in the field content identification method according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating steps of verifying a specific character based on a first service scenario in the field content identification method according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating steps of performing auxiliary correction on the first text field content based on a second service scenario to obtain the corrected target text field content in the field content identification method according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating steps of verifying specific text content based on a second service scenario in the field content identification method according to an embodiment of the present invention;
FIG. 7 is a block diagram of a field content identification system according to a second embodiment of the present invention;
FIG. 8 is a schematic hardware structure diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solutions of the various embodiments may be combined with each other, provided that the combination can be realized by a person skilled in the art; when the technical solutions are contradictory or the combination cannot be realized, the combination should be considered not to exist and falls outside the protection scope of the present invention.
Embodiment One
Referring to fig. 1, a flowchart of the steps of a field content identification method according to an embodiment of the present invention is shown. It is to be understood that the flowcharts in the method embodiments are not intended to limit the order in which the steps are performed. The following description takes a computer device as the execution subject, specifically as follows:
As shown in fig. 1, the field content identification method may include steps S100 to S900, in which:
Step S100, receiving an identification request sent by a client, wherein the identification request carries a target image and a service scene corresponding to the target image.
In an exemplary embodiment, different services correspond to different service scenarios, and the target image input by the client differs according to the service scenario the client selects. The target image includes, but is not limited to, images of various certificates, such as an ID card front image, an ID card back image, a front image of an ID card held by the client, and a back image of an ID card held by the client.
Specifically, the service scenarios that can be selected by the client include, but are not limited to: credit card application service, credit card blocking service, credit card information change service, credit card flow inquiry service, financial product purchase applying service, deposit card application service, deposit card blocking service, deposit card information change service, deposit card flow inquiry service and the like.
Step S200, calling a target recognition model among a plurality of candidate recognition models based on the service scene.
Each candidate recognition model may be a recognition model that is trained and optimized in advance according to a corresponding service scenario. Each candidate Recognition model may be an OCR (Optical Character Recognition) model.
The target recognition model may be called from an OCR model intelligent selection center. The OCR model intelligent selection center integrates the technical strengths of the major OCR vendors and contains the plurality of candidate recognition models.
Further, as shown in fig. 2, the step S200 may include steps S201 to S205, wherein:
step S201, determining a plurality of field content areas to be identified based on the service scene;
step S202, calculating the comprehensive weight of each candidate recognition model according to the preset weight of each candidate recognition model for each field content area to be identified;
step S203, determining the candidate recognition model with the highest comprehensive weight as the target recognition model, and calling the target recognition model;
step S204, when a plurality of target recognition models are determined according to the service scene, obtaining the number of times each of the plurality of target recognition models has been called within a preset time; and
step S205, comparing the numbers of calls of the target recognition models, and determining the target recognition model with the fewest calls as the target recognition model matched with the service scene.
In an exemplary embodiment, different service scenarios place different emphasis on the field content areas to be identified in the target image. For example, when the service scene is a credit card application service and the target image is an identity card front image, the fixed-point coordinates corresponding to a plurality of standard fields are obtained according to the format of the identity card front image, so as to determine a plurality of standard field areas; the field content areas to be identified are then determined from the standard field areas. Here, "name", "gender", "nationality", "birth", "year", "month", "day", "address", "citizen identity number" and the like are standard fields with fixed-point coordinates; "XXX" corresponding to the name, "female" corresponding to the gender, "Han" corresponding to the nationality, "2000" corresponding to the year, "10" corresponding to the month, "10" corresponding to the day, "XXX, Huaqiang North Street, Futian District, Shenzhen, Guangdong" corresponding to the address, and "44XXX" corresponding to the citizen identity number are the field contents to be identified.
For example, taking the standard field "name", a preset offset coordinate is added to the fixed-point coordinate of the standard field to obtain a new coordinate, and the field content area to be identified for that standard field is obtained from the new coordinate.
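The offset step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Box` type, the `content_region` helper and all pixel values are hypothetical and do not come from the patent, which does not give concrete coordinates or region sizes.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned region in pixel coordinates (hypothetical layout)."""
    x: int
    y: int
    width: int
    height: int


def content_region(label_anchor: Tuple[int, int],
                   offset: Tuple[int, int],
                   size: Tuple[int, int]) -> Box:
    """Shift a standard field's fixed-point coordinate by a preset offset.

    The shifted point is taken as the top-left corner of the field
    content area to be identified; the size is an assumed preset.
    """
    x, y = label_anchor
    dx, dy = offset
    width, height = size
    return Box(x + dx, y + dy, width, height)


# Made-up numbers: the "name" label sits at (40, 50) and its value
# is assumed to start 90 pixels to the right on the same line.
name_region = content_region(label_anchor=(40, 50), offset=(90, 0), size=(200, 40))
print(name_region)
```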
The preset weights of the candidate recognition models for the field content areas to be identified in different service scenes are obtained by using big data to analyze the strengths of each major OCR vendor in specific scenes. Within the same service scene, the preset weights of a candidate recognition model for different field content areas to be identified may be the same or different.
Specifically, a plurality of OCR vendors are obtained and their strengths in specific scenes are analyzed to obtain the preset weight of each vendor's model for each field content area to be identified in the different service scenes. The comprehensive weight of each candidate recognition model for a service scene is calculated from these preset weights, so that the Application Program Interface (API) of the OCR vendor with the highest comprehensive weight in that scene is called preferentially.
Illustratively, sample identity card photos and manually entered recognition results for those samples are input into the system. A program then automatically uploads the sample photos and calls each vendor's API so that each vendor's OCR model recognizes them. The output sample recognition results are compared with the manually entered results to obtain the recognition pass rate of each OCR model for each field content. The preset weight of each OCR model for each field content is calculated from these pass rates, and the OCR model to be called is determined by the comprehensive weight calculated from the field content areas to be identified in a specific scene and the preset weights of each OCR model.
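A minimal sketch of the pass-rate computation described above follows. The `pass_rates` helper, the field names and the stub vendor are hypothetical; a real system would call each vendor's API in place of the stub. The patent does not state how a pass rate is converted into a preset weight, so the sketch stops at the pass rate itself.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

Fields = Dict[str, str]


def pass_rates(samples: Iterable[Tuple[str, Fields]],
               recognize: Callable[[str], Fields]) -> Dict[str, float]:
    """Per-field fraction of samples where the model matched the manual label."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for image, truth in samples:
        predicted = recognize(image)
        for field, expected in truth.items():
            totals[field] += 1
            if predicted.get(field) == expected:
                hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


# Hypothetical labelled samples and a stub standing in for one vendor's OCR API.
samples = [
    ("sample_1.jpg", {"name": "A", "validity_period": "20180315-20280315"}),
    ("sample_2.jpg", {"name": "B", "validity_period": "20200101-20300101"}),
]
stub_output = {
    "sample_1.jpg": {"name": "A", "validity_period": "20180315-20280315"},
    "sample_2.jpg": {"name": "B", "validity_period": "20200101!20300101"},
}
rates = pass_rates(samples, lambda image: stub_output[image])
print(rates)  # name matches in both samples, validity_period in one of two
```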
For example, among OCR vendors including joint OCR, scientific OCR and the self-developed big data OCR, training on a large amount of sample data shows that the big data OCR has a high recognition rate for the content of each field. In actual OCR use, the big data OCR is therefore usually the primary model and is called preferentially.
When a plurality of target recognition models are obtained for the service scene, the load of each target recognition model, such as the number of times it has been called, needs to be considered. The more times a target recognition model is called within the preset time, the greater the load on the server hosting it; therefore, the target recognition model that has been called the fewest times is determined as the target recognition model matched with the service scene.
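Steps S202 to S205 can be sketched as follows. The patent does not say how the per-field preset weights are combined, so summing them is an assumption of this sketch; the model names, weights and call counts are likewise made up for illustration.

```python
import math
from typing import Dict, List


def comprehensive_weight(field_weights: Dict[str, float],
                         required_fields: List[str]) -> float:
    """Combine a model's preset weights over the fields a scene needs.

    Summation is an assumption; the source only says the comprehensive
    weight is calculated from the preset weights.
    """
    return sum(field_weights.get(field, 0.0) for field in required_fields)


def select_model(preset_weights: Dict[str, Dict[str, float]],
                 required_fields: List[str],
                 call_counts: Dict[str, int]) -> str:
    """Pick the highest-weight model; break ties by fewest recent calls."""
    scores = {model: comprehensive_weight(weights, required_fields)
              for model, weights in preset_weights.items()}
    best = max(scores.values())
    tied = [model for model, score in scores.items() if math.isclose(score, best)]
    return min(tied, key=lambda model: call_counts.get(model, 0))


# Hypothetical preset weights per model and field.
preset_weights = {
    "model_a": {"name": 0.9, "id_number": 0.8},
    "model_b": {"name": 0.8, "id_number": 0.9},
    "model_c": {"name": 0.5, "id_number": 0.6},
}
# Calls observed within the preset time window (made-up values).
call_counts = {"model_a": 120, "model_b": 45, "model_c": 3}

print(select_model(preset_weights, ["name", "id_number"], call_counts))
```

Here `model_a` and `model_b` tie on comprehensive weight, so the less heavily called `model_b` is chosen; `model_c` is never selected despite its low call count because its weight is lower.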
The optimal recognition model is called from the OCR model intelligent selection center according to the service, so that image recognition for each service is matched with its optimal recognition model, which effectively improves the recognition rate of field content in images for the different services.
Step S300, identifying the target image based on the target identification model to obtain a plurality of first field contents.
For example, the target image may be identified as a whole to obtain the plurality of first field contents.
In an exemplary embodiment, referring to fig. 3, the step S300 may include:
step S301, dividing the target image to obtain a plurality of image blocks;
step S302, determining a plurality of field content areas to be identified on the plurality of image blocks based on the format of the target image;
step S303, extracting, by the target recognition model, a plurality of convolution features of the plurality of field content areas to be identified;
step S304, inputting the plurality of convolution features into a classifier of the target recognition model for recognition; and
step S305, generating a plurality of first field contents corresponding to the plurality of field content areas to be identified.
Recognizing the target image block by block allows a plurality of image blocks to be recognized simultaneously, which shortens the time consumed and improves recognition efficiency.
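The block-wise idea behind step S301 and the simultaneous recognition of blocks can be sketched as below. This is not the patent's recognition model: the image is a plain 2-D list of pixel values, the grid size is arbitrary, and the recognizer is a stub that only sums pixels, standing in for the convolution feature extraction and classifier of steps S303 and S304.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Image = List[List[int]]


def split_into_blocks(image: Image, block_h: int, block_w: int) -> List[Image]:
    """Cut a 2-D pixel grid into blocks, returned in row-major order."""
    blocks = []
    for top in range(0, len(image), block_h):
        for left in range(0, len(image[0]), block_w):
            blocks.append([row[left:left + block_w]
                           for row in image[top:top + block_h]])
    return blocks


def recognize_blocks(blocks: List[Image],
                     recognize_block: Callable[[Image], int]) -> List[int]:
    """Run the recognizer on all blocks concurrently, preserving block order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(recognize_block, blocks))


# A 4x4 toy image split into four 2x2 blocks.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
blocks = split_into_blocks(image, block_h=2, block_w=2)

# Stub recognizer (assumption): a real one would return recognized text.
results = recognize_blocks(blocks, lambda block: sum(map(sum, block)))
print(results)
```

A thread pool suits the case where each block is sent to a remote OCR API; a CPU-bound local model would more likely use processes or batched inference.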
Step S400, performing auxiliary correction on the plurality of first field contents according to their field content types, to obtain target field contents.
Specifically, the field content types of the plurality of first field contents include a text content type and a digital content type.
Step S500, when the service scene is a first service scene and the field content type of the first field content is a digital content type, extracting a first digital field content from the first field content based on the first service scene.
Step S600, a first image associated with the target image is acquired.
Step S700, obtaining a second digital field content associated with the first digital field content from the first image.
Step S800, correcting the first digital field content in an auxiliary manner according to the association relationship between the first digital field content and the second digital field content, so as to obtain a target digital field content after auxiliary correction.
Step S900, feeding back the target digital field content to the client.
Different field content types and service scenes correspond to different auxiliary correction strategies. For example:
(1) The service scenario may be a first service scenario, and the field content type of the first field content may be the digital content type.
For example, the target image is a reverse image of the identity card, the first image associated with the target image is a front image of the identity card, the first digital field content is the digital field content of the standard field "validity period", and the second digital field content is the digital field content of the standard field "citizen identity number". The association relationship between the first digital field content and the second digital field content is the relationship between the citizen identity number and the validity period under the identity card rules. When the first digital field content is corrected, the first 8 digits and the last 8 digits of the digital field content corresponding to the "validity period" are taken separately, and the last 4 digits of the first 8 digits are compared with the last 4 digits of the last 8 digits. Based on the year in the second digital field content and the current year, the first 4 digits of the first 8 digits and the first 4 digits of the last 8 digits are corrected. In the analysis of misrecognition cases from big data, recognition errors in the field content of the "validity period" standard field account for 75% of the field content recognition errors for this type of image; performing auxiliary correction on that field content can therefore recover 75% of recognition failures and effectively improves the recognition rate for numbers in identity card images.
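The comparisons in this example can be sketched as a consistency check. The source does not spell out the exact correction rule, so the sketch below only flags inconsistencies and leaves the choice of replacement value open. It assumes an 18-digit citizen identity number whose digits 7 to 10 are the birth year; the function name, the plausibility bounds and the sample number are hypothetical.

```python
import re
from typing import List


def check_validity_period(validity: str, id_number: str,
                          current_year: int) -> List[str]:
    """Flag inconsistencies between a validity period and an ID number."""
    digits = re.sub(r"\D", "", validity)  # drop the separator, whatever it is
    if len(digits) != 16:
        return ["validity period does not contain 16 digits"]
    start, end = digits[:8], digits[8:]
    birth_year = int(id_number[6:10])
    start_year, end_year = int(start[:4]), int(end[:4])

    problems = []
    # Last 4 digits (month and day) of the start and end dates should match.
    if start[4:] != end[4:]:
        problems.append("month/day of start and end dates differ")
    # Assumed bound: a card is issued after birth and not in the future.
    if not birth_year <= start_year <= current_year:
        problems.append("start year outside [birth year, current year]")
    # Assumed bound: a card expires after it is issued.
    if end_year <= start_year:
        problems.append("end year not after start year")
    return problems


# Made-up identity number with birth year 2000; the checksum is not valid.
sample_id = "440301200010101234"
print(check_validity_period("20180315-20280315", sample_id, 2024))
print(check_validity_period("20810315-20280315", sample_id, 2024))
```

A field that fails any check would then be handed to the correction step or to manual review.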
In an exemplary embodiment, referring to fig. 4, steps S5011 to S5013 may further be included after step S500: step S5011, determining whether a first designated position in the first digital field content includes a character; step S5012, when the first designated position includes a character and the character is inconsistent with a preset character, replacing the character with the preset character, the preset character being a specific character preset for the first designated position; and step S5013, when the first designated position does not include any character, filling the preset character into the first designated position.
The auxiliary correction operation is described below by way of example, taking the target image to be a back-side image of an identity card. The first digital field content is traversed and the first designated position in it is located; for example, the first designated position is the middle position of the to-be-identified field content area corresponding to the standard field "validity period". Whether the first designated position contains a character is then determined. If it does, the character at the first designated position is compared with the specific character preset for that position: if the two are consistent, verification passes; if they are inconsistent, the character is replaced with the preset specific character. For instance, if the preset specific character is "-", the character recognized at the first designated position may be "-", "\", "!" or the like. The recognized character is compared with the preset specific character; if they are consistent, verification passes, and if they are inconsistent, the recognized character is replaced with the preset specific character "-", thereby assisting in correcting the character at the first designated position.
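Steps S5011 to S5013 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: it assumes the "validity period" content is laid out as 8 digits, one separator, and 8 digits (17 characters in total), so that the first designated position is index 8, and it treats a 16-character input as the case in which the designated position contains no character. The function name and its parameters are hypothetical.

```python
def correct_designated_char(content: str, position: int = 8,
                            preset: str = "-", expected_len: int = 17) -> str:
    """Replace or fill the character at the designated position.

    Assumptions: `expected_len` is the length of a well-formed field and
    `position` is the 0-based index of the designated position within it.
    """
    if len(content) == expected_len:
        # S5011/S5012: a character is present; replace it if it differs.
        if content[position] == preset:
            return content  # verification passes
        return content[:position] + preset + content[position + 1:]
    if len(content) == expected_len - 1:
        # S5013: no character at the designated position; fill the preset.
        return content[:position] + preset + content[position:]
    # Layout not recognized; leave the content unchanged.
    return content
```

For example, an input in which the separator was recognized as "!" or "\" is rewritten to use "-", and an input in which the separator was dropped has "-" inserted between the two dates.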
(2) The service scenario may be a second service scenario, and the field content type of the first field content may be a first text field content.
Referring to fig. 5, step S500 may include steps S511 to S515: step S511, obtaining first text field content from the plurality of first field contents based on the second service scenario; step S512, acquiring a second image associated with the target image; step S513, obtaining, from the second image, second text field content associated with the first text field content; step S514, extracting key text content from the second text field content according to the association relationship between the first text field content and the second text field content; and step S515, assisting in correcting the first text field content by means of the key text content, to obtain the target text field content after auxiliary correction.
For example, the target image is a back-side image of an identity card, and the second image associated with the target image is the front-side image of the same identity card. The first text field content is the text field content corresponding to the standard field "issuing authority", and the second text field content is the text field content corresponding to the standard field "address". The association relationship between the first text field content and the second text field content is the relationship between the issuing authority and the address under the identity card rules. From the front-side image of the identity card, key text content can be obtained, including but not limited to the province, autonomous region (Inner Mongolia, Guangxi, Ningxia, Xinjiang, Tibet), city, autonomous prefecture, county, and district.
Illustratively, the step S515 may include the following cases:
When the key text content includes a "county", the standard text combination of the issuing authority is taken to be "county name + public security bureau", to assist in correcting the text field content corresponding to the issuing authority.
When the key text content includes neither a "county" nor a "district" but includes a "city", the standard text combination of the issuing authority is taken to be "city name + public security bureau", to assist in correcting the text field content corresponding to the issuing authority.
When the key text content includes a "city" and a "district" or "new district", the standard text combination of the issuing authority is taken to be "city + public security bureau + district name (with the characters for "district" removed) + branch bureau", to assist in correcting the text field content corresponding to the issuing authority. For example, if the district in the address is Pudong New Area, the text field content of the issuing authority is "Shanghai Municipal Public Security Bureau Pudong Branch".
When the key text content includes a "city" and a "development zone", the standard text combination of the issuing authority is taken to be "city + public security bureau + zone name (with the characters for "zone" removed) + branch bureau", to assist in correcting the text field content corresponding to the issuing authority.
Specifically, the key text content is obtained, and the characters preceding "public security" in the text field content of the issuing authority are extracted as the text field content to be verified. The key text content is compared with the text field content to be verified. If the comparison result is consistent, verification passes; if the comparison result is inconsistent, the text field content to be verified is replaced with the characters preceding "district", "county" or "city" in the key text content, to generate the target text field content.
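The cases of step S515 can be sketched as below. This is a hedged illustration only. It assumes the key text content has already been parsed from the address into optional `city`, `county` and `district` strings, and it uses Chinese literals ("公安局" for public security bureau, "公安" for public security, "分局" for branch bureau, and the suffixes "区", "新区", "开发区") as an assumption about the underlying strings, since the text above gives them only in translation. Both function names are hypothetical.

```python
from typing import Optional

BUREAU = "公安局"   # "public security bureau" (assumed literal)
BRANCH = "分局"     # "branch bureau" (assumed literal)
MARKER = "公安"     # "public security" (assumed literal)


def standard_issuing_authority(city: Optional[str] = None,
                               county: Optional[str] = None,
                               district: Optional[str] = None) -> Optional[str]:
    """Build the standard text combination for the issuing authority."""
    if county:
        return county + BUREAU                      # county name + bureau
    if city and district:
        stem = district
        # Strip the district-type characters, longest suffix first.
        for suffix in ("开发区", "新区", "区"):
            if stem.endswith(suffix):
                stem = stem[:-len(suffix)]
                break
        return city + BUREAU + stem + BRANCH        # city + bureau + stem + branch
    if city:
        return city + BUREAU                        # city name + bureau
    return None


def correct_prefix(recognized: str, expected_prefix: str) -> str:
    """Verify the text before "公安" and replace it when inconsistent."""
    idx = recognized.find(MARKER)
    if idx == -1 or recognized[:idx] == expected_prefix:
        return recognized
    return expected_prefix + recognized[idx:]
```

With `city="上海市"` and `district="浦东新区"`, the first function yields "上海市公安局浦东分局", which corresponds to the Pudong example above. The second function repairs a misrecognized city or county name in front of "公安" while leaving the rest of the recognized text untouched.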
In an exemplary embodiment, as shown in fig. 6, after the step of obtaining the first text field content from the plurality of first field contents based on the second service scenario, the method further includes steps S5101 to S5103: step S5101, determining whether a second designated position in the first text field content includes text content; step S5102, when the second designated position includes text content and the text content is inconsistent with preset text content, replacing the text content with the preset text content, the preset text content being specific text content associated with the second designated position; and step S5103, when the second designated position does not include text content, filling the preset text content into the second designated position.
The auxiliary correction operation is again described by way of example, taking the target image to be a back-side image of an identity card. The first text field content is traversed and the second designated position in it is located. Whether the second designated position contains text content is determined; if it does, the text content at the second designated position is compared with the specific text content associated with the second designated position to generate a comparison result. If the comparison result is consistent, verification passes; if the comparison result is inconsistent, the text content is replaced with the specific text content. For example, the second designated position in the first text field content is the several trailing positions of the to-be-identified field content area corresponding to the standard field "issuing authority", and the specific text content associated with the second designated position is "public security bureau". Illustratively, when the comparison result is inconsistent, "bureau" is filled in immediately after "public security". In the analysis of misrecognition cases in big data, recognition errors in the field content of the "issuing authority" standard field account for 95.31% of the field content recognition errors for images of this type; performing auxiliary correction on that field content can therefore effectively recover 95.31% of the recognition failures and effectively improve the rate at which text is recognized from identity card images.
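Steps S5101 to S5103 can be sketched for the trailing positions of the "issuing authority" field. The sketch is an assumption-laden illustration: it uses the Chinese literal "公安局" as the preset text content (an assumption about the string that the translation renders as "public security bureau"), handles only the case where the field should end with that text, and leaves other layouts, such as names ending in a branch bureau, unchanged. The function name is hypothetical.

```python
def complete_trailing_text(content: str, preset: str = "公安局") -> str:
    """Ensure the tail of `content` carries the preset text.

    Assumption: the last character of `preset` is the one most often
    dropped or misrecognized, so only that character is filled or replaced.
    """
    if content.endswith(preset):
        return content                      # verification passes
    stem, last = preset[:-1], preset[-1]
    if content.endswith(stem):
        return content + last               # S5103: fill the missing character
    idx = content.rfind(stem)
    if idx != -1 and idx + len(stem) == len(content) - 1:
        return content[:idx] + preset       # S5102: replace one wrong character
    return content                          # layout not covered by this sketch
```

An input ending in "公安" has "局" appended, and an input in which a single wrong character follows "公安" at the very end has that character replaced.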
The method further comprises: storing the acquired identification request and the corresponding target field content in a blockchain.
A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Where OCR recognition technology is relatively closed, an optimal OCR recognition model is selected from among the OCR recognition models provided by a plurality of OCR vendors to obtain the target recognition model, and the recognition result is corrected in an auxiliary manner using the selected auxiliary correction strategy. Characters can thus be recognized and corrected intelligently and efficiently, which effectively improves both the recognition rate of field content in OCR images and the accuracy of recognition and correction. In an actual test, the OCR recognition rate rose from 80% to 97%, improving the user experience.
Example two
Continuing to refer to FIG. 7, a block diagram of a program module of the field content identification system of the present invention is shown. In this embodiment, the field content identification system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the present invention and implement the field content identification method described above. The program module referred to in the embodiments of the present invention refers to a series of computer program instruction segments capable of performing specific functions, and is more suitable than the program itself for describing the execution process of the field content recognition system 20 in the storage medium. The following description will specifically describe the functions of the program modules of the present embodiment:
The receiving module 600 is configured to receive an identification request sent by a client, where the identification request carries a target image and a service scene corresponding to the target image.
The invoking module 610 is configured to invoke a target recognition model of the multiple candidate recognition models based on the service scenario.
An identifying module 620, configured to identify the target image based on the target identification model to obtain a plurality of first field contents.
An auxiliary correction module 630, configured to assist in correcting the plurality of first field contents according to the field content types of the plurality of first field contents to obtain target field contents; wherein,
an extracting module 640, configured to, when a field content type of the first field content is a digital content type, extract a first digital field content from the first field content based on the service scenario;
a first obtaining module 650 for obtaining a first image associated with the target image;
a second obtaining module 660, configured to obtain, from the first image, a second digital field content associated with the first digital field content;
a generating module 670, configured to assist in correcting the first digital field content according to the association relationship between the first digital field content and the second digital field content, and generate a target digital field content after assisted correction.
A feedback module 680, configured to feed back the content of the target digital field to the client.
In an exemplary embodiment, the invoking module 610 is further configured to: determine a plurality of field content areas to be identified based on the service scenario; calculate a comprehensive weight for each candidate recognition model according to the preset weight associated with each field content area to be identified and each candidate recognition model; determine the candidate recognition model with the highest comprehensive weight as the target recognition model, and invoke the target recognition model; when a plurality of target recognition models are determined according to the service scenario, acquire the number of times each of the plurality of target recognition models has been invoked within a preset time; and compare those invocation counts, and determine the target recognition model with the fewest invocations as the target recognition model matched with the service scenario.
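The selection performed by the invoking module 610 can be sketched as follows. The way the comprehensive weight is aggregated is not given here, so the sketch assumes it is the sum of a model's preset weights over the field content areas to be identified; that aggregation, the function name, and the dictionary layout are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def select_target_model(areas: Iterable[str],
                        preset_weights: Dict[str, Dict[str, float]],
                        call_counts: Dict[str, int]) -> str:
    """Pick the candidate model with the highest comprehensive weight.

    preset_weights maps model name -> {area name -> preset weight}.
    call_counts maps model name -> invocations within the preset time.
    Ties on comprehensive weight go to the model with the fewest calls.
    """
    area_list = list(areas)
    totals = {
        model: sum(weights.get(area, 0.0) for area in area_list)
        for model, weights in preset_weights.items()
    }
    best = max(totals.values())
    tied = [model for model, total in totals.items() if total == best]
    return min(tied, key=lambda model: call_counts.get(model, 0))
```

Breaking ties by the lowest invocation count spreads requests across equally suitable models. In practice a tolerance might be preferable to exact equality when the weights are floating-point values.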
In an exemplary embodiment, the identifying module 620 is further configured to: divide the target image to obtain a plurality of image blocks; determine a plurality of field content areas to be identified on the plurality of image blocks based on the format of the target image; extract, by means of the target recognition model, a plurality of convolution features of the plurality of field content areas to be identified; input the plurality of convolution features into a classifier of the target recognition model for recognition; and generate a plurality of first field contents corresponding to the plurality of field content areas to be identified.
In an exemplary embodiment, the auxiliary correction module 630 is further configured to: determine whether a first designated position in the first digital field content includes a character; when the first designated position includes a character and the character is inconsistent with a preset character, replace the character with the preset character, the preset character being a specific character preset for the first designated position; and when the first designated position does not include any character, fill the preset character into the first designated position.
In an exemplary embodiment, the service scenario is a second service scenario. The auxiliary correction module 630 is further configured to: acquire first text field content from the plurality of first field contents based on the second service scenario; acquire a second image associated with the target image; acquire, from the second image, second text field content associated with the first text field content; extract key text content from the second text field content according to the association relationship between the first text field content and the second text field content; and assist in correcting the first text field content by means of the key text content, to obtain the target text field content after auxiliary correction.
In an exemplary embodiment, the auxiliary correction module 630 is further configured to: determine whether a second designated position in the first text field content includes text content; when the second designated position includes text content and the text content is inconsistent with preset text content, replace the text content with the preset text content, the preset text content being specific text content associated with the second designated position; and when the second designated position does not include text content, fill the preset text content into the second designated position.
Example three
Fig. 8 is a schematic diagram of the hardware architecture of a computer device according to a third embodiment of the present invention. In the present embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing in accordance with preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of a plurality of servers), and the like. As shown in FIG. 8, the computer device 2 includes, but is not limited to, at least a memory 21, a processor 22, a network interface 23, and a field content identification system 20, which may be communicatively coupled to each other via a system bus. Wherein:
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the computer device 2. Of course, the memory 21 may also comprise both internal and external storage units of the computer device 2. In this embodiment, the memory 21 is generally used for storing an operating system installed in the computer device 2 and various types of application software, such as the program code of the field content identification system 20 of the above-mentioned embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 22 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 22 is typically used to control the overall operation of the computer device 2. In the present embodiment, the processor 22 is configured to run the program code stored in the memory 21 or process data, for example, run the field content identification system 20, so as to implement the field content identification method of the above-described embodiment.
The network interface 23 may comprise a wireless network interface or a wired network interface, and the network interface 23 is generally used for establishing communication connection between the computer device 2 and other electronic apparatuses. For example, the network interface 23 is used to connect the computer device 2 to an external terminal through a network, establish a data transmission channel and a communication connection between the computer device 2 and the external terminal, and the like. The network may be a wireless or wired network such as an Intranet (Intranet), the Internet (Internet), a Global System of Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth (Bluetooth), Wi-Fi, and the like.
It is noted that fig. 8 only shows the computer device 2 with components 20-23, but it is to be understood that not all of the shown components are required, and that more or fewer components may be implemented instead.
In this embodiment, the field content identification system 20 stored in the memory 21 may be further divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete the present invention.
For example, fig. 7 shows a schematic diagram of the program modules of an embodiment implementing the field content identification system 20. In that embodiment, the field content identification system 20 may be divided into a receiving module 600, an invoking module 610, an identifying module 620, an auxiliary correction module 630, an extracting module 640, a first obtaining module 650, a second obtaining module 660, a generating module 670, and a feedback module 680. The program module referred to in the present invention is a series of computer program instruction segments capable of performing specific functions, and is more suitable than a program for describing the execution process of the field content identification system 20 in the computer device 2. The specific functions of the program modules 600 to 680 have been described in detail in embodiment two and are not repeated here.
Example four
The present embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an app store, etc., on which a computer program is stored that implements corresponding functions when executed by a processor. The computer-readable storage medium of the present embodiment is used for storing the field content identification system 20, which, when executed by a processor, implements the field content identification method of the above embodiments.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A field content identification method is characterized by comprising the following steps:
receiving an identification request sent by a client, wherein the identification request carries a target image and a service scene corresponding to the target image;
calling a target recognition model in a plurality of candidate recognition models based on the service scene;
identifying the target image based on the target identification model to obtain a plurality of first field contents;
according to the field content types of the first field contents, assisting in correcting the first field contents to obtain target field contents; wherein,
when the service scene is a first service scene and the field content type of the first field content is a digital content type, extracting first digital field content from the first field content based on the first service scene;
acquiring a first image associated with the target image;
obtaining second digital field content associated with the first digital field content from the first image;
assisting in correcting the first digital field content according to the association relationship between the first digital field content and the second digital field content, and generating target digital field content after assisted correction; and
feeding back the target digital field content to the client.
2. The field content identification method according to claim 1, wherein the step of invoking a target identification model of a plurality of candidate identification models based on the service scenario further comprises:
determining a plurality of field content areas to be identified based on the service scene;
calculating the comprehensive weight of each candidate recognition model according to the preset weight of each field content area to be recognized and each candidate recognition model; and
determining a candidate identification model with the highest comprehensive weight as the target identification model, and calling the target identification model; wherein,
when a plurality of target recognition models are determined according to the service scene, acquiring the number of calls of each of the plurality of target recognition models within a preset time;
and comparing the numbers of calls of the plurality of target recognition models, and determining the target recognition model with the fewest calls as the target recognition model matched with the service scene.
3. The field content identification method according to claim 2, wherein the step of identifying the target image based on the target identification model to obtain a plurality of first field contents further comprises:
dividing the target image to obtain a plurality of image blocks;
determining a plurality of field content areas to be identified on a plurality of image blocks based on the format of the target image;
extracting a plurality of convolution characteristics of a plurality of field content areas to be identified from the target identification model;
inputting the plurality of convolution characteristics into a classifier of the target recognition model for recognition;
and generating a plurality of first field contents corresponding to the plurality of field content areas to be identified.
4. The method for identifying field contents according to claim 3, wherein the step of obtaining a first digital field content from the plurality of first field contents based on the first service scenario further comprises:
judging whether a first designated position in the content of the first digital field comprises characters or not;
when the first designated position comprises a character and the character is inconsistent with a preset character, replacing the character with the preset character; the preset character is a preset specific character of the first designated position; and
and when the first designated position does not comprise any character, filling the preset character into the first designated position.
5. The field content identification method according to claim 3, wherein the service scenario is a second service scenario;
the step of assisting in correcting the plurality of first field contents to obtain the target field contents according to the field content types of the plurality of first field contents comprises:
acquiring first text field contents from the plurality of first field contents based on the second service scene;
acquiring a second image associated with the target image;
acquiring second text field content associated with the first text field content from the second image;
extracting key text content from the second text field content according to the association relationship between the first text field content and the second text field content;
and assisting in correcting the first text field content by means of the key text content, to obtain the target text field content after auxiliary correction.
6. The method for identifying the field contents according to claim 5, wherein after the step of obtaining the first text field contents from the plurality of first field contents based on the second service scenario, the method further comprises:
judging whether a second designated position in the first text field content comprises text content or not;
when the second designated position comprises the text content and the text content is inconsistent with the preset text content, replacing the text content with the preset text content; the preset text content is specific text content associated with the second designated position;
and when the second designated position does not comprise the text content, filling the preset text content into the second designated position.
7. The field content identification method according to claim 1, wherein the method further comprises: storing the acquired identification request and the corresponding target field content in a blockchain.
8. A field content recognition system, comprising:
the receiving module is used for receiving an identification request sent by a client, wherein the identification request carries a target image and a service scene corresponding to the target image;
the calling module is used for calling a target recognition model in a plurality of candidate recognition models based on the service scene;
the identification module is used for identifying the target image based on the target identification model so as to obtain a plurality of first field contents;
the auxiliary correction module is used for performing auxiliary correction on the plurality of first field contents according to the field content types of the plurality of first field contents to obtain target field contents; wherein,
an extracting module, configured to, when a field content type of the first field content is a digital content type, extract a first digital field content from the first field content based on the service scenario;
a first acquisition module for acquiring a first image associated with the target image;
a second obtaining module, configured to obtain, from the first image, a second digital field content associated with the first digital field content;
a generating module, configured to assist in correcting the first digital field content according to an association relationship between the first digital field content and the second digital field content, and to generate an assisted-corrected target digital field content; and
a feedback module, configured to feed back the target digital field content to the client.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the field content identification method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which is executable by at least one processor to cause the at least one processor to perform the steps of the field content identification method according to any one of claims 1 to 7.
CN202011555047.6A 2020-12-23 2020-12-23 Method, system, device and computer readable storage medium for identifying field content Active CN112699871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011555047.6A CN112699871B (en) 2020-12-23 2020-12-23 Method, system, device and computer readable storage medium for identifying field content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011555047.6A CN112699871B (en) 2020-12-23 2020-12-23 Method, system, device and computer readable storage medium for identifying field content

Publications (2)

Publication Number Publication Date
CN112699871A true CN112699871A (en) 2021-04-23
CN112699871B CN112699871B (en) 2023-11-14

Family

ID=75510051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011555047.6A Active CN112699871B (en) 2020-12-23 2020-12-23 Method, system, device and computer readable storage medium for identifying field content

Country Status (1)

Country Link
CN (1) CN112699871B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114339375A (en) * 2021-08-17 2022-04-12 腾讯科技(深圳)有限公司 Video playing method, method for generating video directory and related product
CN114339375B (en) * 2021-08-17 2024-04-02 腾讯科技(深圳)有限公司 Video playing method, method for generating video catalogue and related products
CN113435993A (en) * 2021-08-27 2021-09-24 聆笙(北京)科技有限公司 Receipt data recognition system and method thereof
CN116702024A (en) * 2023-05-16 2023-09-05 见知数据科技(上海)有限公司 Method, device, computer equipment and storage medium for identifying type of stream data
CN116702024B (en) * 2023-05-16 2024-05-28 见知数据科技(上海)有限公司 Method, device, computer equipment and storage medium for identifying type of stream data

Also Published As

Publication number Publication date
CN112699871B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN112699871B (en) Method, system, device and computer readable storage medium for identifying field content
CN112785411B (en) Credit data processing method, system, equipment and computer readable storage medium
CN112036145A (en) Financial statement identification method and device, computer equipment and readable storage medium
CN110363222B (en) Picture labeling method and device for model training, computer equipment and storage medium
CN112668575B (en) Key information extraction method and device, electronic equipment and storage medium
CN112115107A (en) Contract text automatic generation method and device
CN110414197B (en) Bank user identity verification method and device based on behavior characteristics
CN111222517A (en) Test sample generation method, system, computer device and storage medium
CN114817340A (en) Data tracing method and device, computer equipment and storage medium
CN113190381B (en) Data backup method, system, equipment and storage medium
CN111242779A (en) Financial data characteristic selection and prediction method, device, equipment and storage medium
CN110992155A (en) Bidding and enclosing processing method and related product
CN109446217A (en) Data method, electronic device and computer readable storage medium
CN113343577B (en) Parameter optimization method, device, equipment and medium based on machine learning
CN109472687A (en) Risk control limit calculation method, device, computer equipment and storage medium
CN114511314A (en) Payment account management method and device, computer equipment and storage medium
CN109409922A (en) Data aggregate modeling method, device, computer equipment and storage medium
CN112132693A (en) Transaction verification method, transaction verification device, computer equipment and computer-readable storage medium
CN113936130A (en) Document information intelligent acquisition and error correction method, system and equipment based on OCR technology
CN113837170A (en) Automatic auditing processing method, device and equipment for vehicle insurance claim settlement application
CN112651824A (en) Non-silver account opening processing method and device, computer equipment and storage medium
CN111541828A (en) Signature method, signature device, computer equipment and computer readable storage medium
CN111461099A (en) Bill identification method, system, equipment and readable storage medium
CN111242115A (en) Bill collection method, system, computer equipment and storage medium
CN115983973A (en) Service processing method and device, computer equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant