CN110378347B - Method and device for extracting key information of medical examination sheet

Info

Publication number: CN110378347B
Application number: CN201910598986.XA
Authority: CN
Inventors: 吴志超; 柯登峰; 刘宁; 王静; 胡茜
Original assignee: Beijing Aidoctor Intelligent Medical Technology Co ltd
Current assignee: Beijing Aidoctor Intelligent Medical Technology Co ltd
Priority date: 2019-07-04
Filing date: 2019-07-04
Publication date: 2021-10-08
Anticipated expiration: 2039-07-04
Also published as: CN110378347A

Abstract

Embodiments of the invention provide a method and a device for extracting key information from a medical examination sheet. Each character in a target examination sheet is recognized in sequence together with its left boundary coordinate and right boundary coordinate. Based on these coordinates, all characters in the target examination sheet are divided into a plurality of text lines, and each text line is divided into a plurality of character blocks. Finally, the key information in the target examination sheet is extracted from the character blocks by keyword matching. The method and the device can accurately extract the key information in a medical examination sheet, overcome the difficulty in the prior art of accurately extracting data from medical examination sheets, and facilitate storing the key information of a patient's medical examination sheets in a database so that the patient's health information can be effectively shared and circulated.

Description

Method and device for extracting key information of medical examination sheet

Technical Field

The invention relates to the technical field of character recognition, in particular to a method and a device for extracting key information of a medical examination order.

Background

The medical examination sheet is an important basis on which doctors diagnose a patient's illness and monitor the patient's health. However, existing medical systems have not established an effective mechanism for sharing and circulating patients' health information. Patients must repeat laboratory tests every time they are transferred, and even for routine tests they must repeatedly endure queuing, registration and waiting. To save patients the time and expense of seeking treatment and to establish health information that can circulate effectively, an effective solution is to extract the data in a patient's medical examination sheets and store it in a database.

Data in medical examination sheets is usually presented in tabular form, and the traditional approach to table extraction relies on visual cues to separate table elements. However, tables in medical examination sheets usually have no clear ruling lines, rectangles or spacing, so separating table elements by visual cues alone has low accuracy and is not suitable for extracting data from medical examination sheets.

Disclosure of Invention

The embodiment of the invention provides a method and a device for extracting key information of a medical examination order, aiming at overcoming the problem that data in the medical examination order is difficult to accurately extract in the prior art.

In a first aspect, an embodiment of the present invention provides a method for extracting key information of a medical examination order, including:

sequentially identifying each character in the target inspection list and the left boundary coordinate and the right boundary coordinate corresponding to each character, and dividing all the characters into a plurality of text lines according to the left boundary coordinate and the right boundary coordinate corresponding to all the characters by using a preset line division rule, wherein each text line comprises a plurality of characters;

for any text line, segmenting all characters in the text line into a plurality of character blocks by utilizing a preset segmentation rule according to left boundary coordinates and right boundary coordinates corresponding to all characters in the text line respectively, wherein each character block comprises at least one character;

screening character blocks containing first key information from all the character blocks in a keyword matching mode to serve as first character blocks, and extracting the first key information from the first character blocks, wherein the first key information is a hospital name;

screening character blocks containing second key information from all the character blocks in a keyword matching mode to serve as second character blocks, and extracting the second key information from all the second character blocks, wherein the second key information comprises patient names, patient sexes, patient ages, examining doctors and examining time;

and taking each character block except the first character block and the second character block as a third character block, and extracting third key information from all the third character blocks, wherein the third key information comprises an inspection item, a result, a unit and a reference range.

In a second aspect, an embodiment of the present invention provides an apparatus for extracting key information of a medical checklist, including:

the character line dividing module is used for sequentially identifying each character in the target inspection list and the left boundary coordinate and the right boundary coordinate corresponding to each character, and dividing all the characters into a plurality of text lines according to the left boundary coordinate and the right boundary coordinate corresponding to all the characters by using a preset line dividing rule, wherein each text line comprises a plurality of characters;

the character blocking module is used for dividing all characters in any text line into a plurality of character blocks by using a preset blocking rule according to the left boundary coordinate and the right boundary coordinate which correspond to all characters in the text line respectively, wherein each character block comprises at least one character;

the first key information extraction module is used for screening out character blocks containing first key information from all the character blocks in a keyword matching mode to serve as first character blocks, and extracting the first key information from the first character blocks, wherein the first key information is a hospital name;

the second key information extraction module is used for screening out character blocks containing second key information from all the character blocks in a keyword matching mode to serve as second character blocks, and extracting the second key information from all the second character blocks, wherein the second key information comprises the name of a patient, the sex of the patient, the age of the patient, an examining doctor and examining time;

and the third key information extraction module is used for taking each character block except the first character block and the second character block as a third character block and extracting third key information from all the third character blocks, wherein the third key information comprises an inspection item, a result, a unit and a reference range.

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method provided in the first aspect when executing the program.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.

According to the method and the device for extracting the key information of the medical examination form, provided by the embodiment of the invention, each character in the target examination form and the left boundary coordinate and the right boundary coordinate corresponding to each character are sequentially identified, so that all characters in the target examination form are divided into a plurality of text lines according to the left boundary coordinate and the right boundary coordinate corresponding to each character, each text line is divided into a plurality of character blocks, and finally the key information in the target examination form is extracted from all the character blocks in a keyword matching mode. The method and the device can accurately extract the key information in the medical examination order, overcome the problem that the data in the medical examination order is difficult to accurately extract in the prior art, and are beneficial to storing and warehousing the key information in the medical examination order of the patient so as to effectively share and circulate the health information of the patient.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of a method for extracting key information of a medical examination order according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a device for extracting key information of a medical examination sheet according to an embodiment of the present invention;

Fig. 3 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flow chart of a method for extracting key information of a medical examination form according to an embodiment of the present invention, and as shown in fig. 1, the embodiment of the present invention provides a method for extracting key information of a medical examination form, including:

S1, sequentially identifying each character in the target inspection list and the left boundary coordinate and the right boundary coordinate corresponding to each character, and dividing all characters into a plurality of text lines according to the left boundary coordinate and the right boundary coordinate corresponding to all characters by using a preset line division rule, wherein each text line comprises a plurality of characters;

Specifically, the characters in the target examination sheet are recognized one by one using a character recognition technology, following the order of the text in the sheet from top to bottom and from left to right, and the left boundary coordinate and right boundary coordinate of each character are obtained. The character recognition technology may be Optical Character Recognition (OCR) or another character recognition technology, and may be chosen according to actual requirements; it is not specifically limited herein.

Further, all characters are divided into a plurality of text lines by a preset line division rule according to the left boundary coordinate and right boundary coordinate of each character, where each text line contains a plurality of characters. It can be understood that, within the same text line, the left boundary coordinates and right boundary coordinates of successive characters increase. Therefore, if the left and right boundary coordinates decrease from one sequentially recognized character to the next, the two adjacent characters can be determined to lie in different text lines, that is, a line break needs to be placed between them. In the embodiment of the invention, the preset line division rule is set in advance based on this principle, and all characters in the target examination sheet are divided into a plurality of text lines using it.
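
The line division principle described above can be illustrated with a minimal Python sketch. The tuple layout `(text, left, right)` and the function name `split_into_lines` are assumptions made for illustration; the patent does not prescribe a data structure or a concrete rule beyond the decrease in boundary coordinates.

```python
from typing import List, Tuple

# Hypothetical representation of one recognized character:
# (text, left boundary coordinate, right boundary coordinate).
Char = Tuple[str, float, float]


def split_into_lines(chars: List[Char]) -> List[List[Char]]:
    """Group characters, given in recognition order, into text lines.

    A new line starts whenever both boundary coordinates decrease
    relative to the previously recognized character.
    """
    lines: List[List[Char]] = []
    for ch in chars:
        if lines:
            prev = lines[-1][-1]
            same_line = not (ch[1] < prev[1] and ch[2] < prev[2])
        else:
            same_line = False
        if same_line:
            lines[-1].append(ch)
        else:
            lines.append([ch])
    return lines


chars = [("A", 0, 10), ("B", 12, 22), ("C", 0, 10), ("D", 50, 60)]
lines = split_into_lines(chars)
print([[c[0] for c in line] for line in lines])  # [['A', 'B'], ['C', 'D']]
```

A rule this simple would miss a line break when the next line starts further to the right than the last character of the current line, so a practical rule would probably need additional cues.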

S2, for any text line, dividing all characters in the text line into a plurality of character blocks by using a preset block dividing rule according to the left boundary coordinate and the right boundary coordinate corresponding to all characters in the text line, wherein each character block comprises at least one character;

Specifically, after all characters in the target examination sheet have been divided into a plurality of text lines, the characters of each text line are divided into a plurality of character blocks by a preset blocking rule according to the left boundary coordinate and right boundary coordinate of each character in that text line. Each character block contains at least one character. It can be understood that, within the same character block, the distance between two adjacent characters is small; therefore, within the same text line, if the distance between two adjacent characters is large, the two characters can be determined to belong to different character blocks. In the embodiment of the present invention, the distance between two adjacent characters is expressed as the difference between the left boundary coordinate of the next character and the right boundary coordinate of the current character. If this coordinate difference is large, the current character and the next character are determined to belong to different character blocks, that is, a block boundary needs to be placed between them. The preset blocking rule is set in advance based on this principle, and all characters in each text line are divided into a plurality of character blocks using it.
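
As a minimal sketch of this blocking principle, the following Python function splits one text line wherever the gap between adjacent characters exceeds a threshold. The parameter `max_gap` and its value are assumptions; the text only says the coordinate difference is "large" and gives no number.

```python
from typing import List, Tuple

# Hypothetical character record: (text, left boundary, right boundary).
Char = Tuple[str, float, float]


def split_into_blocks(line: List[Char], max_gap: float) -> List[List[Char]]:
    """Split one text line into character blocks.

    The gap is measured as next.left - current.right; a gap larger
    than max_gap (an assumed threshold) starts a new block.
    """
    blocks: List[List[Char]] = []
    for ch in line:
        if blocks and ch[1] - blocks[-1][-1][2] <= max_gap:
            blocks[-1].append(ch)
        else:
            blocks.append([ch])
    return blocks


line = [("W", 0, 10), ("B", 11, 21), ("C", 22, 32), ("6", 80, 88), ("2", 89, 97)]
blocks = split_into_blocks(line, max_gap=5)
print(["".join(c[0] for c in b) for b in blocks])  # ['WBC', '62']
```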

S3, screening character blocks containing first key information from all the character blocks in a keyword matching mode to serve as first character blocks, and extracting the first key information from the first character blocks, wherein the first key information is a hospital name;

Specifically, through the above method steps, all characters in the target examination sheet are divided into a plurality of text lines, and each text line is divided into a plurality of character blocks. On this basis, the character blocks containing the first key information are screened out from all the character blocks by keyword matching and taken as first character blocks, where the first key information is the hospital name. That is, all character blocks in the target examination sheet are traversed, and if a character block contains the two characters of the word "hospital", the block is taken to contain the hospital name and is used as a first character block. After the first character block has been screened out, the first key information, namely the hospital name, is extracted from it. In the embodiment of the invention, the characters in the first character block can be matched against the hospital names in a national hospital dictionary; if some characters in the first character block match a hospital name in the dictionary, those characters are determined to be the hospital name in the target examination sheet.
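
The screening and dictionary matching for the hospital name might look like the sketch below. The keyword string "医院", the sample dictionary entries, and the longest-match-first ordering are all assumptions for illustration; the patent only states that the block is matched against a national hospital dictionary.

```python
from typing import Iterable, Optional


def find_hospital_name(
    blocks: Iterable[str],
    hospital_dictionary: Iterable[str],
    keyword: str = "医院",  # assumed keyword for "hospital"
) -> Optional[str]:
    """Return the dictionary entry found in the first matching block.

    Longer names are tried first so that a full name wins over a
    shorter name contained in it (a design choice of this sketch).
    """
    candidates = sorted(hospital_dictionary, key=len, reverse=True)
    for block in blocks:
        if keyword not in block:
            continue  # not a first character block
        for name in candidates:
            if name in block:
                return name
    return None


# Hypothetical dictionary entries and block texts.
sample_dictionary = ["第一人民医院", "示例市第一人民医院"]
sample_blocks = ["示例市第一人民医院检验报告单", "姓名：张三"]
print(find_hospital_name(sample_blocks, sample_dictionary))
```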

S4, screening out character blocks containing second key information from all the character blocks in a keyword matching mode to serve as second character blocks, and extracting the second key information from all the second character blocks, wherein the second key information comprises patient names, patient sexes, patient ages, examining doctors and examining time;

Specifically, the character blocks containing the second key information are screened out from all the character blocks by keyword matching and taken as second character blocks. The second key information includes the patient name, patient sex, patient age, examining doctor, examination time and the like. That is, all character blocks in the target examination sheet are traversed, and a character block containing characters such as "name", "gender", "age", "examining doctor" or "examination time" is taken as a second character block. After the second character blocks have been screened out, the second key information, namely the patient name, patient sex, patient age, examining doctor, examination time and the like, is extracted from them. In the embodiment of the present invention, if a second character block contains the two characters of "name", the characters appearing after them in that block can be taken as the patient name; if it contains the two characters of "gender", the characters appearing after them can be taken as the patient sex; if it contains the two characters of "age", the characters appearing after them can be taken as the patient age; if it contains the four characters of "examining doctor", the characters appearing after them can be taken as the name of the examining doctor; and if it contains the four characters of "examination time", the characters appearing after them can be taken as the specific examination time.

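The keyword-then-value extraction of step S4 can be sketched as follows. The keyword strings in `SECOND_KEYWORDS`, the output field names, and the stripping of a colon after the keyword are assumptions for illustration, in particular the four-character keywords, whose exact wording is not recoverable from the text.

```python
from typing import Dict, Iterable

# Assumed keyword strings mapped to hypothetical output field names.
SECOND_KEYWORDS: Dict[str, str] = {
    "姓名": "patient_name",
    "性别": "patient_sex",
    "年龄": "patient_age",
    "检查医生": "examining_doctor",
    "检查时间": "examination_time",
}


def extract_labeled_fields(
    blocks: Iterable[str], keywords: Dict[str, str]
) -> Dict[str, str]:
    """Take the text following each keyword in a block as its value."""
    fields: Dict[str, str] = {}
    for block in blocks:
        for keyword, field in keywords.items():
            idx = block.find(keyword)
            if idx == -1:
                continue
            # Drop an optional ASCII or full-width colon after the keyword.
            value = block[idx + len(keyword):].lstrip(":： ").strip()
            fields[field] = value
    return fields


sample_blocks = ["姓名：张三", "性别：男", "年龄：35岁", "检查医生：李四", "白细胞"]
print(extract_labeled_fields(sample_blocks, SECOND_KEYWORDS))
```

Blocks that match none of the keywords, such as the last sample block, are left for the third key information step.
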
S5, regarding each of the other character blocks except the first character block and the second character block as a third character block, and extracting third key information from all the third character blocks, the third key information including the examination item, the result, the unit, and the reference range.

Specifically, on the basis of the above technical solution, each character block in the target examination sheet other than the first character blocks and the second character blocks is taken as a third character block. Third key information is then extracted from all third character blocks, where the third key information includes the examination item, result, unit, reference range and the like. It should be noted that, in a medical examination sheet, examination items are generally written in Chinese characters, results are generally a single numerical value, units are generally written in English characters, and reference ranges are generally numerical ranges. In view of this, in the embodiment of the present invention, if a text line contains a plurality of third character blocks, all characters in the third character block containing Chinese characters are taken as the examination item; all characters in the third character block containing a single numerical value are taken as the result corresponding to the examination item; all characters in the third character block containing English characters are taken as the unit corresponding to the result; and all characters in the third character block containing a numerical range are taken as the reference range corresponding to the result. In this way, all examination items in the target examination sheet and the result, unit and reference range corresponding to each examination item can be extracted.
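
A minimal sketch of this per-block classification is given below. The regular expressions, the order in which the tests are applied, and the field names are assumptions; the patent states only the four general patterns (Chinese characters, single value, English characters, numerical range).

```python
import re
from typing import Dict, List, Optional

CJK_RE = re.compile(r"[\u4e00-\u9fff]")          # any Chinese character
LATIN_RE = re.compile(r"[A-Za-z]")                # any English letter
NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?")     # a single numerical value
RANGE_RE = re.compile(r"\d+(?:\.\d+)?\s*(?:-{1,2}|[~～])\s*\d+(?:\.\d+)?")


def classify_block(text: str) -> Optional[str]:
    """Assign a third character block to one of the four fields."""
    if CJK_RE.search(text):
        return "item"
    if RANGE_RE.fullmatch(text):
        return "reference_range"
    if NUMBER_RE.fullmatch(text):
        return "result"
    if LATIN_RE.search(text):
        return "unit"
    return None


def extract_row(blocks: List[str]) -> Dict[str, str]:
    """Build one record from the third character blocks of a text line."""
    row: Dict[str, str] = {}
    for text in blocks:
        text = text.strip()
        kind = classify_block(text)
        if kind is not None and kind not in row:
            row[kind] = text
    return row


print(extract_row(["白细胞", "6.2", "10^9/L", "3.5-9.5"]))
```

Testing for Chinese characters first means an item name that also contains letters or digits is still treated as an item; this ordering is a choice of the sketch, not something the text specifies.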

According to the method for extracting the key information of the medical examination form, provided by the embodiment of the invention, each character in the target examination form and the left boundary coordinate and the right boundary coordinate corresponding to each character are sequentially identified, so that all characters in the target examination form are divided into a plurality of text lines according to the left boundary coordinate and the right boundary coordinate corresponding to each character, each text line is divided into a plurality of character blocks, and finally the key information in the target examination form is extracted from all the character blocks in a keyword matching mode. The method can accurately extract the key information in the medical examination order, overcomes the problem that the data in the medical examination order is difficult to accurately extract in the prior art, and is beneficial to storing and warehousing the key information in the medical examination order of the patient so as to effectively share and circulate the health information of the patient.

Based on any of the above embodiments, there is provided a method for extracting key information of a medical checklist, where each of the other character blocks except the first character block and the second character block is used as a third character block, and then the method further includes: determining positions, which need to be subjected to secondary blocking processing, in all third character blocks by using a distance reference mode, and marking the positions as first marking positions; determining positions, which need to be subjected to secondary blocking processing, in all third character blocks by using a global reference mode, and marking the positions as second marking positions; determining positions, which need to be subjected to secondary blocking processing, in all third character blocks by using a local reference mode, and marking the positions as third marking positions; performing secondary blocking processing on all third character blocks according to all first mark positions, all second mark positions and all third mark positions; correspondingly, the third key information is extracted from all the third character blocks, specifically: and extracting third key information from all third character blocks subjected to secondary blocking processing.

Specifically, in the embodiment of the present invention, each character block other than the first character blocks and the second character blocks is taken as a third character block; it can be understood that the third character blocks contain the information about the examination items in the target examination sheet. The information about the examination items in a medical examination sheet generally includes the name, result, unit and reference range of each examination item, with the name, result, unit and reference range of the same examination item displayed side by side in the same text line. However, some examination item names are so long that the gap between the name and the result is small, so the name and the result are easily assigned to the same character block by mistake; blocking errors may therefore exist in the third character blocks. In view of this, after the third character blocks are obtained, the positions in all third character blocks at which secondary blocking is required are determined using the distance reference mode, the global reference mode and the local reference mode respectively, and secondary blocking is then performed on all third character blocks. The specific implementation is as follows:

First, the positions in all third character blocks at which secondary blocking is required are determined using the distance reference mode and marked as first mark positions. The distance reference mode takes the mean distance between the character blocks in the target examination sheet as a reference to decide whether each third character block needs secondary blocking.

Second, the positions in all third character blocks at which secondary blocking is required are determined using the global reference mode and marked as second mark positions. The global reference mode takes the number of character blocks in each text line containing a third character block as a reference to determine the text lines that need secondary blocking. Then, using all other text lines as references, it determines the third character blocks that need secondary blocking within each such text line.

Third, the positions in all third character blocks at which secondary blocking is required are determined using the local reference mode and marked as third mark positions. The local reference mode likewise takes the number of character blocks in each text line containing a third character block as a reference to determine the text lines that need secondary blocking. It then uses an adjacent text line as the reference to determine the third character blocks that need secondary blocking within each such text line.

Further, all the third character blocks are subjected to secondary block processing according to all the first mark positions, all the second mark positions and all the third mark positions, that is, the third character blocks are further subjected to block processing at all the first mark positions, all the second mark positions and all the third mark positions. Therefore, the situation of block errors occurring in the third character block can be effectively corrected, and the block accuracy of each third character block is further ensured. And finally, extracting third key information from all third character blocks subjected to secondary blocking processing to ensure that the name, result, unit and reference range of each inspection item in the target inspection list can be accurately extracted.
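
To illustrate how the three sets of mark positions might be combined, the sketch below splits one third character block at the union of the marks. Representing a mark as the index of the character after which to cut is an assumption of this sketch, as is the function name.

```python
from typing import Iterable, List, Sequence


def split_at_marks(
    block: Sequence[str],
    first_marks: Iterable[int],
    second_marks: Iterable[int],
    third_marks: Iterable[int],
) -> List[List[str]]:
    """Split a block after every marked character index.

    A mark i (assumed convention) means "cut between character i and
    character i + 1"; marks found by several modes are applied once.
    """
    cuts = set(first_marks) | set(second_marks) | set(third_marks)
    pieces: List[List[str]] = []
    current: List[str] = []
    for i, ch in enumerate(block):
        current.append(ch)
        if i in cuts and i < len(block) - 1:
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces


block = list("白细胞计数6.2")
pieces = split_at_marks(block, first_marks=[4], second_marks=[4], third_marks=[])
print(["".join(p) for p in pieces])  # ['白细胞计数', '6.2']
```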

According to the method for extracting the key information of the medical examination list, provided by the embodiment of the invention, the positions of all third character blocks needing secondary block processing are determined by respectively utilizing the distance reference mode, the global reference mode and the local reference mode, so that the secondary block processing is carried out on all the third character blocks, the condition of block errors occurring in the third character blocks can be effectively corrected, the block accuracy of each third character block is further ensured, and the name, the result, the unit and the reference range of each examination item in the target examination list can be accurately extracted from all the third character blocks after the secondary block processing.

Based on any one of the above embodiments, a method for extracting key information of a medical examination order is provided, where positions of all third character blocks that need to be subjected to secondary block processing are determined in a distance reference manner, and specifically: calculating the mean value of the distances between every two adjacent character blocks in all the character blocks contained in the target inspection list, and determining a first threshold value according to the mean value; and for any one third character block, sequentially traversing each character in the third character block, and for any current character, if the coordinate difference between the right boundary coordinate of the current character and the left boundary coordinate of the next character of the current character is greater than a first threshold value, determining the position between the current character and the next character of the current character as the position needing secondary block processing.

Specifically, in the embodiment of the present invention, the positions of all third character blocks that need to be subjected to the secondary block division processing are determined in a distance reference manner, which is specifically implemented as follows:

First, all character blocks contained in the target examination sheet are obtained, the distance between every two adjacent character blocks is calculated, the mean of these distances is computed, and a first threshold is determined from the mean. The first threshold is generally smaller than the mean and may be set according to actual requirements; it is not specifically limited herein. After the first threshold has been determined, each third character block is processed by traversing its characters in order. For any current character, if the difference between the left boundary coordinate of the next character and the right boundary coordinate of the current character is greater than the first threshold, the distance between the two characters is large and they should belong to different character blocks. The position between the current character and the next character is therefore determined as a position at which secondary blocking is required.
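
The distance reference mode can be sketched in Python as below. The scaling `factor` used to derive the first threshold from the mean, and the choice to measure block distances only between neighbouring blocks of the same text line, are assumptions; the text says only that the threshold is generally smaller than the mean.

```python
from typing import List, Optional, Tuple

Char = Tuple[str, float, float]   # hypothetical: (text, left, right)
Block = List[Char]


def first_threshold(lines: List[List[Block]], factor: float = 0.5) -> Optional[float]:
    """Derive the first threshold from the mean gap between adjacent blocks."""
    gaps = [
        nxt[0][1] - prev[-1][2]
        for blocks in lines
        for prev, nxt in zip(blocks, blocks[1:])
    ]
    if not gaps:
        return None
    return factor * sum(gaps) / len(gaps)


def distance_marks(block: Block, threshold: float) -> List[int]:
    """Indices i where the gap after character i exceeds the threshold."""
    return [
        i
        for i in range(len(block) - 1)
        if block[i + 1][1] - block[i][2] > threshold
    ]


lines = [
    [[("a", 0, 10)], [("b", 30, 40)]],
    [[("c", 0, 10), ("d", 18, 28)], [("e", 48, 58)]],
]
threshold = first_threshold(lines)
suspect = [("x", 0, 10), ("y", 11, 21), ("6", 33, 43)]
print(threshold, distance_marks(suspect, threshold))
```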

According to the method for extracting the key information of the medical examination order, provided by the embodiment of the invention, the positions of all third character blocks needing secondary block processing are determined by using a distance reference mode so as to perform secondary block processing on all third character blocks, so that the condition of block errors in the third character blocks can be effectively corrected, and the block accuracy of each third character block is further ensured.

Based on any one of the above embodiments, a method for extracting key information of a medical examination order is provided, where positions of all third character blocks that need to be subjected to secondary blocking processing are determined in a global reference manner, and specifically: taking each text line containing the third character block as a target text line, counting the number of the character blocks contained in each target text line, determining the text line with correct blocks and the text line with wrong blocks in all the target text lines according to the counting result, and respectively taking the text lines as reference text lines and text lines to be corrected; and for any text line to be corrected, determining the position of the text line to be corrected, which needs to be subjected to secondary block processing, according to the coordinate overlapping rate between the text line to be corrected and the reference text line.

Specifically, in the embodiment of the present invention, the positions of all third character blocks that need to be subjected to the secondary block partitioning processing are determined in a global reference manner, which is specifically implemented as follows:

each text line containing a third character block is taken as a target text line; that is, if a text line contains one or more third character blocks, it is a target text line. On this basis, the number of character blocks contained in each target text line is counted, and the text lines with correct blocks and the text lines with block errors among all the target text lines are determined according to the counting result and used as reference text lines and text lines to be corrected, respectively. For example, if the counting result shows that 6 target text lines each contain 5 character blocks and only 1 target text line contains 2 character blocks, the 6 target text lines can be determined to be correctly blocked and are taken as reference text lines, while the remaining 1 target text line can be determined to contain a block error and is taken as the text line to be corrected.

All text lines to be corrected and all reference text lines can be obtained through the above steps. On this basis, for any text line to be corrected, the position in the text line to be corrected where secondary blocking processing is required is determined according to the coordinate overlapping rate between the text line to be corrected and a reference text line. Specifically, for a given text line to be corrected, one reference text line is selected from all the reference text lines, and the coordinate overlapping rate between each pair of corresponding character blocks, one in the text line to be corrected and one in the reference text line, is calculated. If the coordinate overlapping rate is smaller than a coordinate overlapping rate threshold, a blocking error exists in that character block of the text line to be corrected, and secondary blocking processing needs to be performed on it. Finally, the position where secondary blocking processing is required in that character block can be determined according to the right boundary position of the last character of the corresponding character block in the reference text line. The coordinate overlapping rate threshold may be set according to actual requirements, and is not specifically limited herein.
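A possible reading of the global reference mode is sketched below. Several points are assumptions, since the text does not fix them: the most frequent block count is treated as the correct one, the first correctly blocked line serves as the reference, the coordinate overlapping rate is taken as the intersection length divided by the length of the longer block, and a character is assigned to the left part of a split when its centre lies at or before the reference block's right boundary. All names and the default `min_rate` are hypothetical.

```python
from collections import Counter

# Assumed representation: a character is (text, left, right);
# a block is a list of characters; a line is a list of blocks.

def span(block):
    return block[0][1], block[-1][2]

def overlap_rate(a, b):
    """Assumed definition: intersection length over the longer span's length."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    longest = max(a[1] - a[0], b[1] - b[0])
    return max(inter, 0) / longest if longest > 0 else 0.0

def split_by_reference(line, reference, min_rate=0.6):
    """Return (block_index, char_index) pairs; split after that character."""
    positions = set()
    for bi, block in enumerate(line):
        for ref in reference:
            rate = overlap_rate(span(block), span(ref))
            if rate == 0.0 or rate >= min_rate:
                continue  # different column, or already aligned
            ref_right = span(ref)[1]
            # characters whose centre is at or before the reference right edge
            left = [ci for ci, (_, l, r) in enumerate(block)
                    if (l + r) / 2 <= ref_right]
            if left and left[-1] < len(block) - 1:
                positions.add((bi, left[-1]))
    return sorted(positions)

def global_reference_positions(lines, min_rate=0.6):
    """Map index of each line to be corrected to its split positions."""
    expected = Counter(len(line) for line in lines).most_common(1)[0][0]
    references = [line for line in lines if len(line) == expected]
    return {
        idx: split_by_reference(line, references[0], min_rate)
        for idx, line in enumerate(lines)
        if len(line) != expected
    }
```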

According to the key information extraction method of the medical examination order, provided by the embodiment of the invention, the positions of all third character blocks needing secondary block processing are determined by using a global reference mode so as to perform secondary block processing on all third character blocks, so that the condition of block errors occurring in the third character blocks can be effectively corrected, and the block accuracy of each third character block is further ensured.

Based on any one of the above embodiments, a method for extracting key information of a medical examination form is provided, where positions of all third character blocks that need to be subjected to secondary block processing are determined in a local reference manner, and specifically: taking each text line containing the third character blocks as a target text line, counting the number of character blocks contained in each target text line, and determining the text line with errors in the blocks in all the target text lines as a text line to be corrected according to the counting result; and for any text line to be corrected, determining the position of the text line to be corrected, which needs to be subjected to secondary blocking processing, according to the coordinate overlapping rate between the text line to be corrected and the adjacent text line of the text line to be corrected.

Specifically, in the embodiment of the present invention, the positions of all third character blocks that need to be subjected to the secondary block division processing are determined in a local reference manner, which is specifically implemented as follows:

each text line containing a third character block is taken as a target text line; that is, if a text line contains one or more third character blocks, it is a target text line. On this basis, the number of character blocks contained in each target text line is counted, and the text lines with block errors among all the target text lines are determined according to the counting result and used as text lines to be corrected. For example, if the counting result shows that 6 target text lines each contain 5 character blocks and only 1 target text line contains 2 character blocks, that 1 target text line can be determined to contain a block error and is taken as the text line to be corrected.

All text lines to be corrected can be obtained through the above steps. On this basis, for any text line to be corrected, the position in the text line to be corrected where secondary blocking processing is required is determined according to the coordinate overlapping rate between the text line to be corrected and its adjacent text line. Specifically, for a given text line to be corrected, the coordinate overlapping rate between each pair of corresponding character blocks, one in the text line to be corrected and one in the adjacent text line, is calculated. If the coordinate overlapping rate is smaller than a coordinate overlapping rate threshold, a blocking error exists in that character block of the text line to be corrected, and secondary blocking processing needs to be performed on it. Finally, the position where secondary blocking processing is required in that character block can be determined according to the right boundary position of the last character of the corresponding character block in the adjacent text line. The coordinate overlapping rate threshold may be set according to actual requirements, and is not specifically limited herein.
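The local reference mode differs from the global one only in where the reference comes from, which the following sketch tries to make explicit. It is an illustration under assumptions: the most frequent block count marks a line as correct, the line above is tried before the line below, a neighbour is used only if it is itself correctly blocked, and the coordinate overlapping rate is the intersection length divided by the longer block's length. Names and the default `min_rate` are hypothetical.

```python
from collections import Counter

# Assumed representation: a character is (text, left, right);
# a block is a list of characters; a line is a list of blocks.

def overlap_rate(a, b):
    """Assumed definition: intersection length over the longer span's length."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    longest = max(a[1] - a[0], b[1] - b[0])
    return max(inter, 0) / longest if longest > 0 else 0.0

def local_reference_positions(lines, min_rate=0.6):
    """Map line index -> [(block_index, char_index)]; split after that char."""
    expected = Counter(len(line) for line in lines).most_common(1)[0][0]
    result = {}
    for idx, line in enumerate(lines):
        if len(line) == expected:
            continue
        # Only adjacent lines are consulted: above first, then below.
        neighbours = [lines[j] for j in (idx - 1, idx + 1) if 0 <= j < len(lines)]
        usable = [n for n in neighbours if len(n) == expected]
        if not usable:
            continue
        ref_spans = [(b[0][1], b[-1][2]) for b in usable[0]]
        marks = set()
        for bi, block in enumerate(line):
            bspan = (block[0][1], block[-1][2])
            for rspan in ref_spans:
                if not 0.0 < overlap_rate(bspan, rspan) < min_rate:
                    continue
                left = [ci for ci, (_, l, r) in enumerate(block)
                        if (l + r) / 2 <= rspan[1]]
                if left and left[-1] < len(block) - 1:
                    marks.add((bi, left[-1]))
        result[idx] = sorted(marks)
    return result
```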

According to the key information extraction method of the medical examination order, provided by the embodiment of the invention, the positions of all third character blocks needing secondary block processing are determined by using a local reference mode so as to perform secondary block processing on all third character blocks, so that the condition of block errors in the third character blocks can be effectively corrected, and the block accuracy of each third character block is further ensured.

Based on any one of the embodiments, a method for extracting key information of a medical examination form is provided, in which all characters are segmented into a plurality of text lines according to left boundary coordinates and right boundary coordinates corresponding to all the characters by using a preset line segmentation rule, specifically: and traversing each character in sequence, and for any one current character, if the right boundary coordinate of the current character is larger than the left boundary coordinate of the next character of the current character, and the coordinate difference between the right boundary coordinate of the current character and the left boundary coordinate of the next character of the current character is larger than a second threshold value, performing line-dividing processing on the current character and the next character of the current character so as to divide all characters into a plurality of text lines.

Specifically, it should be noted that, in the same text line, the left boundary coordinate and the right boundary coordinate of each character are sequentially incremented, so that it can be determined that two adjacent characters are in different text lines if the left boundary coordinate and the right boundary coordinate of the two adjacent characters are decremented. In view of this, in the embodiment of the present invention, after the characters and the left boundary coordinate and the right boundary coordinate of each character in the target check list are sequentially recognized, each character is sequentially traversed, and for any current character, if the right boundary coordinate of the current character is greater than the left boundary coordinate of the next character of the current character, and a coordinate difference between the right boundary coordinate of the current character and the left boundary coordinate of the next character of the current character is greater than a second threshold, it is determined that the current character and the next character of the current character belong to different text lines, so that the current character and the next character of the current character need to be subjected to line division processing. The second threshold may be set according to actual requirements, and is not specifically limited herein. Through the steps of the method, all characters in the target check list can be divided into a plurality of text lines.
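The line division rule can be illustrated with a short sketch. The character representation `(text, left, right)`, the function name, and the default value of the second threshold are assumptions chosen for illustration; both conditions from the text are kept even though the second implies the first for a non-negative threshold.

```python
def split_into_lines(chars, second_threshold=50):
    """Split sequentially recognized characters into text lines.

    A new line starts after the current character when its right boundary
    exceeds the next character's left boundary by more than the threshold.
    """
    lines, current = [], []
    for i, ch in enumerate(chars):
        current.append(ch)
        if i + 1 < len(chars):
            nxt = chars[i + 1]
            if ch[2] > nxt[1] and ch[2] - nxt[1] > second_threshold:
                lines.append(current)
                current = []
    if current:
        lines.append(current)
    return lines
```

For example, a character ending at x = 310 followed by one starting at x = 0 is treated as a line break, whereas slightly overlapping boxes within a line are not.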

According to the method for extracting the key information of the medical examination list, provided by the embodiment of the invention, all characters in the target examination list are segmented into a plurality of text lines according to the left boundary coordinate and the right boundary coordinate corresponding to all the characters by using the preset line segmentation rule, so that all character blocks in the target examination list can be obtained according to the text line segmentation, and further, the key information in the target examination list can be extracted from the character blocks.

Based on any of the above embodiments, a method for extracting key information of a medical examination form is provided, where all characters in a text line are segmented into a plurality of character blocks according to left and right boundary coordinates corresponding to all characters in the text line by using a preset segmentation rule, and the method specifically includes: and traversing each character in the text line in sequence, and for any current character, if the coordinate difference between the right boundary coordinate of the current character and the left boundary coordinate of the next character of the current character is greater than a third threshold value, performing block processing on the current character and the next character of the current character so as to divide all characters in the text line into a plurality of character blocks.

Specifically, it should be noted that, in the same character block, the distance between two adjacent characters is small, and therefore, in the same text line, if the distance between two adjacent characters is large, it can be determined that the two adjacent characters belong to different character blocks. In view of this, in the embodiment of the present invention, after all characters in the target check list are divided into a plurality of text lines, for any one text line, each character in the text line is sequentially traversed, and for any one current character, if a coordinate difference between a right boundary coordinate of the current character and a left boundary coordinate of a next character of the current character is greater than a third threshold, it is determined that the current character and the next character of the current character belong to different character blocks, so that the current character and the next character of the current character need to be subjected to block processing. The third threshold may be set according to actual requirements, and is not specifically limited herein. Through the steps of the method, each text line in the target check list can be divided into a plurality of character blocks.
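The block division rule within one text line can be sketched in the same way. The character representation `(text, left, right)`, the function name, and the default third threshold are illustrative assumptions.

```python
def split_into_blocks(line, third_threshold=15):
    """Split one text line into character blocks.

    A new block starts when the gap between the previous character's right
    boundary and the current character's left boundary exceeds the threshold.
    """
    blocks, current = [], []
    for ch in line:
        if current and ch[1] - current[-1][2] > third_threshold:
            blocks.append(current)
            current = []
        current.append(ch)
    if current:
        blocks.append(current)
    return blocks
```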

According to the method for extracting the key information of the medical examination list, provided by the embodiment of the invention, for any text line, all characters in the text line are divided into a plurality of character blocks according to the left boundary coordinate and the right boundary coordinate corresponding to all the characters in the text line by using the preset block dividing rule, so that each text line in the target examination list is divided into a plurality of character blocks, and the key information in the target examination list is favorably extracted from the character blocks.

Fig. 2 is a schematic structural diagram of a key information extraction device for a medical checklist according to an embodiment of the present invention, and as shown in fig. 2, the device includes: a character line dividing module 21, a character blocking module 22, a first key information extraction module 23, a second key information extraction module 24, and a third key information extraction module 25, wherein:

the character line dividing module 21 is configured to sequentially identify each character in the target check list and left and right boundary coordinates corresponding to each character, and divide all the characters into a plurality of text lines according to the left and right boundary coordinates corresponding to all the characters by using a preset line dividing rule, where each text line includes a plurality of characters.

Specifically, the character line dividing module 21 sequentially identifies each character in the target check list according to the sequence of the text in the target check list from top to bottom and from left to right by using a character recognition technology, and obtains a left boundary coordinate and a right boundary coordinate corresponding to each character. The character recognition technology may be an Optical Character Recognition (OCR) technology, or other character recognition technologies, and may be set according to actual requirements, which is not specifically limited herein.

Further, the character line dividing module 21 divides all the characters into a plurality of text lines according to the left boundary coordinates and the right boundary coordinates corresponding to all the characters respectively by using a preset line dividing rule, wherein each text line includes a plurality of characters. It can be understood that, in the same text line, the left boundary coordinate and the right boundary coordinate of each character are sequentially increased, and thus, if the left boundary coordinate and the right boundary coordinate of the two sequentially recognized adjacent characters are decreased, it can be determined that the two adjacent characters are in different text lines, that is, a line division process needs to be performed between the two adjacent characters. In the embodiment of the invention, the preset line division rule is preset based on the principle, so that all characters in the target inspection list are divided into a plurality of text lines by using the preset line division rule.

The character blocking module 22 is configured to, for any one text line, divide all characters in the text line into a plurality of character blocks according to left boundary coordinates and right boundary coordinates corresponding to all characters in the text line by using a preset blocking rule, where each character block includes at least one character.

Specifically, after all the characters in the target check list are divided into a plurality of text lines, for any one text line, the character blocking module 22 divides all the characters in the text line into a plurality of character blocks according to the left and right boundary coordinates corresponding to all the characters in the text line by using the preset blocking rule. Thus, all characters in each line of text may be segmented into a plurality of character blocks. Wherein each character block contains at least one character. It is understood that, in the same character block, the distance between two adjacent characters is smaller, and therefore, in the same text line, if the distance between two adjacent characters is larger, it can be determined that the two adjacent characters belong to different character blocks. In the embodiment of the present invention, the distance between two adjacent characters can be represented as a coordinate difference between the right boundary coordinate of the current character and the left boundary coordinate of the next character. Therefore, if the coordinate difference between the right boundary coordinate of the current character and the left boundary coordinate of the next character is large, it can be determined that the current character and the next character belong to different character blocks. That is, the blocking process needs to be performed between the current character and its next character. In the embodiment of the present invention, a preset blocking rule is preset based on the above principle, so that all characters in each text line are divided into a plurality of character blocks by using the preset blocking rule.

The first key information extraction module 23 is configured to screen out character blocks including first key information from all the character blocks in a keyword matching manner, to serve as first character blocks, and extract the first key information from the first character blocks, where the first key information is a hospital name.

Specifically, through the above method steps, all characters in the target checklist are segmented into a plurality of text lines, and each text line is segmented into a plurality of character blocks. On the basis, the first key information extraction module 23 screens out character blocks containing the first key information from all the character blocks in a keyword matching manner to serve as first character blocks. Wherein, the first key information is the name of the hospital. That is, all the character blocks in the target check list are traversed, and if a certain character block includes the two characters of "hospital", it is indicated that the hospital name is included in the character block, and the character block is taken as the first character block. After the first character block is screened out, the first key information is extracted from the first character block, namely the hospital name is extracted from the first character block.
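A minimal sketch of this keyword matching step follows. It assumes that each character block has already been joined into a string, that the two-character keyword is "医院", and that the name ends at the keyword; the text does not specify how the name is cut out of the block, so that trimming rule and the function name are hypothetical.

```python
def extract_hospital_name(block_texts, keyword="医院"):
    """Return the text of the first block containing the keyword,
    trimmed to end at the keyword, or None if no block matches."""
    for text in block_texts:
        pos = text.find(keyword)
        if pos != -1:
            return text[: pos + len(keyword)]
    return None
```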

The second key information extraction module 24 is configured to screen out character blocks containing second key information from all the character blocks in a keyword matching manner, to serve as second character blocks, and extract the second key information from all the second character blocks, where the second key information includes a patient name, a patient gender, a patient age, an examining doctor, and an examining time.

Specifically, the second key information extraction module 24 screens out the character blocks containing the second key information from all the character blocks in a keyword matching manner, and uses the character blocks as the second character blocks. The second key information includes the name of the patient, the sex of the patient, the age of the patient, the doctor and the time of examination, etc. That is, when all the character blocks in the target checklist are traversed and a certain character block includes characters such as "name" or "gender" or "age" or "doctor for examination" or "examination time", the character block is set as the second character block. After the second character block is screened out, second key information is extracted from the second character block, namely information such as the name of the patient, the sex of the patient, the age of the patient, the doctor for examination, the time of examination and the like is extracted from the second character block.
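The second key information can be picked out in a similar way. The sketch below assumes each block is a string of the form "keyword, separator, value"; the Chinese keywords in `KEYWORDS`, the field names, and the rule of taking everything after the keyword are illustrative assumptions, since the exact label wording varies between forms.

```python
# Hypothetical field names mapped to assumed label keywords.
KEYWORDS = {
    "patient_name": "姓名",
    "gender": "性别",
    "age": "年龄",
    "doctor": "检查医生",
    "time": "检查时间",
}

def extract_patient_fields(block_texts, keywords=KEYWORDS):
    """Return {field: value} for every block that contains a keyword."""
    fields = {}
    for text in block_texts:
        for field, kw in keywords.items():
            pos = text.find(kw)
            if pos != -1 and field not in fields:
                # value = text after the keyword, without the separator
                fields[field] = text[pos + len(kw):].lstrip(":： \t").strip()
    return fields
```

Blocks that match none of the keywords are left untouched, so they remain available as candidates for the third character blocks.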

The third key information extraction module 25 is configured to extract, from all third character blocks, third key information including the inspection item, the result, the unit, and the reference range, by using each of the other character blocks except the first character block and the second character block as the third character block.

Specifically, on the basis of the above technical solution, the third key information extraction module 25 takes each of the other character blocks in the target checklist, except for the first character block and the second character block, as a third character block. On this basis, the third key information extraction module 25 extracts the third key information from all the third character blocks, wherein the third key information includes the inspection item, the result, the unit, the reference range, and the like. It should be noted that, in the medical examination list, the examination items are generally represented by Chinese characters; results are generally expressed as a single numerical value; units are typically represented by English characters; and reference ranges are generally expressed as numerical ranges. In view of this, in the embodiment of the present invention, the third key information extraction module 25 extracts all the check items in the target check list and the result, unit and reference range corresponding to each check item from all the third character blocks according to the representation forms respectively corresponding to the check item, result, unit and reference range.
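The classification by representation form can be sketched with regular expressions. The patterns below are assumptions that encode the four forms named above (Chinese characters, a single number, English characters, a numeric range); real forms would likely need broader patterns, and the function and key names are hypothetical.

```python
import re

RANGE = re.compile(r"^\d+(?:\.\d+)?\s*(?:-{1,2}|~)\s*\d+(?:\.\d+)?$")
NUMBER = re.compile(r"^\d+(?:\.\d+)?$")
CJK = re.compile(r"[\u4e00-\u9fff]")
UNIT = re.compile(r"^(?=.*[A-Za-z%])[A-Za-z0-9%/^*.]+$")

def classify(text):
    """Guess which kind of third key information a block text holds."""
    text = text.strip()
    if RANGE.match(text):
        return "reference_range"
    if NUMBER.match(text):
        return "result"
    if CJK.search(text):
        return "item"
    if UNIT.match(text):
        return "unit"
    return "unknown"

def parse_row(block_texts):
    """Build one record from the third character blocks of a text line."""
    row = {}
    for text in block_texts:
        kind = classify(text)
        if kind != "unknown" and kind not in row:
            row[kind] = text.strip()
    return row
```

The range pattern is tested before the single-number pattern so that a value such as "3.5-9.5" is not mistaken for a result.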

The device for extracting key information of a medical examination order provided by the embodiment of the present invention specifically executes the processes of the above method embodiments, and for details, the contents of the above method embodiments are referred to, and are not described herein again.

The key information extraction device for the medical examination form provided by the embodiment of the invention sequentially identifies each character in the target examination form and the left boundary coordinate and the right boundary coordinate corresponding to each character, so that all characters in the target examination form are divided into a plurality of text lines according to the left boundary coordinate and the right boundary coordinate corresponding to each character, each text line is divided into a plurality of character blocks, and finally, the key information in the target examination form is extracted from all the character blocks in a keyword matching mode. The device can accurately extract the key information in the medical examination order, overcomes the problem that the data in the medical examination order is difficult to accurately extract in the prior art, and is favorable for storing the key information in the medical examination order of the patient in a warehouse so as to effectively share and circulate the health information of the patient.

Fig. 3 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. Referring to fig. 3, the electronic device includes: a processor (processor) 31, a memory (memory) 32, and a bus 33; wherein, the processor 31 and the memory 32 complete the communication with each other through the bus 33; the processor 31 is configured to call program instructions in the memory 32 to perform the methods provided by the above-mentioned method embodiments, for example, including: sequentially identifying each character in the target inspection list and the left boundary coordinate and the right boundary coordinate corresponding to each character, and dividing all the characters into a plurality of text lines according to the left boundary coordinate and the right boundary coordinate corresponding to all the characters by using a preset line division rule, wherein each text line comprises a plurality of characters; for any text line, dividing all characters in the text line into a plurality of character blocks by using a preset block dividing rule according to left boundary coordinates and right boundary coordinates corresponding to all characters in the text line, wherein each character block comprises at least one character; screening character blocks containing first key information from all the character blocks in a keyword matching mode to serve as first character blocks, and extracting the first key information from the first character blocks, wherein the first key information is a hospital name; screening out character blocks containing second key information from all the character blocks in a keyword matching mode to serve as second character blocks, and extracting the second key information from all the second character blocks, wherein the second key information comprises the name of a patient, the sex of the patient, the age of the patient, a doctor for examination and examination time; and taking each character block except the first character block and the second character block as a third character block, and extracting third key information from all the third character blocks, wherein the third key information comprises the checking item, the result, the unit and the reference range.

Furthermore, the logic instructions in the memory 32 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.

Embodiments of the present invention further provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the method provided in the foregoing embodiments, the method including: sequentially identifying each character in the target inspection list and the left boundary coordinate and the right boundary coordinate corresponding to each character, and dividing all the characters into a plurality of text lines according to the left boundary coordinate and the right boundary coordinate corresponding to all the characters by using a preset line division rule, wherein each text line comprises a plurality of characters; for any text line, dividing all characters in the text line into a plurality of character blocks by using a preset block dividing rule according to left boundary coordinates and right boundary coordinates corresponding to all characters in the text line, wherein each character block comprises at least one character; screening character blocks containing first key information from all the character blocks in a keyword matching mode to serve as first character blocks, and extracting the first key information from the first character blocks, wherein the first key information is a hospital name; screening out character blocks containing second key information from all the character blocks in a keyword matching mode to serve as second character blocks, and extracting the second key information from all the second character blocks, wherein the second key information comprises the name of a patient, the sex of the patient, the age of the patient, a doctor for examination and examination time; and taking each character block except the first character block and the second character block as a third character block, and extracting third key information from all the third character blocks, wherein the third key information comprises the checking item, the result, the unit and the reference range.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A key information extraction method of a medical examination order is characterized by comprising the following steps:

taking each of the other character blocks except the first character block and the second character block as a third character block, and extracting third key information from all the third character blocks, wherein the third key information comprises an inspection item, a result, a unit and a reference range;

taking each of the other character blocks except the first character block and the second character block as a third character block, and then further comprising:

determining positions, which need to be subjected to secondary blocking processing, in all third character blocks by using a distance reference mode, and marking the positions as first marking positions;

determining positions, which need to be subjected to secondary blocking processing, in all third character blocks by using a global reference mode, and marking the positions as second marking positions;

determining positions, which need to be subjected to secondary blocking processing, in all third character blocks by using a local reference mode, and marking the positions as third marking positions;

performing secondary blocking processing on all third character blocks according to all first mark positions, all second mark positions and all third mark positions;

correspondingly, the third key information is extracted from all third character blocks, specifically:

extracting the third key information from all third character blocks subjected to secondary blocking processing;

the determining, by using a global reference mode, the positions of all third character blocks where secondary blocking processing is required is specifically:

taking each text line containing the third character block as a target text line, counting the number of character blocks contained in each target text line, determining the text lines with correct blocks and the text lines with wrong blocks in all the target text lines according to the counting result, and respectively taking the text lines as reference text lines and text lines to be corrected;

and for any text line to be corrected, determining the position of the text line to be corrected, which needs to be subjected to secondary block processing, according to the coordinate overlapping rate between the text line to be corrected and the reference text line.
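The global reference mode can be illustrated with a minimal Python sketch. The claim does not state how the counting result separates correct lines from incorrect ones, nor how the overlap rate is turned into a split position, so the sketch makes its own assumptions: the most common block count is treated as correct, lines with fewer blocks are treated as lines to be corrected, and a block that overlaps two neighbouring reference blocks is split midway between them. The names `overlap_rate`, `classify_lines`, `global_marks` and the parameter `min_rate` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Span = Tuple[float, float]  # (left boundary, right boundary) of one character block


def overlap_rate(a: Span, b: Span) -> float:
    """Horizontal overlap between a and b as a fraction of b's width."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    width = b[1] - b[0]
    return max(inter, 0.0) / width if width > 0 else 0.0


def classify_lines(lines: List[List[Span]]):
    """Assumption: the most common block count marks a correctly blocked line.

    Lines with fewer blocks are returned as lines to be corrected; lines with
    more blocks are ignored in this sketch. `lines` must not be empty.
    """
    expected = Counter(len(line) for line in lines).most_common(1)[0][0]
    reference = [line for line in lines if len(line) == expected]
    to_correct = [line for line in lines if len(line) < expected]
    return reference, to_correct


def global_marks(line: List[Span], reference: List[Span],
                 min_rate: float = 0.5) -> List[float]:
    """Return x-coordinates at which blocks of `line` should be split."""
    marks = []
    for blk in line:
        # Reference blocks that this block covers to a sufficient degree.
        covered = [ref for ref in reference if overlap_rate(blk, ref) >= min_rate]
        # Assumption: split midway between consecutive covered reference blocks.
        for left_ref, right_ref in zip(covered, covered[1:]):
            marks.append((left_ref[1] + right_ref[0]) / 2)
    return marks


lines = [
    [(0, 40), (60, 80), (100, 120)],
    [(0, 40), (62, 82), (100, 120)],
    [(0, 40), (60, 120)],  # "result" and "unit" were merged into one block
]
reference, to_correct = classify_lines(lines)
second_marks = global_marks(to_correct[0], reference[0])
```

In this example the third line has only two blocks, so it is compared against a three-block reference line and receives one split position between the second and third reference columns.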

2. The method for extracting key information of a medical examination form according to claim 1, wherein determining, by using the distance reference mode, the positions in all third character blocks that require secondary blocking processing specifically comprises:

calculating the mean of the distances between every two adjacent character blocks among all character blocks contained in the target examination form, and determining a first threshold according to the mean;

and, for any third character block, traversing each character in the third character block in sequence and, for any current character, if the coordinate difference between the right boundary coordinate of the current character and the left boundary coordinate of the next character is greater than the first threshold, determining the position between the current character and the next character as a position that requires secondary blocking processing.
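A minimal Python sketch of the distance reference mode follows. It assumes that "adjacent character blocks" means neighbouring blocks within the same text line, that the coordinate difference is the gap `next.left - current.right`, and that the first threshold is the mean gap multiplied by a factor `scale`; the claim only says the threshold is determined according to the mean, so `scale` and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Char = Tuple[str, float, float]  # (character, left boundary, right boundary)
Block = List[Char]


def first_threshold(lines_of_blocks: List[List[Block]], scale: float = 0.5) -> float:
    """Derive the first threshold from the mean gap between adjacent blocks.

    Assumption: threshold = scale * mean gap; the factor is illustrative only.
    """
    gaps = [
        right_blk[0][1] - left_blk[-1][2]  # left of next block - right of this block
        for blocks in lines_of_blocks
        for left_blk, right_blk in zip(blocks, blocks[1:])
    ]
    if not gaps:
        return float("inf")  # no adjacent pairs, so nothing can be marked
    return scale * mean(gaps)


def distance_marks(block: Block, threshold: float) -> List[int]:
    """Return indices i such that the block should be split before block[i]."""
    return [
        i + 1
        for i, (cur, nxt) in enumerate(zip(block, block[1:]))
        if nxt[1] - cur[2] > threshold
    ]


sheet = [
    [[("A", 0, 10)], [("B", 50, 60)]],
    [[("C", 0, 10)], [("D", 70, 80)]],
]
threshold = first_threshold(sheet)
first_marks = distance_marks([("x", 0, 10), ("y", 12, 22), ("z", 60, 70)], threshold)
```

Here the two inter-block gaps are 40 and 60, so the sketch's threshold is 25, and the only gap inside the sample block that exceeds it is the one before the third character.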

3. The method for extracting key information of a medical examination form according to claim 1, wherein determining, by using the local reference mode, the positions in all third character blocks that require secondary blocking processing specifically comprises:

taking each text line containing a third character block as a target text line, counting the number of character blocks contained in each target text line, and determining, according to the counting result, the incorrectly blocked text lines among all the target text lines as text lines to be corrected;

and, for any text line to be corrected, determining the positions in that text line that require secondary blocking processing according to the coordinate overlap rate between the text line to be corrected and its adjacent text line.
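The local reference mode differs from the global one only in what serves as the reference: a neighbouring text line rather than a line judged correct across the whole sheet. The following Python sketch is written under assumptions not stated in the claim: the previous line is tried before the next one, a neighbour qualifies only if it has more blocks than the line being corrected, and a split is placed midway between two neighbouring blocks that the merged block overlaps. The names `local_marks` and `min_rate` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (left boundary, right boundary) of one character block


def _rate(a: Span, b: Span) -> float:
    """Horizontal overlap between a and b as a fraction of b's width."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    width = b[1] - b[0]
    return max(inter, 0.0) / width if width > 0 else 0.0


def local_marks(lines: List[List[Span]], idx: int, min_rate: float = 0.5) -> List[float]:
    """Return split x-coordinates for lines[idx], using an adjacent line as reference."""
    target = lines[idx]
    for j in (idx - 1, idx + 1):  # assumption: previous line first, then next line
        if 0 <= j < len(lines) and len(lines[j]) > len(target):
            marks = []
            for blk in target:
                covered = [ref for ref in lines[j] if _rate(blk, ref) >= min_rate]
                # Split midway between consecutive neighbour blocks that blk spans.
                for left_ref, right_ref in zip(covered, covered[1:]):
                    marks.append((left_ref[1] + right_ref[0]) / 2)
            return marks
    return []  # no neighbour with more blocks: treat the line as correctly blocked


lines = [
    [(0, 40), (60, 80), (100, 120)],
    [(0, 40), (60, 120)],
    [(0, 40), (61, 81), (101, 121)],
]
third_marks = local_marks(lines, 1)
```

A local reference is useful when a sheet mixes sections with different column layouts, since a neighbouring line is more likely to share the layout of the line being corrected.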

4. The method for extracting key information of a medical examination form according to claim 1, wherein segmenting all characters into a plurality of text lines by using a preset line segmentation rule, according to the left boundary coordinate and the right boundary coordinate corresponding to each character, specifically comprises:

traversing each character in sequence and, for any current character, if the right boundary coordinate of the current character is greater than the left boundary coordinate of the next character, and the coordinate difference between the right boundary coordinate of the current character and the left boundary coordinate of the next character is greater than a second threshold, performing line division between the current character and the next character, so as to divide all characters into a plurality of text lines.
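The line segmentation rule of claim 4 can be sketched in Python as follows. The sketch assumes that characters arrive in reading order as `(character, left, right)` tuples; the function name `split_into_lines` and the sample threshold are hypothetical.

```python
from typing import List, Tuple

Char = Tuple[str, float, float]  # (character, left boundary, right boundary)


def split_into_lines(chars: List[Char], second_threshold: float) -> List[List[Char]]:
    """Start a new text line when the next character jumps far back to the left."""
    if not chars:
        return []
    lines = [[chars[0]]]
    for cur, nxt in zip(chars, chars[1:]):
        cur_right, nxt_left = cur[2], nxt[1]
        # Both conditions of the rule: the next character starts to the left of
        # where the current one ends, and by more than the second threshold.
        if cur_right > nxt_left and cur_right - nxt_left > second_threshold:
            lines.append([nxt])
        else:
            lines[-1].append(nxt)
    return lines


chars = [("a", 0, 10), ("b", 12, 22), ("c", 300, 310), ("d", 0, 10), ("e", 12, 22)]
text_lines = split_into_lines(chars, second_threshold=50)
```

The second threshold keeps small overlaps between neighbouring character boxes on the same line from being mistaken for a line break.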

5. The method for extracting key information of a medical examination form according to claim 1, wherein dividing all characters in a text line into a plurality of character blocks by using a preset block division rule, according to the left boundary coordinate and the right boundary coordinate corresponding to each character in the text line, specifically comprises:

traversing each character in the text line in sequence and, for any current character, if the coordinate difference between the right boundary coordinate of the current character and the left boundary coordinate of the next character is greater than a third threshold, performing block division between the current character and the next character, so as to divide all characters in the text line into a plurality of character blocks.
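The block division rule of claim 5 can be sketched in the same way. The sketch assumes the coordinate difference is the horizontal gap `next.left - current.right`; the function name `split_into_blocks` and the sample threshold are hypothetical.

```python
from typing import List, Tuple

Char = Tuple[str, float, float]  # (character, left boundary, right boundary)


def split_into_blocks(line: List[Char], third_threshold: float) -> List[List[Char]]:
    """Split one text line into character blocks at wide horizontal gaps."""
    if not line:
        return []
    blocks = [[line[0]]]
    for cur, nxt in zip(line, line[1:]):
        gap = nxt[1] - cur[2]  # left of next character - right of current character
        if gap > third_threshold:
            blocks.append([nxt])  # gap is wide: the next character opens a new block
        else:
            blocks[-1].append(nxt)
    return blocks


line = [("W", 0, 10), ("B", 11, 21), ("C", 22, 32), ("5", 80, 90), (".", 91, 94), ("2", 95, 105)]
blocks = split_into_blocks(line, third_threshold=20)
```

In the sample line the only gap wider than the threshold lies between the item name and its value, so two blocks result.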

6. A device for extracting key information of a medical examination form, characterized by comprising:

a third key information extraction module, configured to take each of the character blocks other than the first character block and the second character block as a third character block, and to extract third key information from all third character blocks, wherein the third key information comprises an inspection item, a result, a unit and a reference range;

7. An electronic device comprising at least one processor and at least one memory communicatively coupled to the processor, the memory storing program instructions executable by the processor, wherein the processor, by invoking the program instructions, is capable of executing the method of any one of claims 1 to 5.

8. A non-transitory computer-readable storage medium storing computer instructions, the computer instructions causing a computer to perform the method of any one of claims 1 to 5.