CN108197119A - Knowledge-graph-based method for digitizing paper archives - Google Patents

Knowledge-graph-based method for digitizing paper archives

Info

Publication number
CN108197119A
Authority
CN
China
Prior art keywords
archives
paper quality
data
knowledge
carried out
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810111488.3A
Other languages
Chinese (zh)
Inventor
李进荣
孙懿鑫
张步明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Zhuo Guan Information Technology Co Ltd
Original Assignee
Chengdu Zhuo Guan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Zhuo Guan Information Technology Co Ltd
Priority to CN201810111488.3A
Publication of CN108197119A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/288Entity relationship models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a knowledge-graph-based method for digitizing paper archives. The method comprises: acquiring image information of a paper archive; analyzing it to obtain standardized text data; extracting entity information for key entities; building a standard dictionary table and using it to fuse the entity information into structured data; constructing a knowledge graph with the structured data as knowledge entries; and obtaining the content data of the paper archive from the knowledge graph and generating an electronic document. The invention improves the efficiency of paper archive digitization while reducing the error rate.

Description

Knowledge-graph-based method for digitizing paper archives
Technical field
The invention belongs to the field of electronic information technology, and in particular relates to a knowledge-graph-based method for digitizing paper archives.
Background art
Digitizing paper archives is the most basic task in building a large archive database. The workflow includes sorting and classifying the archives, scanning images, entering text, and organizing and storing the results. At present, digitizing a paper archive means converting the physical paper archive into electronic files (in formats such as JPG, PDF, or TIFF) for storage. Because the purpose is to support information services, the files must be readable and usable by the relevant software systems.
Consequently, when an electronic archive database is built, two electronic files must be produced for each paper archive: one is the image of the paper archive, and the other is the information that corresponds one-to-one with that image. The current solution is to produce an electronic image plus an Excel entry. For example, scanning one physical paper archive produces an electronic image named "031-053-01-019-01.jpg", but the file name "031-053-01-019-01.jpg" alone does not convey the basic content of the archive. The information contained in the paper archive (such as the file number, class number, time, archive type, page name, reporting unit, owning department, and number of pages) must therefore be entered into the corresponding entry of an Excel file. Fully digitizing one paper archive thus requires two tasks: first, scanning the paper archive; second, entering the archive content into the corresponding entry of the Excel file. The workload is enormous.
Although the common scanners (document cameras) now on the market can apply some processing to scanned images, they generally cannot capture the content information and write it into the corresponding entries of an Excel file. As technology has advanced, high-end scanners with optical character recognition (OCR) have appeared, but so far their error rate does not meet the requirement of less than 0.5% specified for national archive digitization. Even with imported high-end scanners, the error rate, although lower by several orders of magnitude, still fails to meet the requirement, and such scanners are expensive, easily costing hundreds of thousands or even more than a million per unit. As a result, the archive digitization workflow at most companies still relies either on one person processing each archive twice or on two people working in sequence on an assembly line. The procedure is complicated, which leads to low efficiency and high personnel costs.
Summary of the invention
The object of the invention is as follows: to solve problems in the prior art such as the complicated procedure and resulting low efficiency of paper archive digitization, the invention proposes a knowledge-graph-based method for digitizing paper archives.
The technical solution of the invention is a knowledge-graph-based method for digitizing paper archives, comprising:
A. acquiring image information of the paper archive to be digitized;
B. performing lexical, syntactic and/or semantic analysis on the paper archive image information from step A to obtain standardized text data;
C. extracting entity information for key entities from the standardized text data of step B;
D. building a standard dictionary table, and fusing the entity information from step C according to the standard dictionary table to form structured data;
E. constructing a knowledge graph with the structured data from step D as knowledge entries;
F. obtaining the content data in the paper archive image information according to the knowledge graph from step E, and generating an electronic document.
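The way steps A to F hand data to one another can be sketched as a small pipeline. This is only an illustrative skeleton, not the patented implementation: the `ArchiveRecord` type, the `run_pipeline` function, and the four stage callables are hypothetical names introduced here, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute or relation, value)


@dataclass
class ArchiveRecord:
    """Hypothetical container for the intermediate results of steps A to E."""
    image_path: str                                   # step A: scanned image
    text: str = ""                                    # step B: standardized text
    entities: Dict[str, Dict[str, str]] = field(default_factory=dict)  # step C
    triples: List[Triple] = field(default_factory=list)                # steps D and E


def run_pipeline(
    image_path: str,
    recognize: Callable[[str], str],
    normalize: Callable[[str], str],
    extract: Callable[[str], Dict[str, Dict[str, str]]],
    fuse: Callable[[Dict[str, Dict[str, str]]], List[Triple]],
) -> ArchiveRecord:
    """Run the stages in order; each stage is supplied by the caller."""
    record = ArchiveRecord(image_path=image_path)
    record.text = normalize(recognize(image_path))   # steps A and B
    record.entities = extract(record.text)           # step C
    record.triples = fuse(record.entities)           # steps D and E
    return record


# Stub stages with invented values, only to show the data flow.
record = run_pipeline(
    "031-053-01-019-01.jpg",
    recognize=lambda path: "File No. 031 ; Year 1998",
    normalize=lambda raw: " ".join(raw.replace(";", " ").split()),
    extract=lambda text: {"archive": {"file_no": "031", "year": "1998"}},
    fuse=lambda ents: [(e, k, v) for e, attrs in ents.items() for k, v in attrs.items()],
)
```

Passing the stages in as callables keeps the sketch independent of any particular OCR engine or language toolkit, which the text does not name.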
Further, in step B, performing lexical, syntactic and/or semantic analysis on the paper archive image information from step A to obtain standardized text data specifically comprises:
classifying the document structure of the paragraphs in the paper archive image information from step A using a pre-trained paragraph classifier model, and dividing the paper archive image information into paragraph structures according to the classification results;
if the paper archive image information is a Chinese-language resource, performing word segmentation, part-of-speech tagging, and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure;
if the paper archive image information is a foreign-language resource, performing stemming, lemmatization, and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure.
Further, in step C, extracting the entity information of key entities from the standardized text data of step B specifically comprises:
classifying the words in the standardized text data using a pre-trained noun classifier model, and identifying and extracting the nouns of each category and the relationships between the nouns according to the classification results.
Further, building the standard dictionary table in step D specifically comprises:
establishing the architecture of the knowledge graph according to a general data standard;
converting the entity attributes of the key entities from step C into triple data;
unifying the relationship types and naming rules of the entity attributes and the key entities according to the triple data, to obtain a standard dictionary table with standard specifications.
Further, in step D, fusing the entity information from step C according to the standard dictionary table to form structured data specifically comprises:
mapping the key entities against the content of the constructed standard dictionary table while retaining the attribute relationships of the key entities, to form structured data.
The beneficial effects of the invention are as follows: the invention acquires and processes paper archive image information to obtain standardized text data, extracts the entity information of key entities, fuses the entity information into structured data by building a standard dictionary table, constructs a knowledge graph with the structured data as knowledge entries, and obtains the paper archive content from the knowledge graph. This improves the efficiency of paper archive digitization while reducing the error rate.
Brief description of the drawings
Fig. 1 is a flow diagram of the knowledge-graph-based method for digitizing paper archives according to the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawing and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Fig. 1 shows the flow diagram of the knowledge-graph-based method for digitizing paper archives according to the invention. The method comprises:
A. acquiring image information of the paper archive to be digitized.
In this embodiment, the paper archive to be digitized is scanned with a scanner to obtain the scanned image of the paper archive.
B. performing lexical, syntactic and/or semantic analysis on the paper archive image information from step A to obtain standardized text data.
In this embodiment, lexical, syntactic and/or semantic analysis means applying operations such as structuring and word segmentation, based on lexical, syntactic and/or semantic analysis, to the raw text data of the specified domain.
C. extracting entity information for key entities from the standardized text data of step B.
In this embodiment, an entity is a named-entity word, an event name, or the like; an attribute is a noun that modifies a named entity, such as age, gender, or personal relationship. The relationship between an entity and its attributes is determined mainly by calculating co-occurrence probabilities and extracting the attribute word that co-occurs with the entity with the highest probability. The relationships between entities are extracted on the one hand from co-occurrence probabilities within sentences, and on the other hand from the entity-attribute relationships already identified.
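The co-occurrence rule described above can be illustrated with a minimal sketch. The text does not give a formula, so this example assumes the simplest estimate: the probability of an attribute given an entity is the share of sentences containing the entity that also contain the attribute word. The function names and the sample sentences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional


def cooccurrence_probabilities(
    sentences: Iterable[List[str]],
    entities: Iterable[str],
    attribute_words: Iterable[str],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(attribute | entity) from sentence-level co-occurrence counts."""
    entity_set, attribute_set = set(entities), set(attribute_words)
    entity_count: Counter = Counter()
    pair_count: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        present = set(tokens)
        for entity in entity_set & present:
            entity_count[entity] += 1
            for attribute in attribute_set & present:
                pair_count[entity][attribute] += 1
    return {
        entity: {a: c / entity_count[entity] for a, c in pair_count[entity].items()}
        for entity in entity_count
    }


def best_attribute(probs: Dict[str, Dict[str, float]], entity: str) -> Optional[str]:
    """Return the attribute word with the highest co-occurrence probability."""
    candidates = probs.get(entity, {})
    return max(candidates, key=candidates.get) if candidates else None


# Invented, already-tokenized sentences for illustration only.
sentences = [
    ["Zhang", "age", "35"],
    ["Zhang", "age", "unit"],
    ["Zhang", "gender", "male"],
    ["Li", "gender", "female"],
]
probs = cooccurrence_probabilities(sentences, {"Zhang", "Li"}, {"age", "gender"})
```

In this toy data "Zhang" appears in three sentences and co-occurs with "age" in two of them, so "age" is selected as its most probable attribute word.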
D. building a standard dictionary table, and fusing the entity information from step C according to the standard dictionary table to form structured data;
E. constructing a knowledge graph with the structured data from step D as knowledge entries;
F. obtaining the content data in the paper archive image information according to the knowledge graph from step E, and generating an electronic document.
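Steps E and F can be pictured with a small in-memory graph. The sketch below assumes the structured data arrives as (subject, predicate, object) triples and that the electronic document is a JSON record; both are assumptions made for illustration, since the text does not fix a storage format. The function names, field names, and sample values are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_graph(triples: Iterable[Triple]) -> Dict[str, Dict[str, List[str]]]:
    """Step E (sketch): index triples by subject and predicate, skipping duplicates."""
    graph: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        values = graph[subject][predicate]
        if obj not in values:
            values.append(obj)
    return graph


def to_electronic_record(graph: Dict[str, Dict[str, List[str]]], archive_id: str) -> str:
    """Step F (sketch): collect everything known about one archive into a JSON record."""
    properties = graph.get(archive_id, {})
    fields = {
        name: (values[0] if len(values) == 1 else list(values))
        for name, values in properties.items()
    }
    return json.dumps({"archive_id": archive_id, **fields}, ensure_ascii=False, sort_keys=True)


# Invented triples for one scanned archive.
triples = [
    ("031-053-01-019-01", "file_number", "031"),
    ("031-053-01-019-01", "department", "Personnel Office"),
    ("031-053-01-019-01", "page_count", "3"),
    ("031-053-01-019-01", "department", "Personnel Office"),  # duplicate, ignored
]
record_json = to_electronic_record(build_graph(triples), "031-053-01-019-01")
```

The resulting record plays the role of the manually typed Excel entry described in the background section, but it is generated from the graph rather than keyed in by hand.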
In an optional embodiment of the invention, step B of the above embodiment further comprises:
classifying the document structure of the paragraphs in the paper archive image information from step A using a pre-trained paragraph classifier model, and dividing the paper archive image information into paragraph structures according to the classification results;
To divide the raw text data into paragraph structures quickly and accurately, this embodiment structures the raw text data and distinguishes paragraphs such as title, body, author, time, and category. Specifically, the document structure of the raw text data can be determined from document-structure distribution features such as the position, length, and textual content of a passage. Alternatively, a small training corpus can be labeled manually, a paragraph classifier model can be built on the above features to classify the paragraphs, and the predicted class can be used as the paragraph attribute.
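The feature-based variant described above (position, length, and textual content) can be sketched with a few hand-written rules. This is a rule-based stand-in, not the trained paragraph classifier the text refers to; the thresholds, the date pattern, and the "Author:" prefix are assumptions chosen for the example.

```python
import re
from typing import List, Tuple

# Assumed date format for the example: YYYY-MM-DD, YYYY/MM/DD or YYYY.MM.DD.
DATE_PATTERN = re.compile(r"\b(19|20)\d{2}[-/.]\d{1,2}[-/.]\d{1,2}\b")


def classify_paragraph(text: str, index: int) -> str:
    """Label one paragraph from its content, position, and length."""
    stripped = text.strip()
    if DATE_PATTERN.search(stripped) and len(stripped) <= 40:
        return "time"      # content feature: a short line containing a date
    if index == 0 and len(stripped) <= 60:
        return "title"     # position and length features: short first paragraph
    if stripped.lower().startswith(("author:", "by ")):
        return "author"    # content feature: author prefix
    return "body"


def split_structure(paragraphs: List[str]) -> List[Tuple[str, str]]:
    """Return (label, paragraph) pairs for a document given as a paragraph list."""
    return [(classify_paragraph(p, i), p) for i, p in enumerate(paragraphs)]


paragraphs = [
    "Annual Personnel Report",
    "Author: Records Office",
    "2017-03-15",
    "This document summarizes staffing changes across all departments "
    "during the reporting year and lists the attached pages.",
]
labels = [label for label, _ in split_structure(paragraphs)]
```

The same three features could instead be fed to a classifier trained on a small hand-labeled corpus, which is the alternative the text mentions.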
if the paper archive image information is a Chinese-language resource, performing word segmentation, part-of-speech tagging, and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure;
if the paper archive image information is a foreign-language resource, performing stemming, lemmatization, and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure.
To divide the raw text data into paragraph structures quickly and accurately, this embodiment determines the language of the raw text data. If the raw text data is a Chinese-language resource, Chinese word segmentation, part-of-speech tagging, phrase recognition and the like are performed on it; open-source tools can be used for the lexical, syntactic and/or semantic analysis of Chinese. If the text data is a foreign-language resource, lexical, syntactic and/or semantic analysis is performed on the foreign-language resource with tools for the corresponding language. For example, stemming, lemmatization, phrase recognition and the like are performed on English resources; lemmatization here means removing tense markers and word suffixes to restore the base form of a word. Open-source tools can likewise be used for the lexical, syntactic and/or semantic analysis of English resources.
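The language-dependent branch can be illustrated without committing to any particular toolkit, since the text only says "open-source tools". The sketch below substitutes two deliberately simple techniques: forward maximum matching against a small vocabulary for Chinese word segmentation, and naive suffix stripping for English stemming. Both are stand-ins chosen for the example, and the vocabulary and function names are hypothetical.

```python
import re
import string
from typing import List, Set

CJK = re.compile(r"[\u4e00-\u9fff]")
PUNCTUATION = set(string.punctuation) | set("，。、；：？！“”‘’（）《》")


def strip_punctuation(text: str) -> str:
    """Replace ASCII and common Chinese punctuation with spaces."""
    return "".join(" " if ch in PUNCTUATION else ch for ch in text)


def segment_chinese(text: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest vocabulary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:  # single characters always match
                tokens.append(piece)
                i += size
                break
    return tokens


def stem_english(word: str) -> str:
    """Naive suffix stripping; a real stemmer or lemmatizer would be more careful."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str, vocabulary: Set[str]) -> List[str]:
    """Remove punctuation, then segment (Chinese) or stem (other languages)."""
    cleaned = strip_punctuation(text)
    if CJK.search(cleaned):
        tokens: List[str] = []
        for chunk in cleaned.split():
            tokens.extend(segment_chinese(chunk, vocabulary))
        return tokens
    return [stem_english(word) for word in cleaned.split()]


chinese_tokens = preprocess("档案管理，数字化。", {"档案", "管理", "数字化"})
english_tokens = preprocess("Scanned archives, indexed.", set())
```

Part-of-speech tagging and phrase recognition, which the text also lists, are omitted here because they depend on trained models or lexicons that the text does not specify.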
In an optional embodiment of the invention, step C of the above embodiment further comprises:
classifying the words in the standardized text data using a pre-trained noun classifier model, and identifying and extracting the nouns of each category and the relationships between the nouns according to the classification results. Specifically, the relationships between nouns can be determined from their co-occurrence probabilities within sentences.
To extract knowledge from the standardized text data quickly and accurately, this embodiment determines the structural features of each noun category by observing existing data for features such as the beginning word, ending word, and word length of nouns. Based on these structural features, the nouns of each category and the relationships between the nouns are extracted from the standardized text data, yielding the entity information.
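The structural features named above (beginning word, ending word, and length) can be sketched as simple matching rules. The categories, trigger words, and length limits below are invented for the example; in the described method they would come from observing existing data or from a trained noun classifier.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class NounRule:
    """Hypothetical structural rule for one noun category."""
    category: str
    beginnings: FrozenSet[str] = frozenset()  # allowed first words (empty = any)
    endings: FrozenSet[str] = frozenset()     # allowed last words (empty = any)
    max_len: int = 8                          # maximum length in tokens

    def matches(self, tokens: List[str]) -> bool:
        if not tokens or len(tokens) > self.max_len:
            return False
        first, last = tokens[0].lower(), tokens[-1].lower()
        if self.beginnings and first not in self.beginnings:
            return False
        if self.endings and last not in self.endings:
            return False
        return True


RULES = [
    NounRule("person", beginnings=frozenset({"mr", "ms", "dr"}), max_len=4),
    NounRule("organization", endings=frozenset({"bureau", "department", "office"}), max_len=6),
]


def classify_noun_phrase(tokens: List[str]) -> Optional[str]:
    """Return the category of the first rule that matches, or None."""
    for rule in RULES:
        if rule.matches(tokens):
            return rule.category
    return None
```

For example, `classify_noun_phrase(["Personnel", "Department"])` matches the ending-word rule and is labeled as an organization, while a phrase that fits no rule returns `None` and would be left to other extraction logic.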
In an optional embodiment of the invention, step D of the above embodiment further comprises:
establishing the architecture of the knowledge graph according to a general data standard;
converting the entity attributes of the key entities from step C into triple data;
unifying the relationship types and naming rules of the entity attributes and the key entities according to the triple data, to obtain a standard dictionary table with standard specifications;
mapping the key entities against the content of the constructed standard dictionary table while retaining the attribute relationships of the key entities, to form structured data, specifically:
determining whether the entity information conforms to the standard specification. If it does, the entity information is fused according to the standard dictionary table: entity names are mapped to the content of the standard dictionary table to obtain identical entity names and their attribute information, forming structured data. If it does not, the entity information is relationship-mapped according to business-knowledge classification to form structured data. Here the entity information includes entity names and entity attribute information. Using the entity name as the index, the entity is mapped to the content of the standard dictionary table to obtain the identical entity name and its attribute information; then, following the unified standard for entity names and inter-entity relationships in the standard dictionary table, the attribute information of the entity name is merged with the attribute information of the identical entity name.
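The mapping and fusion described above can be sketched as a dictionary lookup followed by an attribute merge. The standard dictionary table is modeled here as a plain mapping from name variants to canonical names, which is an assumption; the text does not describe its layout. The function names and sample data are hypothetical, and entities that do not match the table are simply returned for the separate business-knowledge mapping the text mentions.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Entity = Tuple[str, Dict[str, str]]  # (entity name, attribute dictionary)


def to_triples(entity: str, attributes: Dict[str, str]) -> List[Triple]:
    """Convert one entity's attributes into (entity, attribute, value) triples."""
    return [(entity, name, value) for name, value in attributes.items()]


def fuse(
    entities: List[Entity], name_table: Dict[str, str]
) -> Tuple[Dict[str, Dict[str, str]], List[Entity]]:
    """Merge attributes of entities that map to the same canonical name.

    Returns the fused entities plus the entities whose names are not in the
    table. On conflicting attribute values the first value seen is kept,
    which is an arbitrary choice made for this sketch.
    """
    fused: Dict[str, Dict[str, str]] = {}
    unmatched: List[Entity] = []
    for name, attributes in entities:
        canonical = name_table.get(name.strip().lower())
        if canonical is None:
            unmatched.append((name, attributes))
            continue
        merged = fused.setdefault(canonical, {})
        for key, value in attributes.items():
            merged.setdefault(key, value)
    return fused, unmatched


# Invented dictionary table and extracted entities.
name_table = {"personnel office": "Personnel Office", "hr office": "Personnel Office"}
entities = [
    ("Personnel Office", {"type": "department"}),
    ("HR Office ", {"page_count": "3"}),
    ("Unknown Unit", {"type": "unclassified"}),
]
fused, unmatched = fuse(entities, name_table)
triples = [t for name, attrs in fused.items() for t in to_triples(name, attrs)]
```

Using the normalized entity name as the lookup key mirrors the statement that the entity name serves as the index into the standard dictionary table.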
Those of ordinary skill in the art will appreciate that the embodiments described here are intended to help the reader understand the principles of the invention, and that the scope of protection of the invention is not limited to these specific statements and embodiments. Those of ordinary skill in the art can, based on the technical teachings disclosed by the invention, make various other specific variations and combinations that do not depart from the essence of the invention, and such variations and combinations remain within the scope of protection of the invention.

Claims (5)

1. A knowledge-graph-based method for digitizing paper archives, characterized by comprising:
A. acquiring image information of the paper archive to be digitized;
B. performing lexical, syntactic and/or semantic analysis on the paper archive image information from step A to obtain standardized text data;
C. extracting entity information for key entities from the standardized text data of step B;
D. building a standard dictionary table, and fusing the entity information from step C according to the standard dictionary table to form structured data;
E. constructing a knowledge graph with the structured data from step D as knowledge entries;
F. obtaining the content data in the paper archive image information according to the knowledge graph from step E, and generating an electronic document.
2. The knowledge-graph-based method for digitizing paper archives according to claim 1, characterized in that, in step B, performing lexical, syntactic and/or semantic analysis on the paper archive image information from step A to obtain standardized text data specifically comprises:
classifying the document structure of the paragraphs in the paper archive image information from step A using a pre-trained paragraph classifier model, and dividing the paper archive image information into paragraph structures according to the classification results;
if the paper archive image information is a Chinese-language resource, performing word segmentation, part-of-speech tagging, and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure;
if the paper archive image information is a foreign-language resource, performing stemming, lemmatization, and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure.
3. The knowledge-graph-based method for digitizing paper archives according to claim 2, characterized in that, in step C, extracting the entity information of key entities from the standardized text data of step B specifically comprises:
classifying the words in the standardized text data using a pre-trained noun classifier model, and identifying and extracting the nouns of each category and the relationships between the nouns according to the classification results.
4. The knowledge-graph-based method for digitizing paper archives according to claim 3, characterized in that building the standard dictionary table in step D specifically comprises:
establishing the architecture of the knowledge graph according to a general data standard;
converting the entity attributes of the key entities from step C into triple data;
unifying the relationship types and naming rules of the entity attributes and the key entities according to the triple data, to obtain a standard dictionary table with standard specifications.
5. The knowledge-graph-based method for digitizing paper archives according to claim 4, characterized in that, in step D, fusing the entity information from step C according to the standard dictionary table to form structured data specifically comprises:
mapping the key entities against the content of the constructed standard dictionary table while retaining the attribute relationships of the key entities, to form structured data.
CN201810111488.3A 2018-02-05 2018-02-05 Knowledge-graph-based method for digitizing paper archives Pending CN108197119A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810111488.3A CN108197119A (en) 2018-02-05 2018-02-05 Knowledge-graph-based method for digitizing paper archives

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810111488.3A CN108197119A (en) 2018-02-05 2018-02-05 Knowledge-graph-based method for digitizing paper archives

Publications (1)

Publication Number Publication Date
CN108197119A true CN108197119A (en) 2018-06-22

Family

ID=62592760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810111488.3A Pending CN108197119A (en) 2018-02-05 2018-02-05 Knowledge-graph-based method for digitizing paper archives

Country Status (1)

Country Link
CN (1) CN108197119A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458471A (en) * 2019-08-19 2019-11-15 绍兴数纺科技有限公司 Standardize dyestuff information management system
CN110675121A (en) * 2019-09-23 2020-01-10 珠海市新德汇信息技术有限公司 Method for collecting picture type file material
CN111144123A (en) * 2018-10-16 2020-05-12 工业互联网创新中心(上海)有限公司 Industrial Internet identification analysis data dictionary construction method
CN111737471A (en) * 2020-06-28 2020-10-02 中国农业科学院农业信息研究所 Archive management model construction method and system based on knowledge graph
CN112686262A (en) * 2020-12-28 2021-04-20 广州博士信息技术研究院有限公司 Method for extracting structured data and rapidly archiving handbooks based on image recognition technology
CN116090560A (en) * 2023-04-06 2023-05-09 北京大学深圳研究生院 Knowledge graph establishment method, device and system based on teaching materials
CN116737945A (en) * 2023-05-10 2023-09-12 百洋智能科技集团股份有限公司 Mapping method for EMR knowledge map of patient

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156365A (en) * 2016-08-03 2016-11-23 北京智能管家科技有限公司 Method and device for generating a knowledge graph
CN106529386A (en) * 2016-08-31 2017-03-22 苏州市千尺浪信息科技服务有限公司 Paper archive digitization method and system
CN107491555A (en) * 2017-09-01 2017-12-19 北京纽伦智能科技有限公司 Knowledge graph construction method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
宋淑琴: "大数据视野下档案管理思维方式的转变", 《档案学研究》 *
田萍芳: "《面向云出版的语义关键技术》", 30 April 2015, 武汉大学出版社 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144123A (en) * 2018-10-16 2020-05-12 工业互联网创新中心(上海)有限公司 Industrial Internet identification analysis data dictionary construction method
CN111144123B (en) * 2018-10-16 2024-02-02 工业互联网创新中心(上海)有限公司 Industrial Internet identification analysis data dictionary construction method
CN110458471A (en) * 2019-08-19 2019-11-15 绍兴数纺科技有限公司 Standardize dyestuff information management system
CN110458471B (en) * 2019-08-19 2022-05-20 绍兴数纺科技有限公司 Standardized dye information management system
CN110675121A (en) * 2019-09-23 2020-01-10 珠海市新德汇信息技术有限公司 Method for collecting picture type file material
CN111737471A (en) * 2020-06-28 2020-10-02 中国农业科学院农业信息研究所 Archive management model construction method and system based on knowledge graph
CN111737471B (en) * 2020-06-28 2023-10-13 中国农业科学院农业信息研究所 File management model construction method and system based on knowledge graph
CN112686262A (en) * 2020-12-28 2021-04-20 广州博士信息技术研究院有限公司 Method for extracting structured data and rapidly archiving handbooks based on image recognition technology
CN116090560A (en) * 2023-04-06 2023-05-09 北京大学深圳研究生院 Knowledge graph establishment method, device and system based on teaching materials
CN116090560B (en) * 2023-04-06 2023-08-01 北京大学深圳研究生院 Knowledge graph establishment method, device and system based on teaching materials
CN116737945A (en) * 2023-05-10 2023-09-12 百洋智能科技集团股份有限公司 Mapping method for EMR knowledge map of patient
CN116737945B (en) * 2023-05-10 2024-05-07 百洋智能科技集团股份有限公司 Mapping method for EMR knowledge map of patient

Similar Documents

Publication Publication Date Title
US11501061B2 (en) Extracting structured information from a document containing filled form images
CN108197119A (en) The archives of paper quality digitizing solution of knowledge based collection of illustrative plates
US5748805A (en) Method and apparatus for supplementing significant portions of a document selected without document image decoding with retrieved information
JP3282860B2 (en) Apparatus for processing digital images of text on documents
JP3292388B2 (en) Method and apparatus for summarizing a document without decoding the document image
US6353840B2 (en) User-defined search template for extracting information from documents
CA2661902C (en) Automated classification of document pages
CN106502991B (en) Publication treating method and apparatus
CN113961685A (en) Information extraction method and device
CN110866116A (en) Policy document processing method and device, storage medium and electronic equipment
CN113761377B (en) False information detection method and device based on attention mechanism multi-feature fusion, electronic equipment and storage medium
Coelho et al. Structured literature image finder: extracting information from text and images in biomedical literature
Wiedemann et al. Page stream segmentation with convolutional neural nets combining textual and visual features
Puri et al. A technical study and analysis of text classification techniques in N-lingual documents
CN112464907A (en) Document processing system and method
CN112036330A (en) Text recognition method, text recognition device and readable storage medium
CN100444194C (en) Automatic extraction device, method and program of essay title and correlation information
Lin et al. Multilingual corpus construction based on printed and handwritten character separation
Naïve et al. Efficient accreditation document classification using naïve bayes classifier
Batomalaque et al. Image to text conversion technique for anti-plagiarism system
CN113065316A (en) Method for dynamically converting formal thumbnail file into html (hypertext markup language) and inputting question bank, selecting questions from question bank and composing draft and generating thumbnail file
JP4334068B2 (en) Keyword extraction method and apparatus for image document
JP4480109B2 (en) Image management apparatus and image management method
Vafaie et al. Improvements in Handwritten and Printed Text Separation in Historical Archival Documents
Gautam et al. The Dataset for Printed Brahmi Word Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180622

RJ01 Rejection of invention patent application after publication