CN108197119A - Paper archive digitization method based on a knowledge graph - Google Patents
Paper archive digitization method based on a knowledge graph
- Publication number
- CN108197119A
- Authority
- CN
- China
- Prior art keywords
- archives
- paper quality
- data
- knowledge
- carried out
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a paper archive digitization method based on a knowledge graph. The method comprises: obtaining the image information of a paper archive; analyzing it to obtain standardized text data; extracting the entity information of key entities; building a standard dictionary table and fusing the entity information according to it to form structured data; building a knowledge graph using the structured data as knowledge entries; and obtaining the content data of the paper archive according to the knowledge graph and generating an electronic document. The invention improves the efficiency of paper archive digitization while reducing the error rate.
Description
Technical field
The invention belongs to the field of electronic information technology, and in particular relates to a paper archive digitization method based on a knowledge graph.
Background
Paper archive digitization is the most basic task in building a large archive database. Its workflow includes classifying and sorting the archives, scanning images, entering text, and organizing the results for storage. At present, digitizing a paper archive means converting the physical paper archive into an electronic document (in a format such as JPG, PDF or TIFF) for storage. Because the purpose is to serve information services, the result must be readable and usable by the relevant software systems.
Therefore, when an electronic archive database is built, two electronic files must be generated for each paper archive: one is the image of the paper archive, and the other is the information that corresponds one-to-one with that image. The current practice is to produce an electronic image plus an EXCEL entry. For example, scanning one physical paper archive produces an electronic image named "031-053-01-019-01.jpg", but the file name "031-053-01-019-01.jpg" alone does not convey the basic content of the archive. The information recorded on the paper archive (such as file number, category number, date, archive type, page name, the unit that filled it in, the department it belongs to, and the number of pages) must therefore be entered into the corresponding entry of an EXCEL file. Completing the digitization of one paper archive thus takes two tasks: first, scanning the paper archive; second, entering the archive content into the corresponding entry of the EXCEL file. The workload is enormous.
Although common scanners (document cameras) currently on the market can apply some processing to scanned images, they generally cannot capture the content information and write it into the corresponding entry of an EXCEL file. With technical progress, high-end scanners with optical character recognition (OCR) have appeared, but so far their error rate cannot meet the below-0.5% requirement specified for national archive digitization. Even an imported high-end scanner, although it can lower the error rate by several orders of magnitude, still cannot meet the requirement, and such scanners are expensive, often costing several hundred thousand or even more than a million per unit. As a result, the archive digitization workflow used by most companies still relies either on one person performing two passes or on two people working one after the other on an assembly line. The procedure is complicated, which makes it inefficient and drives personnel costs too high.
Summary of the invention
The object of the invention is as follows: to solve problems in the prior art such as the complicated paper archive digitization procedure and the resulting low efficiency, the invention proposes a paper archive digitization method based on a knowledge graph.
The technical solution of the invention is a paper archive digitization method based on a knowledge graph, comprising:
A. obtaining the image information of the paper archive to be digitized;
B. performing lexical, syntactic and/or semantic analysis on the paper archive image information from step A to obtain standardized text data;
C. extracting the entity information of key entities from the standardized text data of step B;
D. building a standard dictionary table, and fusing the entity information from step C according to the standard dictionary table to form structured data;
E. building a knowledge graph using the structured data from step D as knowledge entries;
F. obtaining the content data in the paper archive image information according to the knowledge graph from step E, and generating an electronic document.
Further, in step B, performing lexical, syntactic and/or semantic analysis on the paper archive image information from step A to obtain standardized text data specifically comprises:
classifying the document structure of the paragraphs in the paper archive image information from step A using a pre-trained paragraph classifier model, and dividing the paper archive image information into paragraph structures according to the classification results;
if the paper archive image information is a Chinese resource, performing word segmentation, part-of-speech tagging and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure;
if the paper archive image information is a foreign-language resource, performing stemming, lemmatization and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure.
Further, in step C, extracting the entity information of key entities from the standardized text data of step B specifically comprises:
classifying the words in the standardized text data using a pre-trained noun classifier model, and identifying and extracting the nouns of each category and the relationships between the nouns according to the classification results.
Further, building the standard dictionary table in step D specifically comprises:
establishing the architecture of the knowledge graph according to a general data standard;
converting the entity attributes of the key entities from step C into triple data;
unifying, according to the triple data, the entity attributes and the relationship types and naming rules of the key entities, to obtain a standard dictionary table with a standard specification.
Further, in step D, fusing the entity information from step C according to the standard dictionary table to form structured data specifically comprises:
mapping the key entities against the content of the standard dictionary table that has been built, while retaining the attribute relationships of the key entities, to form structured data.
The beneficial effects of the invention are as follows: the invention obtains paper archive image information and processes it into standardized text data, extracts the entity information of key entities, builds a standard dictionary table and fuses the entity information into structured data, builds a knowledge graph using the structured data as knowledge entries, and obtains the paper archive content according to the knowledge graph. This improves the efficiency of paper archive digitization while reducing the error rate.
Description of the drawings
Fig. 1 is a flow diagram of the knowledge-graph-based paper archive digitization method of the invention.
Detailed description
To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawing and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
Fig. 1 shows the flow diagram of the knowledge-graph-based paper archive digitization method of the invention. A paper archive digitization method based on a knowledge graph comprises:
A. obtaining the image information of the paper archive to be digitized.
In this embodiment, the paper archive to be digitized is scanned with a scanner to obtain the scanned image of the paper archive.
B. performing lexical, syntactic and/or semantic analysis on the paper archive image information from step A to obtain standardized text data.
In this embodiment, lexical, syntactic and/or semantic analysis refers to performing operations such as structuring and word segmentation on the original text data of a specified field, based on lexical, syntactic and/or semantic analysis.
C. extracting the entity information of key entities from the standardized text data of step B.
In this embodiment, an entity refers to a named-entity word, an event name, and the like; an attribute refers to a noun that modifies a named entity, such as age, gender or a relationship between persons. The entity-attribute relationship is obtained mainly by computing co-occurrence probabilities and extracting, for each entity, the attribute word with the highest co-occurrence probability. Relationships between entities are extracted on the one hand from co-occurrence probabilities within sentences, and on the other hand from the entity-attribute relationships already identified.
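The co-occurrence step described above can be sketched as follows. This is a minimal illustration, assuming the sentences are already tokenized and that the entity and attribute vocabularies are known in advance. The function name `best_attribute_per_entity` and the sample data are hypothetical, and the patent does not specify how the probabilities are estimated; here the probability is taken as the share of an entity's sentences that also contain the attribute.

```python
from collections import Counter, defaultdict


def best_attribute_per_entity(sentences, entities, attributes):
    """For each entity, return the attribute word it co-occurs with most often.

    P(attribute | entity) is estimated as the share of the entity's sentences
    that also contain the attribute (an assumption, not the patent's formula).
    """
    entity_counts = Counter()
    pair_counts = defaultdict(Counter)
    for tokens in sentences:
        present = set(tokens)
        for entity in present & entities:
            entity_counts[entity] += 1
            for attribute in present & attributes:
                pair_counts[entity][attribute] += 1

    result = {}
    for entity, counts in pair_counts.items():
        # Sort by count, then alphabetically, so ties resolve deterministically.
        attribute, count = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[entity] = (attribute, count / entity_counts[entity])
    return result


# Hypothetical tokenized sentences.
sentences = [
    ["Zhang", "age", "42"],
    ["Zhang", "age", "gender"],
    ["Li", "gender", "female"],
]
pairs = best_attribute_per_entity(sentences, {"Zhang", "Li"}, {"age", "gender"})
```

Counting at sentence level keeps the sketch simple; a window-based or document-level count would be an equally plausible reading of "co-occurrence".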
D. building a standard dictionary table, and fusing the entity information from step C according to the standard dictionary table to form structured data;
E. building a knowledge graph using the structured data from step D as knowledge entries;
F. obtaining the content data in the paper archive image information according to the knowledge graph from step E, and generating an electronic document.
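Steps E and F can be pictured with a small sketch. It assumes the structured data arrives as (subject, relation, object) triples and that the electronic document is a flat record similar to the EXCEL entry mentioned in the background; the patent fixes neither choice. The helper names `build_graph` and `export_record`, the field names and the sample values are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_graph(triples):
    """Index (subject, relation, object) triples by subject, then by relation."""
    graph = defaultdict(dict)
    for subject, relation, obj in triples:
        graph[subject][relation] = obj
    return graph


def export_record(graph, archive_id, fields):
    """Render one archive's attributes as a CSV header line plus one data row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["archive_id", *fields])
    attributes = graph.get(archive_id, {})
    # Fields missing from the graph are written as empty cells.
    writer.writerow([archive_id, *(attributes.get(f, "") for f in fields)])
    return buffer.getvalue()


# Invented sample values keyed by the scanned image name.
triples = [
    ("031-053-01-019-01", "file_number", "A-001"),
    ("031-053-01-019-01", "page_count", "3"),
]
graph = build_graph(triples)
record = export_record(
    graph, "031-053-01-019-01", ["file_number", "page_count", "department"]
)
```

A nested dictionary is enough to show the idea; a production system would more likely use a graph database or an RDF store.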
In an optional embodiment of the invention, step B of the above embodiment further comprises:
classifying the document structure of the paragraphs in the paper archive image information from step A using a pre-trained paragraph classifier model, and dividing the paper archive image information into paragraph structures according to the classification results;
To divide the original text data into paragraph structures quickly and accurately, this embodiment structures the original text data by distinguishing paragraphs such as title, body text, author, date and category. Specifically, the document structure of the original text data can be determined from document structure distribution features such as the position, length and textual content of the text. Alternatively, a small training corpus can be annotated manually, a paragraph classifier model built from the above features is used to classify the paragraphs, and the predicted class is taken as the paragraph attribute.
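The feature-based variant (position, length and textual content) can be sketched with a few hand-written rules. This is only an illustration under stated assumptions: the thresholds, the date pattern and the label names are hypothetical, and the rules stand in for the trained paragraph classifier model, which the patent does not describe in detail.

```python
import re

# Assumed date layout such as 2018-02-05, 2018/2/5 or 2018.02.05.
DATE_PATTERN = re.compile(r"\d{4}[-/.]\d{1,2}[-/.]\d{1,2}")


def classify_paragraph(text, index):
    """Label a paragraph using its position, length and textual content."""
    stripped = text.strip()
    if DATE_PATTERN.fullmatch(stripped):
        return "time"
    if index == 0 and len(stripped) <= 40:
        return "title"
    if stripped.lower().startswith(("author:", "by ")):
        return "author"
    return "text"


def divide_paragraphs(paragraphs):
    """Return (label, paragraph) pairs for a list of paragraphs."""
    return [(classify_paragraph(p, i), p) for i, p in enumerate(paragraphs)]


sample = [
    "Annual Personnel Report",
    "Author: Records Office",
    "2018-02-05",
    "This report lists the staff changes recorded during the year in each department.",
]
labels = [label for label, _ in divide_paragraphs(sample)]
```

With a manually annotated corpus, the same three features could instead be fed to any standard classifier, and its prediction would replace the return value of `classify_paragraph`.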
if the paper archive image information is a Chinese resource, performing word segmentation, part-of-speech tagging and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure;
if the paper archive image information is a foreign-language resource, performing stemming, lemmatization and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure.
To divide the original text data into paragraph structures quickly and accurately, this embodiment determines the language of the original text data. If the original text data is a Chinese resource, Chinese word segmentation, part-of-speech tagging, phrase recognition and the like are performed on it; specifically, open-source tools can be used to perform lexical, syntactic and/or semantic analysis on Chinese. If the text data is a foreign-language resource, lexical, syntactic and/or semantic analysis is performed on the foreign-language resource with the tools for the corresponding language. For example, stemming, lemmatization, phrase recognition and the like are performed on English resources, which means removing tense and word suffixes to restore the base form of each word. Specifically, open-source tools can be used to perform lexical, syntactic and/or semantic analysis on English resources.
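The language branch described above can be sketched in plain Python. The sketch makes several assumptions that are not in the patent: language is detected by the share of CJK ideographs, Chinese text is split character by character as a placeholder for real word segmentation, and English words go through a naive suffix stripper rather than a real stemmer or lemmatizer. All function names are hypothetical.

```python
import re
import string

CJK = re.compile(r"[\u4e00-\u9fff]")
# ASCII punctuation plus a few common full-width Chinese marks.
PUNCTUATION = string.punctuation + "，。、；：？！“”‘’（）《》"


def is_chinese(text):
    """Treat text as Chinese when at least half of its letters are CJK ideographs."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    return sum(1 for ch in letters if CJK.match(ch)) * 2 >= len(letters)


def strip_punctuation(text):
    """Delete every character listed in PUNCTUATION."""
    return text.translate(str.maketrans("", "", PUNCTUATION))


def naive_stem(word):
    """Strip a few English suffixes; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def standardize(paragraph):
    """Remove punctuation, then tokenize according to the detected language."""
    cleaned = strip_punctuation(paragraph)
    if is_chinese(cleaned):
        # Placeholder: real Chinese word segmentation needs a dictionary or model.
        return [ch for ch in cleaned if not ch.isspace()]
    return [naive_stem(w) for w in cleaned.lower().split()]


english = standardize("The clerks sorted the records.")
chinese = standardize("档案，扫描。")
```

In practice the two placeholders would be replaced by the open-source segmentation and stemming tools the text refers to; the branching structure stays the same.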
In an optional embodiment of the invention, step C of the above embodiment further comprises:
classifying the words in the standardized text data using a pre-trained noun classifier model, and identifying and extracting the nouns of each category and the relationships between the nouns according to the classification results. Specifically, the relationships between nouns can be determined from co-occurrence probabilities within sentences.
To extract knowledge from the standardized text data quickly and accurately, this embodiment observes existing data and determines the structural features of each noun category from features such as a noun's beginning word, ending word and word length. According to those structural features, it then extracts the nouns of the corresponding categories and the relationships between the nouns from the standardized text data, thereby obtaining the entity information.
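A minimal sketch of classifying nouns by structural features follows. It uses only the ending word and the length of a noun phrase; a beginning-word check could be added in the same way. The rule table, category names and sample phrases are hypothetical, and the rules stand in for the trained noun classifier model.

```python
# Hypothetical structural rules: (category, allowed ending words, maximum token count).
CATEGORY_RULES = [
    ("organization", {"Bureau", "Office", "Department"}, 5),
    ("document", {"Report", "Notice", "Form"}, 6),
]


def classify_noun_phrase(tokens):
    """Return the first category whose ending-word and length rules match, else None."""
    if not tokens:
        return None
    for category, endings, max_len in CATEGORY_RULES:
        if tokens[-1] in endings and len(tokens) <= max_len:
            return category
    return None


phrases = [
    ["Municipal", "Archives", "Bureau"],
    ["Annual", "Personnel", "Report"],
    ["blue", "folder"],
]
categories = [classify_noun_phrase(p) for p in phrases]
```

Phrases that match no rule return `None` and would be left for the classifier model or for manual review.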
In an optional embodiment of the invention, step D of the above embodiment further comprises:
establishing the architecture of the knowledge graph according to a general data standard;
converting the entity attributes of the key entities from step C into triple data;
unifying, according to the triple data, the entity attributes and the relationship types and naming rules of the key entities, to obtain a standard dictionary table with a standard specification;
mapping the key entities against the content of the standard dictionary table that has been built, while retaining the attribute relationships of the key entities, to form structured data, specifically:
determining whether the entity information conforms to the standard specification; if so, fusing the entity information according to the standard dictionary table, that is, mapping the entity name against the content of the standard dictionary table to obtain the identical entity name and its attribute information, forming structured data; if not, mapping the relationships of the entity information according to a classification based on domain knowledge, forming structured data. Here the entity information includes the entity name and the entity attribute information. With the entity name as the index, it is mapped against the content of the standard dictionary table to obtain the identical entity name and the attribute information of that entity name; then, according to the unified standard for entity names and inter-entity relationships in the standard dictionary table, the attribute information of the entity name is fused with the attribute information of the identical entity name.
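The mapping and fusion logic above can be sketched as follows, assuming the dictionary table is a pair of alias-to-standard-name lookups and the entity information is a list of (entity, attribute, value) triples. The table contents, the function name `fuse` and the sample records are hypothetical.

```python
# Hypothetical dictionary table: alias -> standard name.
STANDARD_ENTITIES = {
    "HR Dept": "Human Resources Department",
    "Human Resources Department": "Human Resources Department",
}
STANDARD_ATTRIBUTES = {
    "tel": "phone",
    "phone": "phone",
    "addr": "address",
    "address": "address",
}


def fuse(triples):
    """Map (entity, attribute, value) triples to standard names and merge duplicates."""
    fused = {}
    unmapped = []
    for entity, attribute, value in triples:
        std_entity = STANDARD_ENTITIES.get(entity)
        std_attribute = STANDARD_ATTRIBUTES.get(attribute)
        if std_entity is None or std_attribute is None:
            # Not covered by the dictionary table; left for domain-knowledge mapping.
            unmapped.append((entity, attribute, value))
            continue
        # Attributes of entities that share a standard name end up in one record.
        fused.setdefault(std_entity, {})[std_attribute] = value
    return fused, unmapped


raw = [
    ("HR Dept", "tel", "555-0100"),
    ("Human Resources Department", "addr", "Room 201"),
    ("Unknown Unit", "tel", "555-0199"),
]
structured, leftovers = fuse(raw)
```

Using the standardized entity name as the dictionary key is what merges the two spellings of the same department into a single structured record.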
Those of ordinary skill in the art will appreciate that the embodiments described here are intended to help the reader understand the principle of the invention, and that the scope of protection of the invention is not limited to these specific statements and embodiments. Based on the technical teachings disclosed by the invention, those of ordinary skill in the art can make various other specific variations and combinations that do not depart from the essence of the invention, and these variations and combinations remain within the scope of protection of the invention.
Claims (5)
1. A paper archive digitization method based on a knowledge graph, characterized by comprising:
A. obtaining the image information of the paper archive to be digitized;
B. performing lexical, syntactic and/or semantic analysis on the paper archive image information from step A to obtain standardized text data;
C. extracting the entity information of key entities from the standardized text data of step B;
D. building a standard dictionary table, and fusing the entity information from step C according to the standard dictionary table to form structured data;
E. building a knowledge graph using the structured data from step D as knowledge entries;
F. obtaining the content data in the paper archive image information according to the knowledge graph from step E, and generating an electronic document.
2. The paper archive digitization method based on a knowledge graph according to claim 1, characterized in that, in step B, performing lexical, syntactic and/or semantic analysis on the paper archive image information from step A to obtain standardized text data specifically comprises:
classifying the document structure of the paragraphs in the paper archive image information from step A using a pre-trained paragraph classifier model, and dividing the paper archive image information into paragraph structures according to the classification results;
if the paper archive image information is a Chinese resource, performing word segmentation, part-of-speech tagging and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure;
if the paper archive image information is a foreign-language resource, performing stemming, lemmatization and phrase recognition on each resulting paragraph structure, and removing the punctuation marks in the paragraph structure.
3. The paper archive digitization method based on a knowledge graph according to claim 2, characterized in that, in step C, extracting the entity information of key entities from the standardized text data of step B specifically comprises:
classifying the words in the standardized text data using a pre-trained noun classifier model, and identifying and extracting the nouns of each category and the relationships between the nouns according to the classification results.
4. The paper archive digitization method based on a knowledge graph according to claim 3, characterized in that building the standard dictionary table in step D specifically comprises:
establishing the architecture of the knowledge graph according to a general data standard;
converting the entity attributes of the key entities from step C into triple data;
unifying, according to the triple data, the entity attributes and the relationship types and naming rules of the key entities, to obtain a standard dictionary table with a standard specification.
5. The paper archive digitization method based on a knowledge graph according to claim 4, characterized in that, in step D, fusing the entity information from step C according to the standard dictionary table to form structured data specifically comprises:
mapping the key entities against the content of the standard dictionary table that has been built, while retaining the attribute relationships of the key entities, to form structured data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810111488.3A CN108197119A (en) | 2018-02-05 | 2018-02-05 | Paper archive digitization method based on a knowledge graph |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810111488.3A CN108197119A (en) | 2018-02-05 | 2018-02-05 | Paper archive digitization method based on a knowledge graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108197119A true CN108197119A (en) | 2018-06-22 |
Family
ID=62592760
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810111488.3A Pending CN108197119A (en) | 2018-02-05 | 2018-02-05 | Paper archive digitization method based on a knowledge graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108197119A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110458471A (en) * | 2019-08-19 | 2019-11-15 | 绍兴数纺科技有限公司 | Standardize dyestuff information management system |
CN110675121A (en) * | 2019-09-23 | 2020-01-10 | 珠海市新德汇信息技术有限公司 | Method for collecting picture type file material |
CN111144123A (en) * | 2018-10-16 | 2020-05-12 | 工业互联网创新中心(上海)有限公司 | Industrial Internet identification analysis data dictionary construction method |
CN111737471A (en) * | 2020-06-28 | 2020-10-02 | 中国农业科学院农业信息研究所 | Archive management model construction method and system based on knowledge graph |
CN112686262A (en) * | 2020-12-28 | 2021-04-20 | 广州博士信息技术研究院有限公司 | Method for extracting structured data and rapidly archiving handbooks based on image recognition technology |
CN116090560A (en) * | 2023-04-06 | 2023-05-09 | 北京大学深圳研究生院 | Knowledge graph establishment method, device and system based on teaching materials |
CN116737945A (en) * | 2023-05-10 | 2023-09-12 | 百洋智能科技集团股份有限公司 | Mapping method for EMR knowledge map of patient |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156365A (en) * | 2016-08-03 | 2016-11-23 | 北京智能管家科技有限公司 | A kind of generation method and device of knowledge mapping |
CN106529386A (en) * | 2016-08-31 | 2017-03-22 | 苏州市千尺浪信息科技服务有限公司 | Paper archive digitization method and system |
CN107491555A (en) * | 2017-09-01 | 2017-12-19 | 北京纽伦智能科技有限公司 | Knowledge mapping construction method and system |
-
2018
- 2018-02-05 CN CN201810111488.3A patent/CN108197119A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156365A (en) * | 2016-08-03 | 2016-11-23 | 北京智能管家科技有限公司 | A kind of generation method and device of knowledge mapping |
CN106529386A (en) * | 2016-08-31 | 2017-03-22 | 苏州市千尺浪信息科技服务有限公司 | Paper archive digitization method and system |
CN107491555A (en) * | 2017-09-01 | 2017-12-19 | 北京纽伦智能科技有限公司 | Knowledge mapping construction method and system |
Non-Patent Citations (2)
Title |
---|
宋淑琴: "大数据视野下档案管理思维方式的转变", 《档案学研究》 * |
田萍芳: "《面向云出版的语义关键技术》", 30 April 2015, 武汉大学出版社 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111144123A (en) * | 2018-10-16 | 2020-05-12 | 工业互联网创新中心(上海)有限公司 | Industrial Internet identification analysis data dictionary construction method |
CN111144123B (en) * | 2018-10-16 | 2024-02-02 | 工业互联网创新中心(上海)有限公司 | Industrial Internet identification analysis data dictionary construction method |
CN110458471A (en) * | 2019-08-19 | 2019-11-15 | 绍兴数纺科技有限公司 | Standardize dyestuff information management system |
CN110458471B (en) * | 2019-08-19 | 2022-05-20 | 绍兴数纺科技有限公司 | Standardized dye information management system |
CN110675121A (en) * | 2019-09-23 | 2020-01-10 | 珠海市新德汇信息技术有限公司 | Method for collecting picture type file material |
CN111737471A (en) * | 2020-06-28 | 2020-10-02 | 中国农业科学院农业信息研究所 | Archive management model construction method and system based on knowledge graph |
CN111737471B (en) * | 2020-06-28 | 2023-10-13 | 中国农业科学院农业信息研究所 | File management model construction method and system based on knowledge graph |
CN112686262A (en) * | 2020-12-28 | 2021-04-20 | 广州博士信息技术研究院有限公司 | Method for extracting structured data and rapidly archiving handbooks based on image recognition technology |
CN116090560A (en) * | 2023-04-06 | 2023-05-09 | 北京大学深圳研究生院 | Knowledge graph establishment method, device and system based on teaching materials |
CN116090560B (en) * | 2023-04-06 | 2023-08-01 | 北京大学深圳研究生院 | Knowledge graph establishment method, device and system based on teaching materials |
CN116737945A (en) * | 2023-05-10 | 2023-09-12 | 百洋智能科技集团股份有限公司 | Mapping method for EMR knowledge map of patient |
CN116737945B (en) * | 2023-05-10 | 2024-05-07 | 百洋智能科技集团股份有限公司 | Mapping method for EMR knowledge map of patient |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11501061B2 (en) | Extracting structured information from a document containing filled form images | |
CN108197119A (en) | Paper archive digitization method based on a knowledge graph | |
US5748805A (en) | Method and apparatus for supplementing significant portions of a document selected without document image decoding with retrieved information | |
JP3282860B2 (en) | Apparatus for processing digital images of text on documents | |
JP3292388B2 (en) | Method and apparatus for summarizing a document without decoding the document image | |
US6353840B2 (en) | User-defined search template for extracting information from documents | |
CA2661902C (en) | Automated classification of document pages | |
CN106502991B (en) | Publication treating method and apparatus | |
CN113961685A (en) | Information extraction method and device | |
CN110866116A (en) | Policy document processing method and device, storage medium and electronic equipment | |
CN113761377B (en) | False information detection method and device based on attention mechanism multi-feature fusion, electronic equipment and storage medium | |
Coelho et al. | Structured literature image finder: extracting information from text and images in biomedical literature | |
Wiedemann et al. | Page stream segmentation with convolutional neural nets combining textual and visual features | |
Puri et al. | A technical study and analysis of text classification techniques in N-lingual documents | |
CN112464907A (en) | Document processing system and method | |
CN112036330A (en) | Text recognition method, text recognition device and readable storage medium | |
CN100444194C (en) | Automatic extraction device, method and program of essay title and correlation information | |
Lin et al. | Multilingual corpus construction based on printed and handwritten character separation | |
Naïve et al. | Efficient accreditation document classification using naïve bayes classifier | |
Batomalaque et al. | Image to text conversion technique for anti-plagiarism system | |
CN113065316A (en) | Method for dynamically converting formal thumbnail file into html (hypertext markup language) and inputting question bank, selecting questions from question bank and composing draft and generating thumbnail file | |
JP4334068B2 (en) | Keyword extraction method and apparatus for image document | |
JP4480109B2 (en) | Image management apparatus and image management method | |
Vafaie et al. | Improvements in Handwritten and Printed Text Separation in Historical Archival Documents | |
Gautam et al. | The Dataset for Printed Brahmi Word Recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180622 |