CN107562932A - Method for collecting reference data from Chinese academic books - Google Patents

Method for collecting reference data from Chinese academic books

Info

Publication number
CN107562932A
Authority
CN
China
Prior art keywords
bibliography
data
books
chinese
literature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710841238.0A
Other languages
Chinese (zh)
Inventor
程路
刘文君
吕先竞
彭国莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xihua University
Original Assignee
Xihua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xihua University
Priority to CN201710841238.0A
Publication of CN107562932A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to data acquisition and processing technology and discloses a method for collecting reference data from Chinese academic books. It addresses the low efficiency and high error rate of manual reference entry in the conventional approach. In outline, the method works as follows. Reference data are captured from the reference areas of a book, recognized by OCR, and proofread; the data are then structured and stored in a structured database. Search terms are set from the reference data in the structured database and used to search a local literature database, with online literature databases as a supplement, to find matching references. The matched references are placed in a temporary database for review, and the references that pass review are placed in a standardized reference database. The invention is suited to collecting and processing reference data from Chinese academic books in large volumes.

Description

Method for collecting reference data from Chinese academic books
Technical field
The present invention relates to data acquisition and processing technology, and in particular to a method for collecting reference data from Chinese academic books.
Background
The references of an academic book are the recorded data on the information resources the author cited in writing the book. These bibliographic data usually appear in the reference list at the end of the book or of a chapter, and sometimes in footnotes or in-text notes (GB/T 7714-2015, Information and documentation: Rules for bibliographic references and citations to information resources).
The references recorded in an academic book directly reflect the author's research and scholarly rigor and show the starting point, depth, and breadth of the research behind the book. They help readers retrieve and obtain the literature resources related to the work; they help editors and publishers, research administrators, and readers assess its content; and they help library and information science researchers carry out bibliometric analysis of academic books. They are the key to building a citation database of Chinese academic books.
At present, reference data from Chinese academic books are mostly collected by manual entry. This method is time-consuming and laborious, becomes even less efficient when reference data must be entered for books at large scale, and is error-prone.
Summary of the invention
The technical problem to be solved by the invention is to provide a method for collecting reference data from Chinese academic books that overcomes the low efficiency and high error rate of manual reference entry in the conventional approach.
The technical solution adopted by the invention to solve this problem is as follows.
The method for collecting reference data from Chinese academic books comprises the following steps:
a. Determine the type of the book to be processed: if it is an e-book, proceed to step c; if it is a paper book, proceed to step b;
b. Collect the reference data by the means appropriate to the position of the references in the book, then proceed to step d;
c. Perform OCR on the reference data in the book, then proofread the result;
d. Store the reference data in an unstructured database;
e. Structure the reference data in the unstructured database and store the processed data in a structured database;
f. Set search terms based on the reference data in the structured database, then search a local literature database, using online literature databases as a supplement, to match references;
g. Place the matched references in a temporary database for review;
h. Place the references that pass review in a standardized reference database.
As a further refinement, in step b, collecting the reference data by the means appropriate to the position of the references in the book specifically comprises:
If the references are at the end of the book or of a chapter, scanning the reference pages with a flatbed scanner;
If the references are in in-text notes or footnotes, either scanning them with a handheld pen scanner, or having an operator read them into a voice recorder and then applying speech recognition.
As a further refinement, before the reference pages are scanned with the flatbed scanner, the scanner resolution is set to 600 dpi.
As a further refinement, the images obtained with the flatbed scanner are saved in PDF-A format in the page order of the book and named according to a uniform convention:
The first four characters are a numeric code, followed by the title of the source book, joined by "_"; the numeric code is the last four digits of the source book code assigned by the cataloguing agency.
As a further refinement, in step c, when the recognized data are proofread, the titles of the references are proofread first.
As a further refinement, in step e, structuring the reference data specifically comprises:
Defining a field format for each book according to the number and order of the fields in its references, and then splitting the records into fields; during splitting, a script or program detects the separators between fields in order to tell the fields apart.
As a further refinement, in step f, the search terms are determined by the type of the reference:
If the reference is a journal article, the search term is its title; if the reference is a book, the search terms are its title and publisher.
As a further refinement, step f specifically comprises:
f1. Set search terms based on the reference data in the structured database, then search the local literature database;
f2. After the local literature database has been searched with the search terms, if 0 results are returned, proceed to step f3; if the number of results is not 0, proceed to step f4;
f3. Crawl data from the online literature database according to the fields of the reference; if the crawl returns 0 results, adjust the reference fields used for retrieval and return to step f3 to crawl again; if the crawl returns results, proceed to step f5;
f4. Compare the search results with the original reference data for similarity, and take the returned records whose similarity exceeds a set threshold as the matched references; if no returned record exceeds the threshold, proceed to step f3;
f5. Compare the crawled results with the original reference data for similarity, and take the returned records whose similarity exceeds a set threshold as the matched references.
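The control flow of steps f1 to f5 can be sketched as follows. This is only an illustration under stated assumptions: the function `match_reference`, its parameters, and the idea of passing a finite list of progressively adjusted queries are hypothetical and are not taken from the patent, which does not say how the retry in step f3 terminates.

```python
from typing import Callable, List, Sequence


def match_reference(
    original: str,
    queries: Sequence[str],
    search_local: Callable[[str], List[str]],
    crawl_online: Callable[[str], List[str]],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Return candidate records whose similarity to `original` exceeds `threshold`.

    `queries[0]` is the default search term; later entries stand for the
    "adjusted" retrieval fields of step f3 (an assumption of this sketch).
    """

    def above_threshold(candidates: List[str]) -> List[str]:
        return [c for c in candidates if similarity(original, c) > threshold]

    # f1/f2: search the local literature database first.
    local_hits = search_local(queries[0])
    if local_hits:
        matched = above_threshold(local_hits)  # f4
        if matched:
            return matched

    # f3: crawl the online database; on an empty result try the next query.
    for query in queries:
        crawled = crawl_online(query)
        if crawled:
            return above_threshold(crawled)  # f5
    return []
```

Passing the two search functions in as callables keeps the sketch independent of any particular database or crawler.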
As a further refinement, step g further comprises:
Sorting the matched references in descending order of similarity;
Selecting at most the 5 highest-ranked matched references and storing them in the temporary database for review.
As a further refinement, step h further comprises: if a reference fails review, adjusting the reference fields used for retrieval and crawling the online literature database again.
The beneficial effects of the invention are as follows:
By collecting and automatically recognizing reference data from Chinese academic books, the invention improves the efficiency of reference entry and meets the demand for large-scale collection of such data. In addition, when reference data are matched, the local literature database is searched first and online literature databases are used as a supplement; this improves matching efficiency while making retrieval more comprehensive and raising matching accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of the method for collecting reference data from Chinese academic books in an embodiment of the invention.
Detailed description
The invention aims to provide a method for collecting reference data from Chinese academic books that overcomes the low efficiency and high error rate of manual reference entry in the conventional approach.
The solution of the invention is described further below with reference to the drawing and an embodiment.
As shown in Fig. 1, the method for collecting reference data from Chinese academic books in this embodiment comprises the following steps:
1. Determine the source of the book. Books currently fall into two broad categories, paper books and e-books. For a paper book, the reference data must be collected by the means appropriate to the position of the references in the book; an e-book needs no scanning, so OCR can be performed on it directly.
The collection position is the position at which references appear in the book. The same reference may appear at several positions in a book, for example at the end of the book, at the end of a chapter, in a footnote, or in the text. The cost of collecting references differs markedly between positions. So that references can be collected accurately, the invention standardizes the collection scope and specifies the collection position as follows:
If there are references at the end of the book, only those references need be processed;
If there are no references at the end of the book but there are references at the end of chapters, only the chapter-end references need be processed;
If there are no references at the end of the book or of chapters but there are footnotes, only the references in the footnotes need be processed;
If there are no references at the end of the book or of chapters and no footnotes, the references in the text may be processed, or the book's references may be marked as none.
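The priority rule above amounts to picking the first available position from a fixed order. A minimal sketch, assuming hypothetical position labels (`book_end`, `chapter_end`, `footnote`, `in_text`) that do not appear in the patent:

```python
from typing import Optional, Set

# Priority order taken from the rules above; the label names are illustrative.
POSITION_PRIORITY = ["book_end", "chapter_end", "footnote", "in_text"]


def choose_collection_position(available: Set[str]) -> Optional[str]:
    """Return the single position to process, or None to mark references as none."""
    for position in POSITION_PRIORITY:
        if position in available:
            return position
    return None
```

For a book that has both chapter-end references and footnotes, the function selects the chapter-end references, so the footnotes are not collected a second time.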
References at the end of the book or of a chapter are scanned with a flatbed scanner. References in footnotes or in-text notes occupy too small an area to be scanned conveniently with a flatbed scanner, so they are collected by manual means: the operator scans them with a handheld pen scanner, or reads them into a voice recorder and then applies speech recognition.
When references at the end of the book or of a chapter are collected by scanner, the age and print quality of the book affect OCR accuracy. When books are selected for scanning, preference is therefore given to source books that come from an authoritative source, are complete, and are recent editions.
So that the scanned images meet archival standards and recognition accuracy is as high as possible, the scanner resolution is set to 600 dpi before scanning. This setting rests on the following considerations:
First, scanning speed drops markedly as resolution increases, so the resolution cannot be set too high;
Second, images scanned at less than 600 dpi give lower OCR accuracy, so the resolution cannot be set too low;
Third, 600 dpi already meets the archival standard, and a higher resolution does not improve recognition accuracy.
The references of a book usually run to several pages. After the relevant pages have been scanned, the images are saved consecutively in their original order to make processing and lookup easier, preferably in PDF-A format, and are named according to a uniform convention: the first four characters are a numeric code, followed by the title of the source book, joined by "_" (for example, 0001_信息资源编目, "Information Resource Cataloguing"); the numeric code is the last four digits of the source book code assigned by the cataloguing agency.
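The naming convention can be expressed as a small helper. This is a sketch only: the function name `scan_file_name` and the sample source book code `20170001` are invented for illustration, and the patent does not state a file extension, so none is added.

```python
def scan_file_name(source_book_code: str, title: str) -> str:
    """Build '<last four digits of the source book code>_<title>'."""
    digits = source_book_code[-4:]
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError("source book code must end in four digits")
    return f"{digits}_{title}"


# Hypothetical source book code; only its last four digits are used.
print(scan_file_name("20170001", "信息资源编目"))
```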
2. Perform OCR on the scanned images:
OCR (Optical Character Recognition) is the process of using computer software to convert rasterized character bitmap information into computer character codes. An OCR process consists mainly of the following parts: image input and preprocessing; binarization; noise removal; skew correction; layout analysis; character segmentation; character recognition; and layout restoration. It converts non-editable formats such as static paper documents and PDF files into an editable format.
3. Correct the OCR output:
OCR output may contain errors, so manual correction is needed to ensure that the references are entered accurately. In a reference record the title is the most important field, followed by the responsible party (the author or other party responsible for the work). The title can also serve as the search field when the record is checked against data returned by a crawler, so the title is proofread first, then the responsible party, and finally the other fields. During correction, some characters turn out to be consistently misrecognized as the same wrong character; these can be replaced globally to reduce correction time. Most errors can be found through syntactic analysis, so proofreading does not require a character-by-character comparison. Finally, the proofread references are stored in the unstructured database.
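The global replacement of consistently misrecognized characters might look like the following sketch. The correction table is a made-up example (the variant character 囯 standing in for 国); a real table would be built by the proofreader from the errors actually observed in a given scan.

```python
from typing import Dict, List


def apply_bulk_corrections(lines: List[str], corrections: Dict[str, str]) -> List[str]:
    """Replace every occurrence of each known OCR error with its correction."""
    fixed = []
    for line in lines:
        for wrong, right in corrections.items():
            line = line.replace(wrong, right)
        fixed.append(line)
    return fixed


# Hypothetical correction table for one scanned book.
corrections = {"囯": "国"}
print(apply_bulk_corrections(["中囯图书馆学报"], corrections))
```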
4. Structuring:
Structuring the data makes them easier to extract and query. A reference record consists of fields such as creator, title, contributor, place of publication, publisher, and date. The number and order of reference fields differ from book to book, but within one book, references in the same language and of the same document type have essentially the same number and order of fields. A field format can therefore be defined for each book according to the number and order of the fields in its references, and the records then split into fields. Fields can be split by manually inserting delimiters, but manual splitting is much slower than splitting by program, its accuracy falls as the workload grows, and the result still has to be checked afterwards. Processing with a program or script is therefore recommended. This works because, within a single book, the separators between the fields of a reference record are fixed, so the fields can be told apart by these separators. Finally, the structured references are stored in the structured database.
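A script-based field split along these lines can be sketched with a regular expression. Everything specific here is assumed for illustration: the field names, the separator pattern, and the sample record (which uses placeholder names, not a real publication). Each book would need its own pattern and field order.

```python
import re
from typing import Dict, List


def split_reference(record: str, field_names: List[str], separator_pattern: str) -> Dict[str, str]:
    """Split one reference record into named fields using the book's separators."""
    parts = [p.strip() for p in re.split(separator_pattern, record) if p.strip()]
    if len(parts) != len(field_names):
        # Records that do not fit the book's field format are left for manual review.
        raise ValueError(f"expected {len(field_names)} fields, got {len(parts)}")
    return dict(zip(field_names, parts))


# Hypothetical per-book format: creator. title. place: publisher, year
fields = ["creator", "title", "place", "publisher", "year"]
print(split_reference("作者甲. 示例题名. 北京: 示例出版社, 2010", fields, r"[.:,]\s*"))
```

Raising an error on a field-count mismatch, rather than guessing, keeps malformed records out of the structured database until a person has looked at them.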
5. Set search terms based on the reference data in the structured database, then search the local literature database, using online literature databases as a supplement, to match references:
The search terms determine the returned results: the more conditions are set, the lower the recall, and with too few conditions the precision is low. Suitable fields must therefore be chosen from the reference record as search terms. When the document type is a journal article, using the title as the search term gives a reasonable balance of recall and precision; when the document type is a book, using the title and publisher as search terms gives a reasonable balance. Because Chinese users usually search by title, literature databases give the title field special handling to raise the hit rate of title searches. Journal articles rarely share a title, so the title field alone is generally enough to identify one, whereas books with the same title are much more common. When searching for a book, the publisher is therefore added as a second search field to reduce the number of results.
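The choice of search fields by document type can be written down directly. In this sketch the record layout (a dict with `type`, `title`, and `publisher` keys) and the type labels `journal` and `book` are assumptions made for illustration.

```python
from typing import Dict


def build_search_terms(record: Dict[str, str]) -> Dict[str, str]:
    """Pick search fields by document type: title only for journal articles,
    title plus publisher for books."""
    doc_type = record["type"]
    if doc_type == "journal":
        return {"title": record["title"]}
    if doc_type == "book":
        return {"title": record["title"], "publisher": record["publisher"]}
    raise ValueError(f"no search rule for document type: {doc_type}")
```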
Because of dirty data, a title search may return 0 results. When the result is 0 (or before searching), the cause must be found and handled. Typical causes are problems with the title, the other title information, and the connectors between them, or wrong keywords in the title; all of these seriously reduce recall. When the title contains a wrong key character or symbol, the title can be segmented into words and the resulting keywords combined with logical AND for retrieval. If the target record is still not among the results after this processing, an operator must set the search terms manually to find the target record.
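Combining segmented keywords with logical AND could be sketched as below. The query syntax (quoted terms joined by the word AND) is an assumption, since the patent does not name a query language, and the keywords are assumed to come from a Chinese word segmenter that is outside this sketch.

```python
from typing import Iterable


def and_query(keywords: Iterable[str]) -> str:
    """Join non-empty keywords into a boolean AND query string."""
    cleaned = [k.strip() for k in keywords if k.strip()]
    return " AND ".join(f'"{k}"' for k in cleaned)


# Keywords as a segmenter might return them for the title 信息资源编目.
print(and_query(["信息资源", "编目"]))
```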
To speed up retrieval, the local literature database is searched first with the search terms. If the number of results is not 0, that is, related documents are found, the retrieved records are compared with the original reference for similarity. If the number of results is 0, that is, no related document is found, data are crawled from the online literature database to make the retrieval more comprehensive.
To refine the results further, the fields of each returned record can be string-matched against the corresponding fields of the original reference, and the returned records then sorted in descending order of match degree. Field matching determines whether two field values are syntactically interchangeable representations of the same semantic entity, and it rests on Chinese string matching. Commonly used string matching methods include single-character matching, Chinese automatic word segmentation, edit distance, and semantic similarity.
Single-character matching compares each character of one string with all the characters of the other string and counts the hits to compute the similarity. The Chinese automatic word segmentation method first segments the strings into words and then compares the words one by one to compute the similarity. Both of these are Jaccard similarity calculations. Edit distance is the minimum number of edit operations needed to turn one string into the other, and the similarity is computed from this number. Semantic similarity is generally used for long texts and is not considered here. Each of the first three methods has its advantages. The first has a complexity of O(n log n), where n is the maximum length of the two strings, and is the cheapest to compute; the second is more tolerant of errors in short strings; the third takes the order of the characters into account. To improve the accuracy of the similarity calculation, this application computes the similarity separately with single-character matching, Chinese automatic word segmentation, and edit distance, and then combines the three results into a total similarity.
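The three measures and their combination can be illustrated with the sketch below, which rests on several assumptions not found in the patent: single-character matching is simplified to Jaccard similarity over character sets, character bigrams stand in for the output of a Chinese word segmenter, edit distance is normalized by the longer string, and the total is an equal-weight average (the patent does not say how the three scores are combined).

```python
from typing import Callable, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def bigrams(s: str) -> Set[str]:
    """Stand-in tokenizer; a real system would use a Chinese word segmenter."""
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def edit_similarity(s: str, t: str) -> float:
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1 - edit_distance(s, t) / longest


def total_similarity(s: str, t: str, tokenize: Callable[[str], Set[str]] = bigrams) -> float:
    """Equal-weight average of the three scores (the weighting is an assumption)."""
    scores = [
        jaccard(set(s), set(t)),            # single-character matching
        jaccard(tokenize(s), tokenize(t)),  # word-based matching
        edit_similarity(s, t),              # edit distance
    ]
    return sum(scores) / len(scores)
```

A title with one misrecognized character, such as 信息资源编日 against 信息资源编目, scores strictly between 0 and 1 on all three measures, which is the case the threshold comparison is meant to catch.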
In addition, if the crawl from the online literature database returns 0 results, that is, no corresponding document is retrieved, the search query must be adjusted and the crawl repeated. If corresponding literature data are retrieved, the similarity comparison process begins.
6. Place the hit references in the temporary database for review:
A returned record counts as a hit when its similarity exceeds a set threshold, and hit records are stored in the temporary database for review. In practice, because the records are sorted in descending order, the first 5 hits are enough to contain the matching document, and each later hit is less and less likely to match. To make review more efficient, at most 5 hit records are therefore stored in the temporary database.
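Selecting the records to store for review is a filter, a descending sort, and a cut at five. A minimal sketch, in which the function name and the `(record, score)` pair representation are assumptions:

```python
from typing import List, Tuple


def shortlist_for_review(
    scored_records: List[Tuple[str, float]], threshold: float, limit: int = 5
) -> List[Tuple[str, float]]:
    """Keep records scoring above the threshold, best first, at most `limit` of them."""
    hits = [(record, score) for record, score in scored_records if score > threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:limit]
```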
7. Place the references that pass review in the standardized reference database.
In this step, a reference that passes review is the standardized reference matching the original reference, and it is placed in the standardized reference database for later document analysis. If a reference fails review, the reference fields used for retrieval are adjusted and data are crawled from the online literature database again.

Claims (10)

1. A method for collecting reference data from Chinese academic books, characterized in that it comprises the following steps:
a. Determine the type of the book to be processed: if it is an e-book, proceed to step c; if it is a paper book, proceed to step b;
b. Collect the reference data by the means appropriate to the position of the references in the book, then proceed to step d;
c. Perform OCR on the reference data in the book, then proofread the result;
d. Store the reference data in an unstructured database;
e. Structure the reference data in the unstructured database and store the processed data in a structured database;
f. Set search terms based on the reference data in the structured database, then search a local literature database, using online literature databases as a supplement, to match references;
g. Place the matched references in a temporary database for review;
h. Place the references that pass review in a standardized reference database.
2. The method for collecting reference data from Chinese academic books as claimed in claim 1, characterized in that, in step b, collecting the reference data by the means appropriate to the position of the references in the book specifically comprises:
If the references are at the end of the book or of a chapter, scanning the reference pages with a flatbed scanner;
If the references are in in-text notes or footnotes, either scanning them with a handheld pen scanner, or having an operator read them into a voice recorder and then applying speech recognition.
3. The method for collecting reference data from Chinese academic books as claimed in claim 2, characterized in that, before the reference pages are scanned with the flatbed scanner, the resolution of the flatbed scanner is set to 600 dpi.
4. The method for collecting reference data from Chinese academic books as claimed in claim 3, characterized in that the images obtained with the flatbed scanner are saved in PDF-A format in the page order of the book and named according to a uniform convention:
The first four characters are a numeric code, followed by the title of the source book, joined by "_"; the numeric code is the last four digits of the source book code assigned by the cataloguing agency.
5. The method for collecting reference data from Chinese academic books as claimed in claim 1, characterized in that, when the recognized data are proofread, the titles of the references are proofread first.
6. The method for collecting reference data from Chinese academic books as claimed in claim 1, characterized in that, in step e, structuring the reference data specifically comprises:
Defining a field format for each book according to the number and order of the fields in its references, and then splitting the records into fields; during splitting, a script or program detects the separators between fields in order to tell the fields apart.
7. The method for collecting reference data from Chinese academic books as claimed in claim 1, characterized in that, in step f, the search terms are determined by the type of the reference:
If the reference is a journal article, the search term is its title; if the reference is a book, the search terms are its title and publisher.
8. The method for collecting reference data from Chinese academic books as claimed in claim 7, characterized in that step f specifically comprises:
f1. Set search terms based on the reference data in the structured database, then search the local literature database;
f2. After the local literature database has been searched with the search terms, if 0 results are returned, proceed to step f3; if the number of results is not 0, proceed to step f4;
f3. Crawl data from the online literature database according to the fields of the reference; if the crawl returns 0 results, adjust the reference fields used for retrieval and return to step f3 to crawl again; if the crawl returns results, proceed to step f5;
f4. Compare the search results with the original reference data for similarity, and take the returned records whose similarity exceeds a set threshold as the matched references; if no returned record exceeds the threshold, proceed to step f3;
f5. Compare the crawled results with the original reference data for similarity, and take the returned records whose similarity exceeds a set threshold as the matched references.
9. The method for collecting reference data from Chinese academic books as claimed in claim 8, characterized in that step g further comprises:
Sorting the matched references in descending order of similarity;
Selecting at most the 5 highest-ranked matched references and storing them in the temporary database for review.
10. The method for collecting reference data from Chinese academic books as claimed in any one of claims 1 to 9, characterized in that step h further comprises: if a reference fails review, adjusting the reference fields used for retrieval and crawling the online literature database again.
CN201710841238.0A 2017-09-18 2017-09-18 Method for collecting reference data from Chinese academic books Pending CN107562932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710841238.0A CN107562932A (en) 2017-09-18 2017-09-18 The academic reference of books data in literature acquisition method of Chinese

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710841238.0A CN107562932A (en) 2017-09-18 2017-09-18 The academic reference of books data in literature acquisition method of Chinese

Publications (1)

Publication Number Publication Date
CN107562932A (en) 2018-01-09

Family

ID=60980295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710841238.0A Pending CN107562932A (en) Method for collecting reference data from Chinese academic books

Country Status (1)

Country Link
CN (1) CN107562932A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050021512A1 (en) * 2003-07-23 2005-01-27 Helmut Koenig Automatic indexing of digital image archives for content-based, context-sensitive searching
CN101539904A (en) * 2009-04-21 2009-09-23 武汉大学 Automatic indexing method of quotations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程路: "中文学术著作参考文献的采集和统计分析", 《万方学位论文库》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325093A (en) * 2018-08-24 2019-02-12 深圳职业技术学院 Bibliography automatic generation method, device and computer-readable storage medium
CN111125381A (en) * 2018-11-01 2020-05-08 北大方正集团有限公司 Identification method, device, equipment and storage medium of key information of reference document
CN111125381B (en) * 2018-11-01 2023-08-11 新方正控股发展有限责任公司 Method, device, equipment and storage medium for identifying key information of reference

Similar Documents

Publication Publication Date Title
CN102053991B (en) Method and system for multi-language document retrieval
CN106250830B (en) Digital book structured analysis processing method
US6044375A (en) Automatic extraction of metadata using a neural network
JP4944405B2 (en) Phrase-based indexing method in information retrieval system
JP5175005B2 (en) Phrase-based search method in information search system
US6178417B1 (en) Method and means of matching documents based on text genre
JP4944406B2 (en) How to generate document descriptions based on phrases
CN103823824B (en) A kind of method and system that text classification corpus is built automatically by the Internet
JP4577931B2 (en) Document processing system and index information acquisition method
US7783634B2 (en) Device, a program and a system for managing electronic documents
CN112541490A (en) Archive image information structured construction method and device based on deep learning
JP2006048683A (en) Phrase identification method in information retrieval system
CN101093545A (en) Method for carrying out highlighted marking searching words on snapshot pictures of ancient books in document retrieval system for ancient books
CN103778141A (en) Mixed PDF book catalogue automatic extracting algorithm
CN112035723A (en) Resource library determination method and device, storage medium and electronic device
CN116775972A (en) Remote resource arrangement service method and system based on information technology
CN107562932A (en) The academic reference of books data in literature acquisition method of Chinese
Chang et al. An interactive approach to integrating external textual knowledge for multimodal lifelog retrieval
CN105574004B (en) A kind of removing duplicate webpages method and apparatus
Moxley et al. Automatic video annotation through search and mining
CN112597370A (en) Webpage information autonomous collecting and screening system with specified demand range
Yurtsever et al. Figure search by text in large scale digital document collections
CN114238735B (en) Intelligent internet data acquisition method
CN1955979A (en) Automatic extraction device, method and program of essay title and correlation information
Zhang et al. Extracting relational data from HTML repositories

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20180109