CN103324668A - Marking system for marking texts on web pages - Google Patents

Publication number
CN103324668A
Authority
CN
China
Prior art keywords
html code
webpage
marking
code snippet
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013101868407A
Other languages
Chinese (zh)
Inventor
吴涛军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN2013101868407A
Publication of CN103324668A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a marking system for marking text on web pages. After a user selects text content to mark, the system parses the web page structure, extracts from the HTML code corresponding to the selected text the set of maximal character strings that contain no web page tags, and adds a marking tag to each string in the set to generate a set of HTML code snippets carrying the marking effect. These snippets replace the corresponding text content in the original web page, forming a new web document with the marking effect, which is then displayed in the browser. Compared with the prior art, the system lets the user mark text on web pages that contain complex combinations of tags while keeping the page structure stable, so that the functions and styles of the page are unchanged after marking.

Description

A marking system for marking text on web pages
Technical field
The invention belongs to the field of web technology, and in particular relates to a marking system for marking text on web pages.
Background art
In the prior art, text in a web page is mainly marked as follows:
A marking tag is added to the text content selected by the user to form an HTML code snippet carrying the marking effect; the HTML code snippet corresponding to the selected text is located in the original web page and replaced directly with the generated snippet, forming a new web document with the marking effect; the generated document is then displayed in the browser.
Because the text selected by the user is not necessarily contiguous in the HTML code and may span several web page tags, this simple splicing method can cause the HTML code snippet carrying the marking effect to fail, and may even distort or break the original page structure.
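The failure mode described above can be illustrated with a minimal Python sketch. The helper names `naive_mark` and `is_well_nested`, the sample page, and the simplified tag pattern are assumptions made for illustration; they do not appear in the patent, and void elements such as <br> are ignored.

```python
import re

# Simplified tag pattern (assumption): group 1 is "/" for closing tags,
# group 2 is the tag name. Comments and void elements are not handled.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def is_well_nested(html):
    """Return True if every closing tag matches the most recent open tag."""
    stack = []
    for m in TAG.finditer(html):
        name = m.group(2).lower()
        if m.group(1) != "/":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


def naive_mark(fragment):
    """Prior-art style marking: wrap the whole selected HTML in one tag."""
    return "<span>" + fragment + "</span>"


# Hypothetical page and a selection that crosses a paragraph boundary.
page = "<p>The things are still there</p><p>but men are no more</p>"
selected = "still there</p><p>but men"
marked = page.replace(selected, naive_mark(selected))

print(is_well_nested(page))    # the original page nests correctly
print(marked)
print(is_well_nested(marked))  # the naively marked page does not
```

Here the inserted <span> opens inside one <p> and closes inside the next, so the nesting check fails on the marked page even though it passes on the original.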
Summary of the invention
The technical problem to be solved by the invention is the one identified in the background art: to provide a marking system for marking text on web pages that effectively prevents the HTML code snippets carrying the marking effect from failing and prevents the original page structure from being distorted or losing function.
To solve the above technical problem, the invention adopts the following technical solution:
A marking system for marking text on web pages, in which the system performs the following actions after the user selects text content to mark:
Step 1): parse the web page to obtain its DOM tree structure, and obtain the text content selected by the user;
Step 2): analyze the HTML code corresponding to the selected text content, recursively extract the set of maximal character strings that contain no web page tags, and add a marking tag to each extracted character string to generate the set of HTML code snippets carrying the marking effect;
Step 3): replace the corresponding text content in the web page with the generated HTML code snippets carrying the marking effect, forming a new web document with the marking effect, and display it in the browser.
As a further scheme of the marking system for marking text on web pages of the invention, the set of HTML code snippets carrying the marking effect in step 2) is generated as follows:
Step a): determine whether the HTML code contains a web page tag; if it does not, add the marking tag directly to generate an HTML code snippet carrying the marking effect; if it does, go to step b);
Step b): determine whether the head or tail of the HTML code contains an incomplete web page tag; if so, remove the incomplete tag at the head or tail and start again from step a); if not, go to step c);
Step c): split the HTML code into a set of HTML code snippets, using the outermost web page tags in the HTML code as boundaries;
Step d): repeat steps a) to c) on each HTML code snippet in the set until all web page tags in the HTML code have been removed, generating the set of HTML code snippets carrying the marking effect.
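Steps a) to d) can be sketched recursively in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function names, the simplified tag pattern, and the `tag` parameter are hypothetical. Step b) is also generalized slightly, so that a tag with no partner anywhere in the fragment (not only at the head or tail) is removed and used as a cut point, and whitespace-only strings are skipped.

```python
import re

# Simplified tag pattern (assumption): group 1 is "/" for closing tags,
# group 2 is the tag name. Comments and scripts are not handled.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def unmatched_spans(html):
    """Return sorted (start, end) spans of tags with no partner in html."""
    stack, unmatched = [], []
    for m in TAG.finditer(html):
        name = m.group(2).lower()
        if m.group(1) != "/":
            stack.append((name, m.span()))
        elif stack and stack[-1][0] == name:
            stack.pop()
        else:
            unmatched.append(m.span())
    return sorted(unmatched + [span for _, span in stack])


def cut_out(html, spans):
    """Remove the given spans and return the pieces between them."""
    pieces, pos = [], 0
    for start, end in spans:
        pieces.append(html[pos:start])
        pos = end
    pieces.append(html[pos:])
    return pieces


def split_outermost(html):
    """Step c: split balanced html at its outermost tags, dropping them."""
    pieces, depth, pos = [], 0, 0
    for m in TAG.finditer(html):
        closing = m.group(1) == "/"
        if closing:
            depth -= 1
        if depth == 0:  # this is an outermost opening or closing tag
            pieces.append(html[pos:m.start()])
            pos = m.end()
        if not closing:
            depth += 1
    pieces.append(html[pos:])
    return pieces


def mark_fragments(html, tag="span"):
    """Return the marked snippets for every maximal tag-free string."""
    if not TAG.search(html):  # step a: no tags, wrap directly
        return [f"<{tag}>{html}</{tag}>"] if html.strip() else []
    spans = unmatched_spans(html)  # step b: incomplete tags
    pieces = cut_out(html, spans) if spans else split_outermost(html)
    marked = []
    for piece in pieces:  # step d: repeat on each snippet
        marked.extend(mark_fragments(piece, tag))
    return marked


print(mark_fragments("hold<p>pen<p><i>hello</i></p></p>thing is"))
```

Each recursive call removes at least one tag, so the recursion terminates once only tag-free strings remain.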
By adopting the above technical solution, the invention has the following technical effects compared with the prior art:
The invention enables the user to accurately mark text content on a web page that contains complex combinations of web page tags, keeps the structure and function of the marked page stable, and prevents the HTML code snippets carrying the marking effect from failing.
Description of drawings
Fig. 1 is a flowchart of the marking system;
Fig. 2 is a flowchart of generating the set of HTML code snippets carrying the marking effect;
Fig. 3 is a schematic diagram of Embodiment 1;
Fig. 4 is a schematic diagram of Embodiment 2;
Fig. 5 is a schematic diagram of Embodiment 3;
Fig. 6 is a schematic diagram of Embodiment 4.
Detailed description of the embodiments
The technical solution of the invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the invention proposes a marking system for marking text on web pages, which operates in the following steps:
Step 1: the user selects text content on the web page to mark;
Step 2: the system parses the web page to obtain its DOM tree structure and obtains the HTML code corresponding to the selected text content;
Step 3: the set of maximal character strings containing no web page tags is extracted from the corresponding HTML code;
Step 4: a marking tag is added to each character string in the set to generate the set of HTML code snippets carrying the marking effect, and these snippets replace the corresponding text content in the original web page, forming a new web document with the marking effect;
Step 5: the web document with the marking effect is displayed in the browser.
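The substitution in step 4 can be sketched in Python as follows, assuming the selected region is known as a pair of character offsets into the page source (an assumption; the patent locates the selection through the DOM tree). Because the recursion ends with every tag-free string wrapped, this sketch flattens it to a single pass over the region and leaves every original tag in place. The name `mark_region` and the sample page are hypothetical.

```python
import re

# Capturing group so that re.split keeps the tags in its result list.
# Simplified pattern (assumption): comments and scripts are not handled.
TAG = re.compile(r"(</?[a-zA-Z][^>]*>)")


def mark_region(page, start, end, tag="span"):
    """Wrap each tag-free string inside page[start:end] in a marking tag."""
    parts = TAG.split(page[start:end])
    # With one capturing group, even indices hold text and odd indices
    # hold the tags themselves.
    rebuilt = "".join(
        part if i % 2 == 1 or not part.strip() else f"<{tag}>{part}</{tag}>"
        for i, part in enumerate(parts)
    )
    return page[:start] + rebuilt + page[end:]


# Hypothetical page built around the HTML code of Embodiment 3.
page = "<div>hold<p>pen<p><i>hello</i></p></p>thing is</div>"
selection = "hold<p>pen<p><i>hello</i></p></p>thing is"
start = page.index(selection)
marked_page = mark_region(page, start, start + len(selection))
print(marked_page)
```

Removing the inserted marking tags from `marked_page` gives back the original page source, which is one way to check that the page structure has not been altered.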
As shown in Fig. 2, the set of HTML code snippets carrying the marking effect is generated as follows:
Step a): determine whether the HTML code contains a web page tag; if it does not, add the marking tag directly to generate an HTML code snippet carrying the marking effect; if it does, go to step b);
Step b): determine whether the head or tail of the HTML code contains an incomplete web page tag; if so, remove the incomplete tag at the head or tail and start again from step a); if not, go to step c);
Step c): split the HTML code into a set of HTML code snippets, using the outermost web page tags in the HTML code as boundaries;
Step d): repeat steps a) to c) on each HTML code snippet in the set until all web page tags in the HTML code have been removed, generating the set of HTML code snippets carrying the marking effect.
In the following embodiments, "<span></span>" is used as the marking tag; in practice, other tags may be used as the marking tag.
Embodiment 1: as shown in Fig. 3, the user selects "write, thing is" on the web page to mark.
The system determines that the HTML code corresponding to the selection is "write, thing is", which contains no web page tags. It therefore adds the marking tag "<span></span>" directly, generating the HTML code snippet carrying the marking effect "<span>write, thing is</span>", and replaces the corresponding text content "write, thing is" in the web page. The resulting web document with the marking effect is displayed in the browser.
Embodiment 2: as shown in Fig. 4, the user selects "The things are still there but men are no more the same ones" on the web page to mark.
The system determines that the HTML code corresponding to the selection is "The things are still there but men are no more the same ones</p>", which contains the incomplete web page tag "</p>" at its tail. After this tag is removed, the remaining "The things are still there but men are no more the same ones" contains no web page tags, so the system adds the marking tag "<span></span>", generating the HTML code snippet carrying the marking effect "<span>The things are still there but men are no more the same ones</span>", and replaces the corresponding text content "The things are still there but men are no more the same ones" in the web page. The resulting web document with the marking effect is displayed in the browser.
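Step b), as applied in Embodiment 2, can be sketched in Python as follows. The sketch assumes that an "incomplete" tag is one sitting at the very start or very end of the fragment with no matching partner inside the fragment; the function names and the simplified tag pattern are hypothetical.

```python
import re

# Simplified tag pattern (assumption): group 1 is "/" for closing tags,
# group 2 is the tag name.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def unmatched_spans(html):
    """Return sorted (start, end) spans of tags with no partner in html."""
    stack, unmatched = [], []
    for m in TAG.finditer(html):
        name = m.group(2).lower()
        if m.group(1) != "/":
            stack.append((name, m.span()))
        elif stack and stack[-1][0] == name:
            stack.pop()
        else:
            unmatched.append(m.span())
    return sorted(unmatched + [span for _, span in stack])


def strip_incomplete_ends(html):
    """Repeat step b until neither end of html holds an unmatched tag."""
    while True:
        spans = unmatched_spans(html)
        head = next((s for s in spans if s[0] == 0), None)
        tail = next((s for s in spans if s[1] == len(html)), None)
        if head is None and tail is None:
            return html
        if tail is not None:  # cut the tail first so head offsets stay valid
            html = html[:tail[0]]
        if head is not None:
            html = html[head[1]:]


code = "The things are still there but men are no more the same ones</p>"
print(strip_incomplete_ends(code))
```

A complete pair such as "<p>kept</p>" is left untouched by this step and is handled later by the split at the outermost tags.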
Embodiment 3: as shown in Fig. 5, the user selects "hold pen hello thing is" on the web page to mark.
The system determines that the HTML code corresponding to the selection is "hold<p>pen<p><i>hello</i></p></p>thing is". The head and tail contain no incomplete web page tags, but the code does contain web page tags. Using the outermost tag "<p></p>" as the boundary, the system splits the code into the HTML code snippet set {hold, pen<p><i>hello</i></p>, thing is}. The system determines that "hold" and "thing is" contain no web page tags and adds the marking tag to each, generating the HTML code snippets carrying the marking effect "<span>hold</span>" and "<span>thing is</span>".
The system continues with the HTML code snippet "pen<p><i>hello</i></p>". Using its outermost tag "<p></p>" as the boundary, it splits the code into the HTML code snippet set {pen, <i>hello</i>}. The system determines that "pen" contains no web page tags and adds the marking tag to it, generating the HTML code snippet carrying the marking effect "<span>pen</span>".
The system continues with the HTML code snippet "<i>hello</i>". Using its outermost tag "<i></i>" as the boundary, it splits the code into the HTML code snippet set {hello} and adds the marking tag to it, generating the HTML code snippet carrying the marking effect "<span>hello</span>".
The system finally obtains the set of HTML code snippets carrying the marking effect {<span>hold</span>, <span>pen</span>, <span>hello</span>, <span>thing is</span>} and replaces the corresponding text content "hold", "pen", "hello", and "thing is" in the web page respectively. The resulting web document with the marking effect is displayed in the browser.
Embodiment 4: as shown in Fig. 6, the user selects "hold pen hello thing be" on the web page to mark.
The system determines that the HTML code corresponding to the selection is "hold<p>pen<p><i>hello</i></p></p><div>thing</div>be". The head and tail contain no incomplete web page tags, but the code does contain web page tags, and the outermost tags are the sibling tags "<p></p>" and "<div></div>". Using the outermost tags "<p></p>" and "<div></div>" as boundaries, the system splits the code into the HTML code snippet set {hold, pen<p><i>hello</i></p>, thing, be}. The system determines that "hold", "thing", and "be" contain no web page tags and adds the marking tag to each, generating the HTML code snippets carrying the marking effect "<span>hold</span>", "<span>thing</span>", and "<span>be</span>".
The system continues with the HTML code snippet "pen<p><i>hello</i></p>". Using its outermost tag "<p></p>" as the boundary, it splits the code into the HTML code snippet set {pen, <i>hello</i>}. The system determines that "pen" contains no web page tags and adds the marking tag to it, generating the HTML code snippet carrying the marking effect "<span>pen</span>".
The system continues with the HTML code snippet "<i>hello</i>". Using its outermost tag "<i></i>" as the boundary, it splits the code into the HTML code snippet set {hello} and adds the marking tag to it, generating the HTML code snippet carrying the marking effect "<span>hello</span>".
The system finally obtains the set of HTML code snippets carrying the marking effect {<span>hold</span>, <span>pen</span>, <span>hello</span>, <span>thing</span>, <span>be</span>} and replaces the corresponding text content "hold", "pen", "hello", "thing", and "be" in the web page respectively. The resulting web document with the marking effect is displayed in the browser.
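The level-by-level decomposition walked through in Embodiment 4 can be reproduced with a short Python trace. This is an illustrative sketch that assumes the input has no incomplete tags, as in this embodiment; `split_outermost`, `trace`, and the simplified tag pattern are hypothetical names and choices, not part of the patent.

```python
import re

# Simplified tag pattern (assumption): group 1 is "/" for closing tags,
# group 2 is the tag name.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def split_outermost(html):
    """Split balanced html at its outermost tags and drop those tags."""
    pieces, depth, pos = [], 0, 0
    for m in TAG.finditer(html):
        closing = m.group(1) == "/"
        if closing:
            depth -= 1
        if depth == 0:  # sibling outermost tags all act as boundaries
            pieces.append(html[pos:m.start()])
            pos = m.end()
        if not closing:
            depth += 1
    pieces.append(html[pos:])
    return [p for p in pieces if p]


def trace(html, level=0, log=None):
    """Record (recursion level, snippet set) for every split performed."""
    log = [] if log is None else log
    if TAG.search(html):
        pieces = split_outermost(html)
        log.append((level, pieces))
        for piece in pieces:
            trace(piece, level + 1, log)
    return log


EMBODIMENT_4 = "hold<p>pen<p><i>hello</i></p></p><div>thing</div>be"
for level, pieces in trace(EMBODIMENT_4):
    print(level, pieces)
```

The three recorded levels correspond to the three snippet sets named in the embodiment: the split at the sibling "<p></p>" and "<div></div>" tags, the split of "pen<p><i>hello</i></p>", and the split of "<i>hello</i>".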

Claims (2)

1. A marking system for marking text on web pages, characterized in that the system performs the following actions after the user selects text content to mark:
Step 1): parse the web page to obtain its DOM tree structure, and obtain the text content selected by the user;
Step 2): analyze the HTML code corresponding to the selected text content, recursively extract the set of maximal character strings that contain no web page tags, and add a marking tag to each extracted character string to generate the set of HTML code snippets carrying the marking effect;
Step 3): replace the corresponding text content in the web page with the generated HTML code snippets carrying the marking effect, forming a new web document with the marking effect, and display it in the browser.
2. The marking system for marking text on web pages according to claim 1, characterized in that the set of HTML code snippets carrying the marking effect in step 2) is generated as follows:
Step a): determine whether the HTML code contains a web page tag; if it does not, add the marking tag directly to generate an HTML code snippet carrying the marking effect; if it does, go to step b);
Step b): determine whether the head or tail of the HTML code contains an incomplete web page tag; if so, remove the incomplete tag at the head or tail and start again from step a); if not, go to step c);
Step c): split the HTML code into a set of HTML code snippets, using the outermost web page tags in the HTML code as boundaries;
Step d): repeat steps a) to c) on each HTML code snippet in the set until all web page tags in the HTML code have been removed, generating the set of HTML code snippets carrying the marking effect.
CN2013101868407A 2013-05-20 2013-05-20 Marking system for marking texts on web pages Pending CN103324668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013101868407A CN103324668A (en) 2013-05-20 2013-05-20 Marking system for marking texts on web pages

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013101868407A CN103324668A (en) 2013-05-20 2013-05-20 Marking system for marking texts on web pages

Publications (1)

Publication Number Publication Date
CN103324668A true CN103324668A (en) 2013-09-25

Family

ID=49193411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013101868407A Pending CN103324668A (en) 2013-05-20 2013-05-20 Marking system for marking texts on web pages

Country Status (1)

Country Link
CN (1) CN103324668A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130925