CN103745003A - HTML fragment detection method - Google Patents

Info

Publication number
CN103745003A
Authority
CN
China
Prior art keywords
label
value
tag
execution step
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410035578.0A
Other languages
Chinese (zh)
Other versions
CN103745003B (en)
Inventor
王海昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201410035578.0A
Publication of CN103745003A
Application granted
Publication of CN103745003B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986Document structures and storage, e.g. HTML extensions

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses an HTML fragment detection method. The method includes: using a dedicated single-tag parameter S to record the most recent single tag before the current detection position; using a set Z to record all start tags that have been traversed but not yet matched; matching tags on the basis of these parameters; when an end tag is detected, preferentially matching it against the most recently detected single tag and, if that fails, performing double-tag matching; and, when an error is detected, reporting the specific error and its location. The method accurately identifies both single and double tags in HTML fragments and is also compatible with double tags formed from a single tag and an end tag. It therefore effectively improves detection accuracy and locates errors precisely.

Description

Detection method for HTML fragments
Technical field
The present invention relates to computer Internet technology, and in particular to a method for detecting HyperText Markup Language (HTML) fragments.
Background art
In what-you-see-is-what-you-get (WYSIWYG) application scenarios, one of the most common components is the rich text editor. A rich text editor offers two modes of use. One of them is source code mode, in which the user enters an HTML fragment by hand and the fragment is then rendered. In this process the browser needs to check the HTML fragment entered by the user intelligently, to ensure that it conforms to the HTML specification.
Existing methods for detecting HTML fragments mainly check whether the tags in the fragment occur in pairs. All tags in the fragment, both start tags and end tags, are first collected and then matched in some manner to decide whether they pair up, for example by counting the start tags and end tags that share a tag name.
These methods have the following problems:
1. Limited function and low accuracy: the method only detects double tags. It assumes that tags must occur in pairs, so that every start tag has a corresponding end tag. The actual HTML specification, however, allows single tags, which can stand on their own. A method that detects only double tags therefore reports single tags as errors. Because it cannot identify single tags correctly, detection accuracy drops, and the browser may fail to display the rendering the user intends.
2. Weak error location: after detection, the method can only report that the tags do not pair up. It cannot give a detailed cause or name the offending tag, so it does not help the user find the faulty point quickly in order to investigate and fix the problem.
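The count-based check described in the background can be illustrated with a minimal Python sketch. The regular expression `TAG_RE` and the function name `naive_pair_check` are assumptions made for this illustration; the source does not specify how existing detectors are implemented. The tag pattern is deliberately simple and does not handle attribute values that contain `>`.

```python
import re
from collections import Counter
from typing import Dict

# Simplified tag pattern (an assumption for illustration only).
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def naive_pair_check(fragment: str) -> Dict[str, int]:
    """Count start tags minus end tags per tag name; return the nonzero ones."""
    balance = Counter()
    for match in TAG_RE.finditer(fragment):
        name = match.group(2).lower()
        # An end tag decrements the balance, any other tag increments it.
        balance[name] += -1 if match.group(1) == "/" else 1
    return {name: diff for name, diff in balance.items() if diff != 0}


if __name__ == "__main__":
    # The single tag <br> is reported as unbalanced although the fragment is valid.
    print(naive_pair_check("<p>line<br>next</p>"))
    # Tags in the wrong order pass, and no error position is available.
    print(naive_pair_check("</p><p>"))
```

Both weaknesses show up here: a valid single tag such as `<br>` is flagged, and the result carries only a count per tag name, not the position or kind of the error.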
Summary of the invention
In view of this, the main object of the present invention is to provide a method for detecting HTML fragments that improves detection accuracy and locates errors precisely.
To achieve this object, the technical solution proposed by the present invention is as follows:
A method for detecting an HTML fragment, comprising:
a. Take the first tag of the HTML fragment to be detected as the current tag to be detected; set the initial value of the single-tag parameter S to empty; set the set Z, which records the start tags that have not yet been matched, to empty;
b. Determine whether the current tag to be detected, M, is empty; if so, perform step c; otherwise, perform step d;
c. Determine whether the set Z is empty; if so, end the detection method; otherwise, determine that none of the start tags in Z has a corresponding end tag, notify the system, and end the detection method;
d. Determine whether M is a single tag; if so, set the value of S to the value of M and perform step g; otherwise, determine whether M is a start tag; if so, insert M into Z, set the value of S to empty, and perform step g; otherwise, perform step e;
e. Determine whether the value of S equals the value of M; if so, set the value of S to empty and perform step g; otherwise, set the value of S to empty and perform step f;
f. Determine whether Z is empty; if so, determine that a redundant end tag exists at M, notify the system, and end the detection method; otherwise, take out the tag N that most recently entered Z and delete N from Z; determine whether the value of N equals the value of M; if so, perform step g; otherwise, determine that the end tag corresponding to N is missing at M, notify the system, and end the detection method;
g. Take the tag following M in the HTML fragment as the current tag to be detected, and perform step b.
A method for detecting an HTML fragment, comprising:
a. Take the first tag of the HTML fragment to be detected as the current tag to be detected; set the initial value of the single-tag parameter S to empty; set the set Z, which records the start tags that have not yet been matched, to empty;
b. Determine whether the current tag to be detected, M, is empty; if so, perform step c; otherwise, perform step d;
c. Determine whether the set Z is empty; if so, end the detection method; otherwise, for each tag in Z, in the reverse of the order in which the tags entered the set, construct the corresponding end tag and append it to the tail of the HTML fragment, then end the detection method;
d. Determine whether M is a single tag; if so, set the value of S to the value of M and perform step g; otherwise, determine whether M is a start tag; if so, insert M into Z, set the value of S to empty, and perform step g; otherwise, perform step e;
e. Determine whether the value of S equals the value of M; if so, set the value of S to empty and perform step g; otherwise, perform step f;
f. Determine whether Z is empty; if so, delete M from the HTML fragment and perform step g; otherwise, set the value of S to empty, take out the tag N that most recently entered Z and delete N from Z; when the value of N equals the value of M, perform step g; when the value of N does not equal the value of M, construct the end tag corresponding to N, insert the constructed tag at the position of M as the tag immediately preceding M, keep M as the current tag to be detected, and perform step b;
g. Take the tag following M in the HTML fragment as the current tag to be detected, and perform step b.
In summary, the HTML fragment detection method proposed by the present invention takes the identification of single tags into account. It uses a dedicated single-tag parameter S to record the nearest single tag before the current detection position, and a set Z to record all start tags that have been traversed but not yet matched, and matches tags on the basis of these parameters. When an end tag is detected, it is first matched against the most recently detected single tag; only if that match fails is double-tag matching performed. When an error is detected, the specific error and its position are reported. The method thus accurately identifies both single and double tags in an HTML fragment and is also compatible with double tags formed from a single tag and an end tag. It therefore effectively improves detection accuracy and locates errors precisely.
Brief description of the drawings
Fig. 1 is a schematic flowchart of embodiment one of the present invention;
Fig. 2 is a schematic flowchart of embodiment two of the present invention.
Detailed description of the embodiments
To make the object, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The core idea of the present invention is as follows: single tags in the HTML fragment are identified, and when an end tag is detected it is first matched against the most recently detected single tag; if that match fails, double-tag matching is performed. In addition, a set Z holds all start tags that are currently unmatched. The method thus accurately identifies both single and double tags in an HTML fragment and is also compatible with double tags formed from a single tag and an end tag, which effectively improves detection accuracy.
Fig. 1 is a schematic flowchart of the HTML fragment detection method of embodiment one of the present invention. As shown in Fig. 1, this embodiment mainly comprises:
Step 101: take the first tag of the HTML fragment to be detected as the current tag to be detected; set the initial value of the single-tag parameter S to empty; set the set Z, which records the start tags that have not yet been matched, to empty.
This step initializes the detection. The single-tag parameter S records the most recently detected single tag, that is, the nearest single tag before the current detection position, so that when an end tag is detected it can first be checked against that single tag. This provides compatibility with double tags formed from a single tag and an end tag, that is, the <tag></tag> form.
The set Z records the start tags that are currently unmatched, so that when an end tag is encountered it can be matched against them, which implements double-tag detection.
Step 102: determine whether the current tag to be detected, M, is empty; if so, perform step 103; otherwise, perform step 105.
If M is empty, detection has reached the end of the HTML fragment and all tags in the fragment have been traversed. The method then proceeds to step 103 to determine whether any start tag remains unmatched, and hence whether an end tag is missing.
Steps 103 to 104: determine whether the set Z is empty; if so, end the detection method; otherwise, determine that none of the start tags in Z has a corresponding end tag, notify the system, and end the detection method.
In this step, if Z is not empty, some of the start tags traversed so far have not been matched with an end tag, and the whole HTML fragment has already been traversed. An error has therefore occurred: none of the start tags in Z has a corresponding end tag. The system is notified of this error. Specifically, all start tags in Z are returned, together with an indication that they have no corresponding end tags.
If Z is empty, the HTML fragment contains no errors.
Steps 105 to 108: determine whether M is a single tag; if so, set the value of S to the value of M and perform step 117; otherwise, determine whether M is a start tag; if so, insert M into Z, set the value of S to empty, and perform step 117; otherwise, perform step 109.
When M is determined to be a start tag, the tag following the single tag recorded in S is a start tag rather than an end tag. The single tag indicated by S therefore no longer needs to be matched against a later end tag, so S is set to empty.
In this step, whether a tag is a single tag is determined in the same way as in existing systems: the tag is compared with the single tags defined by the system, and if it matches one of them it is a single tag.
According to the existing standard, start tags and end tags are distinguished by whether a "/" character precedes the tag name. For example, of the tags <div> and </span>, the first is a start tag named div and the second is an end tag named span.
In this step, when M is not empty, it is first determined whether M is a single tag. If it is, M is recorded in S. If it is not, it is determined whether M is a start tag. A start tag is inserted into Z so that double-tag matching can be performed when an end tag is detected later. If M is not a start tag, it is an end tag, and the method proceeds to step 109 to perform the corresponding tag matching.
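The classification used in steps 105 to 108 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the set `SINGLE_TAGS` is an assumed list of single tags (the HTML void elements), and `TAG_RE` and `classify` are hypothetical names. The pattern is simplified and does not handle attribute values that contain `>`.

```python
import re
from typing import Tuple

# Assumed list of system-defined single tags (HTML void elements).
SINGLE_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}

# Simplified tag pattern (an assumption for illustration only).
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def classify(tag_text: str) -> Tuple[str, str]:
    """Return (kind, name), where kind is 'single', 'start' or 'end'."""
    match = TAG_RE.fullmatch(tag_text.strip())
    if match is None:
        raise ValueError("not a tag: %r" % tag_text)
    has_slash = match.group(1) == "/"
    name = match.group(2).lower()
    if not has_slash and name in SINGLE_TAGS:
        return ("single", name)   # matches a system-defined single tag
    if not has_slash:
        return ("start", name)    # no "/" before the tag name
    return ("end", name)          # "/" precedes the tag name


if __name__ == "__main__":
    for text in ("<div>", "</span>", "<br>", "</br>"):
        print(text, classify(text))
```

Note that `</br>` is classified as an end tag here, which is what allows it to be matched against a preceding `<br>` recorded in S.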
Steps 109 to 111: determine whether the value of S equals the value of M; if so, set the value of S to empty and perform step 117; otherwise, set the value of S to empty and perform step 112.
In this step, when the value of S equals the value of M, the current end tag M matches the most recently detected single tag, so there is no error at M and the matching of M is complete. The value of S is therefore set to empty and step 117 is performed to continue with the next tag.
When the value of S does not equal the value of M, the value of S is set to empty and steps 112 to 116 are performed to determine, case by case, whether an end tag is redundant or missing.
Steps 112 to 116: determine whether the set Z is empty; if so, determine that a redundant end tag exists at M, notify the system, and end the detection method; otherwise, take out the tag N that most recently entered Z and delete N from Z; determine whether the value of N equals the value of M; if so, perform step 117; otherwise, determine that the end tag corresponding to N is missing at M, notify the system, and end the detection method.
In this step, when Z is empty, no start tag before M matches it, so M is a redundant end tag. When Z is not empty, the tag N that was added to the set most recently is taken out and compared with M to determine whether they match. If the values are the same, the tags match and step 117 is performed to detect the next tag. If they differ, N has no corresponding end tag, and the system is notified of this problem. That is, an error message stating that the end tag corresponding to N is missing at M is returned so that the user can correct the fragment accordingly.
Step 117: take the tag following M in the HTML fragment as the current tag to be detected, and perform step 102.
This step updates the tag to be detected: the tag following M becomes the new tag to be detected, so that the tags in the HTML fragment are traversed and checked in turn.
Preferably, when the above embodiment is implemented, the set Z is implemented as a stack. Other data structures may of course also be used, and they are not described further here.
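Steps 101 to 117 of embodiment one can be sketched in Python as below. This is a hedged illustration under stated assumptions rather than the patent's own code: `SINGLE_TAGS` is an assumed list of single tags, `TAG_RE` is a simplified tag pattern, the function name `detect` and the wording of the error messages are hypothetical, and "notify the system" is modeled as returning a message. The set Z is a list used as a stack, as the text prefers.

```python
import re
from typing import List, Optional, Tuple

# Assumed list of single tags and a simplified tag pattern (illustration only).
SINGLE_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def detect(fragment: str) -> Optional[str]:
    """Return None if no error is found, else a message for the first error."""
    last_single: Optional[str] = None          # parameter S
    open_tags: List[Tuple[str, int]] = []      # set Z as a stack of (name, offset)
    for match in TAG_RE.finditer(fragment):    # steps 102 and 117: traverse tags
        is_end = match.group(1) == "/"
        name = match.group(2).lower()
        pos = match.start()
        if not is_end and name in SINGLE_TAGS:  # steps 105-106: single tag
            last_single = name
        elif not is_end:                        # steps 107-108: start tag
            open_tags.append((name, pos))
            last_single = None
        elif last_single == name:               # steps 109-110: matches S
            last_single = None
        else:                                   # steps 111-116: double-tag match
            last_single = None
            if not open_tags:
                return "redundant end tag </%s> at offset %d" % (name, pos)
            opened, opened_pos = open_tags.pop()
            if opened != name:
                return ("missing end tag for <%s> (opened at offset %d) "
                        "before </%s> at offset %d"
                        % (opened, opened_pos, name, pos))
    if open_tags:                               # steps 103-104: end of fragment
        listing = ", ".join("<%s> at offset %d" % t for t in open_tags)
        return "start tags without end tags: " + listing
    return None


if __name__ == "__main__":
    print(detect("<p>a<br>b</p>"))
    print(detect("<div><p></div>"))
```

Storing the character offset next to each tag name is an addition made for this sketch so that the message can state where the error is; the source only says that the error and its position are reported.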
As the above embodiment shows, by distinguishing single tags from double tags the present invention accurately detects missing or redundant end tags in an HTML fragment. As soon as an error is detected it is reported, and the corresponding error location and error type are returned so that the user can make the necessary correction. In practical applications, the detection scheme can also correct errors automatically when it detects them, which gives it an intelligent fault-tolerance function. This can be achieved with the scheme of embodiment two below.
Fig. 2 is a schematic flowchart of the HTML fragment detection method of embodiment two of the present invention. As shown in Fig. 2, this embodiment mainly comprises:
Step 201: take the first tag of the HTML fragment to be detected as the current tag to be detected; set the initial value of the single-tag parameter S to empty; set the set Z, which records the start tags that have not yet been matched, to empty.
This step is the same as step 101 and is not described again here.
Step 202: determine whether the current tag to be detected, M, is empty; if so, perform step 203; otherwise, perform step 205.
Steps 203 to 204: determine whether the set Z is empty; if so, end the detection method; otherwise, for each tag in Z, in the reverse of the order in which the tags entered the set, construct the corresponding end tag and append it to the tail of the HTML fragment, then end the detection method.
This step differs from the previous embodiment in that, when start tags without end tags are detected, the corresponding end tags are constructed for them and appended in turn to the tail of the HTML fragment, which implements intelligent correction.
Steps 205 to 208: determine whether M is a single tag; if so, set the value of S to the value of M and perform step 216; otherwise, determine whether M is a start tag; if so, insert M into Z, set the value of S to empty, and perform step 216; otherwise, perform step 209.
When M is determined to be a start tag, the tag following the single tag recorded in S is a start tag. The single tag indicated by S therefore no longer needs to be matched against a later end tag, so S is set to empty.
Steps 209 to 210: determine whether the value of S equals the value of M; if so, set the value of S to empty and perform step 216; otherwise, perform step 211.
Steps 211 to 215: determine whether the set Z is empty; if so, delete M from the HTML fragment and perform step 216; otherwise, set the value of S to empty, take out the tag N that most recently entered Z and delete N from Z; when the value of N equals the value of M, perform step 216; when the value of N does not equal the value of M, construct the end tag corresponding to N, insert the constructed tag at the position of M as the tag immediately preceding M, keep M as the current tag to be detected, and perform step 202.
This step differs from the previous embodiment in that a redundant end tag is deleted when it is found, and when an end tag is found to be missing the corresponding end tag is constructed and inserted before the current detection position, which implements intelligent correction.
Step 216: take the tag following M in the HTML fragment as the current tag to be detected, and perform step 202.
Preferably, as in embodiment one, the set Z is implemented as a stack.
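The automatic correction of embodiment two (steps 201 to 216) can be sketched in Python as below. This is an illustrative sketch under assumptions, not the patent's implementation: `SINGLE_TAGS`, `TAG_RE` and the function name `repair` are hypothetical, and instead of editing the fragment in place the sketch builds a corrected copy. Re-examining M after an end tag has been inserted before it (the return to step 202) is expressed as an inner loop that keeps popping Z.

```python
import re
from typing import List, Optional

# Assumed list of single tags and a simplified tag pattern (illustration only).
SINGLE_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def repair(fragment: str) -> str:
    """Return a corrected copy of the fragment, following embodiment two."""
    out: List[str] = []                  # pieces of the corrected fragment
    last_single: Optional[str] = None    # parameter S
    open_tags: List[str] = []            # set Z as a stack
    pos = 0
    for match in TAG_RE.finditer(fragment):
        out.append(fragment[pos:match.start()])   # text between tags
        pos = match.end()
        is_end = match.group(1) == "/"
        name = match.group(2).lower()
        if not is_end and name in SINGLE_TAGS:    # steps 205-206
            last_single = name
        elif not is_end:                          # steps 207-208
            open_tags.append(name)
            last_single = None
        elif last_single == name:                 # steps 209-210
            last_single = None
        else:                                     # steps 211-215
            if not open_tags:
                continue                          # redundant end tag: delete M
            last_single = None
            while open_tags:
                opened = open_tags.pop()
                if opened == name:
                    break                         # M closes N: keep M
                out.append("</%s>" % opened)      # insert N's end tag before M
            else:
                continue                          # Z ran empty: delete M
        out.append(match.group(0))
    out.append(fragment[pos:])
    # Steps 203-204: close remaining start tags in reverse order of entry.
    out.extend("</%s>" % name for name in reversed(open_tags))
    return "".join(out)


if __name__ == "__main__":
    print(repair("<div><p>text</div>"))
    print(repair("<ul><li>x"))
```

As in the steps above, S is left unchanged when a redundant end tag is deleted because Z is empty; S is cleared only on the branch that pops from Z.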
In summary, the above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (4)

1. A method for detecting an HTML fragment, characterized by comprising:
a. Take the first tag of the HTML fragment to be detected as the current tag to be detected; set the initial value of the single-tag parameter S to empty; set the set Z, which records the start tags that have not yet been matched, to empty;
b. Determine whether the current tag to be detected, M, is empty; if so, perform step c; otherwise, perform step d;
c. Determine whether the set Z is empty; if so, end the detection method; otherwise, determine that none of the start tags in Z has a corresponding end tag, notify the system, and end the detection method;
d. Determine whether M is a single tag; if so, set the value of S to the value of M and perform step g; otherwise, determine whether M is a start tag; if so, insert M into Z, set the value of S to empty, and perform step g; otherwise, perform step e;
e. Determine whether the value of S equals the value of M; if so, set the value of S to empty and perform step g; otherwise, set the value of S to empty and perform step f;
f. Determine whether Z is empty; if so, determine that a redundant end tag exists at M, notify the system, and end the detection method; otherwise, take out the tag N that most recently entered Z and delete N from Z; determine whether the value of N equals the value of M; if so, perform step g; otherwise, determine that the end tag corresponding to N is missing at M, notify the system, and end the detection method;
g. Take the tag following M in the HTML fragment as the current tag to be detected, and perform step b.
2. The method according to claim 1, characterized in that the set Z is implemented as a stack.
3. A method for detecting an HTML fragment, characterized by comprising:
a. Take the first tag of the HTML fragment to be detected as the current tag to be detected; set the initial value of the single-tag parameter S to empty; set the set Z, which records the start tags that have not yet been matched, to empty;
b. Determine whether the current tag to be detected, M, is empty; if so, perform step c; otherwise, perform step d;
c. Determine whether the set Z is empty; if so, end the detection method; otherwise, for each tag in Z, in the reverse of the order in which the tags entered the set, construct the corresponding end tag and append it to the tail of the HTML fragment, then end the detection method;
d. Determine whether M is a single tag; if so, set the value of S to the value of M and perform step g; otherwise, determine whether M is a start tag; if so, insert M into Z, set the value of S to empty, and perform step g; otherwise, perform step e;
e. Determine whether the value of S equals the value of M; if so, set the value of S to empty and perform step g; otherwise, perform step f;
f. Determine whether Z is empty; if so, delete M from the HTML fragment and perform step g; otherwise, set the value of S to empty, take out the tag N that most recently entered Z and delete N from Z; when the value of N equals the value of M, perform step g; when the value of N does not equal the value of M, construct the end tag corresponding to N, insert the constructed tag at the position of M as the tag immediately preceding M, keep M as the current tag to be detected, and perform step b;
g. Take the tag following M in the HTML fragment as the current tag to be detected, and perform step b.
4. The method according to claim 3, characterized in that the set Z is implemented as a stack.
CN201410035578.0A 2014-01-24 2014-01-24 HTML fragment detection method Active CN103745003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410035578.0A CN103745003B (en) 2014-01-24 2014-01-24 HTML fragment detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410035578.0A CN103745003B (en) 2014-01-24 2014-01-24 HTML fragment detection method

Publications (2)

Publication Number Publication Date
CN103745003A true CN103745003A (en) 2014-04-23
CN103745003B CN103745003B (en) 2017-01-25

Family

ID=50502021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410035578.0A Active CN103745003B (en) 2014-01-24 2014-01-24 HTML fragment detection method

Country Status (1)

Country Link
CN (1) CN103745003B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408208A (en) * 2014-12-22 2015-03-11 上海斐讯数据通信技术有限公司 HTML5 layout detection method and system
CN110795931A (en) * 2018-07-17 2020-02-14 福建天泉教育科技有限公司 Method and terminal for detecting WEB website page language
CN111859850A (en) * 2020-07-29 2020-10-30 厦门亿联网络技术股份有限公司 Method and device for integrating rich text fragments, electronic equipment and storage medium
CN111967274A (en) * 2020-08-25 2020-11-20 文思海辉智科科技有限公司 Label conversion processing method and device, electronic equipment and readable storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408208A (en) * 2014-12-22 2015-03-11 上海斐讯数据通信技术有限公司 HTML5 layout detection method and system
CN110795931A (en) * 2018-07-17 2020-02-14 福建天泉教育科技有限公司 Method and terminal for detecting WEB website page language
CN110795931B (en) * 2018-07-17 2022-10-21 福建天泉教育科技有限公司 Method and terminal for detecting WEB website page language
CN111859850A (en) * 2020-07-29 2020-10-30 厦门亿联网络技术股份有限公司 Method and device for integrating rich text fragments, electronic equipment and storage medium
CN111859850B (en) * 2020-07-29 2024-05-10 厦门亿联网络技术股份有限公司 Method, device, electronic equipment and storage medium for integrating rich text fragments
CN111967274A (en) * 2020-08-25 2020-11-20 文思海辉智科科技有限公司 Label conversion processing method and device, electronic equipment and readable storage medium
CN111967274B (en) * 2020-08-25 2024-05-31 文思海辉智科科技有限公司 Label conversion processing method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN103745003B (en) 2017-01-25

Similar Documents

Publication Publication Date Title
KR101114194B1 (en) Assisted form filling
US10613971B1 (en) Autonomous testing of web-based applications
US10650192B2 (en) Method and device for recognizing domain named entity
US11003442B2 (en) Application programming interface documentation annotation
US11119988B2 (en) Performing logical validation on loaded data in a database
CN103745003A (en) HTML fragment detection method
JP6614756B2 (en) Transaction system error detection method, apparatus, storage medium, and computer device
CN105468662B (en) Energy consumption data processing method and system based on table code values
CN102737012A (en) Text information comparison method and system
CN114710224A (en) Frame synchronization method and device, computer readable medium and electronic device
US20150019959A1 (en) Method and apparatus for bidirectional typesetting
CN104836896A (en) Method and device for carrying out error correction prompt to telephone number
CN110262870B (en) Method, device, computer equipment and storage medium for locating exception by dump file
US11748368B1 (en) Data field transaction repair interface
US8290834B2 (en) Ad-hoc updates to source transactions
US10078657B2 (en) Detection of data replication consistency
US11080808B2 (en) Automatically attaching optical character recognition data to images
US20130326349A1 (en) Method and System to Perform Multiple Scope Based Search and Replace
CN109240703A (en) A kind of system mistake reminding method and device
CN103902514A (en) Word format extracting and reutilizing method
CN109918385A (en) Tripartite's account checking method, electronic device and readable storage medium storing program for executing
CN101794282B (en) Method and system for detection of knowledge tagging result
CN110704686B (en) Quality detection method and device for semi-structured data, storage medium and equipment
CN115660540B (en) Cargo tracking method, cargo tracking device, computer equipment and storage medium
US20130061133A1 (en) Markup language schema error correction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant