CN103095644A - Data content analytic method and data content analytic device - Google Patents

Publication number
CN103095644A
Authority
CN
China
Prior art keywords
character
matching value
ascii
field
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201110334808XA
Other languages
Chinese (zh)
Other versions
CN103095644B (en)
Inventor
吴博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd
Priority to CN201110334808.XA
Publication of CN103095644A
Application granted
Publication of CN103095644B
Legal status: Active
Anticipated expiration

Landscapes

  • Processing Or Creating Images (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data content parsing method and a data content parsing device, which reduce parsing time complexity and parsing time when data content issued by a server is parsed. In the method, when data content in a data packet body is parsed, the characters contained in the data content are traversed in sequence, and the American Standard Code for Information Interchange (ASCII) value and the ASCII character corresponding to each character are determined; the matching value corresponding to each character is determined according to the determined ASCII value, the ASCII character and a preset matching array; the starting position of a header field is determined according to the matching value and the ASCII character corresponding to each character, and the header field is parsed; and the starting position and the size of the binary content are determined according to the parsed header field, and the binary content is parsed.

Description

A data content parsing method and device
Technical field
The present invention relates to the technical field of mobile terminal data parsing, and in particular to a data content parsing method and device.
Background art
Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends standard e-mail to support messages in multiple formats, such as non-ASCII characters and binary attachments. The MIME protocol is widely used on the mobile Internet, and many applications use it to transmit static resources such as pictures, audio and text. The protocol's message header includes fields such as content type (Content-Type), content transfer encoding (Content-Transfer-Encoding) and content identifier (Content-ID). When multiple pieces of data content are transmitted, Content-Type is usually defined as Content-Type:multipart/mixed;boundary=End (End is a character string defined by the server and used as a separator). A typical data packet body based on the MIME protocol is as follows:
HTTP/1.1 200 OK
X-Powered-By:Servlet/2.5
Server:Sun Java System Application Server 9.1_02
X-DP-next URI:/content/refresh/
Content-Type:multipart/mixed;boundary=End
Content-Length:7479
Date:Tue,26 May 2009 01:57:34 GMT
Connection:Keep-Alive
--End
Content-Type:image/jpeg
Content-Transfer-Encoding:binary
Content-Id:0526090000018182
Content-Length:1764
* * * * * * * * (binary content) * * * * * * * * * * * * *
--End
Content-Type:image/jpeg
Content-Transfer-Encoding:binary
Content-Id:0526090023018276
Content-Length:1521
* * * * * * (binary content) * * * * * * * * * * * * * * * *
--End--
In the above example, two pictures are transmitted at the same time, with IDs 0526090000018182 and 0526090023018276 respectively. After receiving this data packet body, the receiving terminal must accurately locate the starting position and the end position of each picture's binary content in order to parse each picture. Taking the picture with ID 0526090000018182 as an example, when parsing the first picture the receiving terminal must first find the character string "--" + "boundary" (End in this example) and then find the header fields of this picture. Once the position of "Content-Length:1764\r\n" has been determined, the starting position of the picture's binary content can be located and the size of the picture obtained. With the starting position and length known, the corresponding binary content can be read from the data packet body and the picture parsed. Likewise, to parse the second picture, the receiving terminal finds the next "--" + "boundary" (End in this example) character string and repeats the above steps to obtain the content of the next picture.
As the above description shows, the key to parsing a picture correctly is accurately locating the starting position and the length of the binary content. To locate them, the prior art uses a keyword search method, which must find the character string immediately preceding the binary content of each picture ("Content-Length:1764\r\n" in the example above) in order to locate the starting position of the binary content. For a data packet body containing multiple pictures, multiple keywords must be searched for repeatedly to locate the start of the binary content, to detect its end, or to parse the value of a header field. For example, when parsing the first picture, the positions of header fields such as Content-Type, Content-Transfer-Encoding and Content-ID must be searched for one after another in the data packet body. The receiving terminal therefore has to traverse the data packet body several times before the picture information is parsed successfully, which increases the parsing time.
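The cost of the keyword search approach described above can be illustrated with a minimal Python sketch. The function name `keyword_search` and the sample bytes are assumptions made for illustration and do not come from the patent; the point is only that every header field triggers its own scan of the body.

```python
def keyword_search(body: bytes, field_names):
    """Look up each header field with a separate search over the body."""
    values = {}
    for name in field_names:
        key = name.encode("ascii") + b":"
        start = body.find(key)            # each call scans again from offset 0
        if start < 0:
            continue
        value_start = start + len(key)
        value_end = body.find(b"\r\n", value_start)
        if value_end < 0:
            value_end = len(body)
        values[name] = body[value_start:value_end].decode("ascii").strip()
    return values


# Hypothetical sample: the header lines of the first picture in the example.
part = (b"--End\r\n"
        b"Content-Type:image/jpeg\r\n"
        b"Content-Transfer-Encoding:binary\r\n"
        b"Content-Id:0526090000018182\r\n"
        b"Content-Length:1764\r\n")
fields = keyword_search(part, ["Content-Type", "Content-Transfer-Encoding",
                               "Content-Id", "Content-Length"])
```

With four field names the body is searched four times for a keyword and four more times for the line end, which is the repeated traversal the background section objects to.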
Summary of the invention
The embodiments of the present invention provide a data content parsing method and device for reducing the parsing time when data content in a data packet body is parsed.
An embodiment of the present invention provides a data content parsing method, comprising:
when data content in a data packet body is parsed, traversing in sequence each character contained in the data content, and determining the ASCII value and the ASCII character corresponding to each character;
determining the matching value corresponding to each character according to the determined ASCII value, the ASCII character and a preset matching array;
determining the starting position of a header field according to the matching value and the ASCII character corresponding to each character, and parsing the header field;
determining the starting position and the size of the binary content according to the parsed header field, and parsing the binary content.
An embodiment of the present invention provides a data content parsing device, comprising:
a first determining unit, configured to traverse in sequence each character contained in the data content when the data content in a data packet body is parsed, and to determine the ASCII value and the ASCII character corresponding to each character;
a second determining unit, configured to determine the matching value corresponding to each character according to the determined ASCII value, the ASCII character and a preset matching array;
a first parsing unit, configured to determine the starting position of a header field according to the matching value and the ASCII character corresponding to each character, and to parse the header field;
a second parsing unit, configured to determine the starting position and the size of the binary content according to the parsed header field, and to parse the binary content.
With the data content parsing method and device provided by the embodiments of the present invention, when content in a data packet body is parsed, each character in the data packet body is traversed in sequence, and the ASCII value and the ASCII character corresponding to each character in the ASCII code table are determined. The ASCII value of each character is matched against the preset matching array to obtain the matching value of that character. After the starting position of a header field has been determined from the character's ASCII character and matching value, the header field is parsed; the starting position and the size of the binary content can then be determined from the parsed header field, so that the binary content can be parsed. Because the characters in the data packet body need to be traversed only once from beginning to end, the time complexity of parsing the data content is reduced and the parsing time is shortened.
Other features and advantages of the present invention are set forth in the following description, and in part will become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.
Description of drawings
Fig. 1 is a schematic flowchart of the data content parsing method in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of determining the matching value corresponding to any character in an embodiment of the present invention;
Fig. 3 is a schematic flowchart of parsing a data packet body that contains one piece of picture content in an embodiment of the present invention;
Fig. 4 is a schematic flowchart of header field parsing in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the data content parsing device in an embodiment of the present invention.
Detailed description of the embodiments
To reduce parsing time complexity and parsing time when data content issued by a server is parsed, the embodiments of the present invention provide a data content parsing method and device.
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are only used to describe and explain the present invention and are not intended to limit it, and that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with one another.
Analysis of how the prior art parses data content shows that the key to parsing is accurately locating the starting position and the length of the binary content. Careful analysis of the MIME protocol shows that the data content transmitted under it consists of ASCII codes. From the point of view of content parsing, these ASCII codes fall into five classes: 1. the NULL character; 2. blank characters; 3. the colon (":"); 4. line break characters ("\r" or "\n"); 5. characters that represent content (all ASCII characters other than the four classes above). Therefore, as long as each ASCII code is correctly matched to one of these five classes, the data content can be parsed correctly. For example, data content starts with "--" + "boundary" + "\r\n". The parser checks whether the character string at the current position offset is "--" + "boundary"; if so, it continues traversing and, on finding that the "\r\n" characters are line breaks, determines that what follows is the header fields of the data content. After the header fields have been processed correctly, it goes on to process the binary content.
Based on the above analysis, to match ASCII characters, an embodiment of the present invention provides a method for establishing a matching array: the matching value corresponding to the NULL character in the ASCII code table is defined as the first matching value, the matching value corresponding to blank characters as the second matching value, the matching value corresponding to the colon as the third matching value, the matching value corresponding to line break characters as the fourth matching value, and the matching value corresponding to ASCII characters other than the NULL character, blank characters, the colon and line break characters as the fifth matching value.
For ease of understanding, take as an example the case where the matching value of the NULL character is 1, that of blank characters is 2, that of the colon is 4, that of line break characters is 8, and that of other ASCII characters is 0. The matching values corresponding to ASCII characters are then as shown in Table 1:
Table 1
ASCII character         Matching value
NULL character          1
Blank character         2
Colon                   4
Line break character    8
Other characters        0
According to the above definition, the matching array A established from the ASCII code table, with one entry for each ASCII value from 0 to 127, can be expressed as follows:
A = {1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 8, 2, 2, 8, 2, 2,
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
     2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
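The matching array can also be generated rather than written out by hand. The following Python sketch builds a 128-entry list from the example values in Table 1; the constant names and the function name `build_matching_array` are illustrative assumptions, and treating ASCII values 1 to 32 (other than 10 and 13) as blank characters is inferred from the array listed above.

```python
# Example matching values from Table 1; the constant names are illustrative.
NULL_CHAR, BLANK, COLON, LINE_BREAK, OTHER = 1, 2, 4, 8, 0


def build_matching_array():
    """Return a 128-entry list indexed by decimal ASCII value."""
    table = [OTHER] * 128
    table[0] = NULL_CHAR               # NULL character
    for code in range(1, 33):          # control characters and the space
        table[code] = BLANK
    table[ord("\n")] = LINE_BREAK      # ASCII 10 overrides the blank class
    table[ord("\r")] = LINE_BREAK      # ASCII 13 overrides the blank class
    table[ord(":")] = COLON            # ASCII 58
    return table


A = build_matching_array()
```

Because the array is indexed directly by ASCII value, classifying a character is a single list lookup with no string comparison.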
Based on the matching array defined above, an embodiment of the present invention provides a data content parsing method: starting from the data content at position offset zero, the characters contained in the data content are traversed in sequence until the end separator ("--" + "boundary" + "--") is reached, at which point the whole parsing process ends.
Fig. 1 is a schematic flowchart of the data content parsing method provided by an embodiment of the present invention, which comprises the following steps:
S101: when data content in a data packet body is parsed, traverse in sequence each character contained in the data content, and determine the ASCII value and the ASCII character corresponding to each character;
S102: determine the matching value corresponding to each character according to the determined ASCII value, the ASCII character and the preset matching array;
S103: determine the starting position of a header field according to the matching value and the ASCII character corresponding to each character, and parse the header field;
S104: determine the starting position and the size of the binary content according to the parsed header field, and parse the binary content.
As shown in Fig. 2, in step S102 the matching value corresponding to any character can be determined by the following process:
S1021: for each character, judge whether the ASCII value corresponding to the character exceeds a preset value; if so, execute step S1022; if not, execute step S1023.
Specifically, because the (decimal) ASCII value of an ASCII character does not exceed 127, the preset value can be set to 127 in the embodiments of the present invention.
S1022: determine that the matching value corresponding to the character is the fifth matching value;
S1023: determine the matching value that corresponds, in the preset matching array, to the character's ASCII character as the matching value of the character.
Take as an example a character whose ASCII character is ":". The decimal ASCII value of ":" is 58; because 58 does not exceed 127, the matching value of ":" is determined from the matching array established in the embodiment of the present invention to be the third matching value.
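Steps S1021 to S1023 amount to a bounds check followed by a table lookup. A minimal Python sketch follows, assuming the example values of Table 1 (so the fifth matching value is 0) and a preset value of 127; the names `PRESET_VALUE`, `FIFTH_MATCHING_VALUE` and `matching_value` are hypothetical.

```python
PRESET_VALUE = 127           # largest decimal ASCII value
FIFTH_MATCHING_VALUE = 0     # "other characters" in the Table 1 example


def build_matching_array():
    """128-entry lookup table using the example values of Table 1."""
    table = [0] * 128
    table[0] = 1                         # NULL character
    for code in range(1, 33):            # blank characters
        table[code] = 2
    table[10] = table[13] = 8            # line break characters
    table[58] = 4                        # colon
    return table


TABLE = build_matching_array()


def matching_value(code, table=TABLE):
    """Classify one character given its numeric value."""
    if code > PRESET_VALUE:              # S1021 true, so S1022
        return FIFTH_MATCHING_VALUE
    return table[code]                   # S1021 false, so S1023
```

The bounds check matters for binary content, where byte values above 127 occur and would otherwise index past the end of the array.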
In a specific implementation, the starting position of a header field can be determined by the following process:
for the data content at the current position offset, determine that this data content is the preset start separator, where the data content at a position offset is any one of the lines obtained by splitting at line break characters; and
determine that the matching value of the character following the start separator is the fourth matching value;
the character string following that character is then determined to be the starting position of the header field.
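The check above can be sketched in Python as follows. This is one possible reading of the three conditions, not the patent's own code: the function name `header_start`, the return value of -1 for "not a header start", and the choice to skip every consecutive line break character after the separator are all assumptions.

```python
LINE_BREAK = 8   # fourth matching value in the Table 1 example


def build_matching_array():
    """128-entry lookup table using the example values of Table 1."""
    table = [0] * 128
    table[0] = 1
    for code in range(1, 33):
        table[code] = 2
    table[10] = table[13] = LINE_BREAK
    table[58] = 4
    return table


TABLE = build_matching_array()


def is_line_break(code):
    """True if the byte value has the fourth matching value."""
    return code <= 127 and TABLE[code] == LINE_BREAK


def header_start(body: bytes, offset: int, separator: bytes) -> int:
    """Return the offset of the first header field, or -1 if none starts here."""
    end = offset + len(separator)
    if body[offset:end] != separator:            # not the start separator
        return -1
    if end >= len(body) or not is_line_break(body[end]):
        return -1                                # e.g. the "--END--" end separator
    while end < len(body) and is_line_break(body[end]):
        end += 1                                 # step over "\r\n"
    return end
```

For the bytes `b"--END\r\nContent-Type:image/jpeg\r\n"` and the separator `b"--END"`, the function returns 7, the offset of the `C` in `Content-Type`.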
For ease of understanding, the embodiment of the present invention is described below using a data packet body that contains one piece of picture content as an example. Assume that the content of the data packet body issued by the server is as follows:
HTTP/1.1 200 OK
X-Powered-By:Servlet/2.5
Server:Sun Java System Application Server 9.1_02
X-DP-next URI:/content/refresh/
Content-Type:multipart/mixed;boundary=End
Content-Length:7479
Date:Tue,26 May 2009 01:57:34 GMT
Connection:Keep-Alive
--END
Content-Type:image/jpeg
Content-Transfer-Encoding:binary
Content-Id:0526090000018182
Content-Length:1764
* * * * * * * * (binary content) * * * * * * * * * * * * *
--END--
In this embodiment of the present invention, the start separator that marks the starting position of the picture content is "--END", and the end separator that marks the end position is "--END--".
Fig. 3 is a schematic flowchart of parsing the picture content in the received data packet body in an embodiment of the present invention, which comprises the following steps:
S301: judge whether the current position offset is greater than 0; if so, execute step S302; if not, execute step S304;
S302: judge whether the content is a line break character; if so, execute step S303; if not, execute step S304;
As the content of the above data packet body shows, the start separator of the picture content is preceded by line break characters ("\r\n"). Therefore, before the start separator of the picture content is located, the line break characters that precede it must be excluded.
S303: add 1 to the position offset;
According to the preset matching array, it is determined whether the current character is a line break character. If so, the position offset is incremented by 1, that is, traversal moves on to the characters of the next line of data content, so that the line break characters before the start separator are filtered out and the position of the start separator "--END" of the picture content is located accurately.
S304: judge whether the data content at the current position offset begins with "--END"; if so, execute step S305; if not, execute step S306;
Specifically, each character of the data content that the current position offset points to is traversed in sequence, and it is determined whether the character string formed by the corresponding ASCII characters is the start separator "--END".
S305: judge whether the content that follows in the data content at the current position offset is "--"; if so, execute step S314; if not, execute step S306;
Specifically, traversal continues over the characters that follow "--END", and it is judged whether the character string formed by their ASCII characters is "--", in order to determine whether the ASCII string matched so far is the character string "--END--" that marks the end position.
S306: judge whether this data content is a line break character; if so, execute step S308; if not, execute step S307;
After the start separator of the picture content has been determined, traversal continues over the data content that follows, and it is determined whether it is "\r\n". If so, the content after the line break characters is the starting position of the header fields.
S307: continue to traverse the next data content, and execute step S306;
S308: add 1 to the position offset;
S309: parse a header field of the picture content;
Specifically, header field parsing also works by traversal: starting from the position offset at which the header fields begin, the header field name and the header field value of a given line are determined from ":" and the line break character. After one line has been parsed, traversal and parsing continue downward. The character string before ":" is the header field name, and the character string between ":" and the line break character is the header field value.
S310: judge whether header field parsing has finished; if so, execute step S311; if not, execute step S309;
Specifically, when a line is found to contain no parsable content, it is determined that the header fields have ended.
S311: determine the starting position and the size of the binary content;
In a specific implementation, the binary content starts with "--"; that is, it is judged whether the line of data content following the end of the header fields begins with the character string "--", and if so, that position is determined as the starting position of the binary content. The size of the binary content can be determined by parsing the header fields; in this example it is 1764.
S312: parse the binary content;
S313: judge whether parsing of the binary content is complete; if so, execute step S314; if not, execute step S312;
Specifically, the end of parsing is determined by finding that the character string is the "--END--" separator. It should be noted that if the data packet body contains multiple pieces of picture content, the end of parsing of one piece of picture content is determined by finding the "--END" separator; that is, "--END" is both the end separator of the previous piece of picture content and the start separator of the next.
S314: parsing is complete.
In the above process, steps S301 to S303 filter out the line break characters before the start separator so that the position of the start separator can be located accurately; steps S304 and S305 judge whether the end position of the data content has been reached; steps S306 and S307 filter out the line break characters after the "--END" separator to locate the starting position of the header fields; steps S308 to S310 parse the header field content and locate the starting position and the size of the binary content; and steps S311 to S313 parse the binary content.
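The overall flow of Fig. 3 can be approximated by a single forward pass over the bytes. The Python sketch below is a simplified illustration under stated assumptions, not the patented implementation: the names `parse_parts`, `classify` and `skip_line_break` are hypothetical; a blank line is assumed to separate the header fields from the binary content (as in standard MIME, although the example listing above does not show one); and the binary content is read by taking Content-Length bytes from the current offset rather than by testing for a "--" prefix as described for S311.

```python
LINE_BREAK, COLON = 8, 4              # example matching values from Table 1


def build_matching_array():
    """128-entry lookup table using the example values of Table 1."""
    table = [0] * 128
    table[0] = 1
    for code in range(1, 33):
        table[code] = 2
    table[10] = table[13] = LINE_BREAK
    table[58] = COLON
    return table


TABLE = build_matching_array()


def classify(code):
    """Matching value of one byte; values above 127 count as 'other'."""
    return 0 if code > 127 else TABLE[code]


def skip_line_break(body, pos):
    """Step over one CRLF, CR or LF at pos, if present."""
    if body[pos:pos + 2] == b"\r\n":
        return pos + 2
    if pos < len(body) and classify(body[pos]) == LINE_BREAK:
        return pos + 1
    return pos


def parse_parts(body, boundary):
    """Return a list of (header dict, binary content) pairs."""
    start, end = b"--" + boundary, b"--" + boundary + b"--"
    pos, parts = 0, []
    while pos < len(body):
        # S301-S303: filter out line breaks in front of the separator.
        while pos < len(body) and classify(body[pos]) == LINE_BREAK:
            pos += 1
        # S304-S305: stop at the end separator (or at unexpected data).
        if body.startswith(end, pos) or not body.startswith(start, pos):
            break
        # S306-S308: step over the line break after the start separator.
        pos = skip_line_break(body, pos + len(start))
        headers = {}
        while pos < len(body):            # S309-S310: one header line per turn
            line_start, colon = pos, -1
            while pos < len(body) and classify(body[pos]) != LINE_BREAK:
                if colon < 0 and classify(body[pos]) == COLON:
                    colon = pos
                pos += 1
            line_end = pos
            pos = skip_line_break(body, pos)
            if colon < 0:                 # no parsable content: headers end
                break
            name = body[line_start:colon].decode("ascii").strip().lower()
            headers[name] = body[colon + 1:line_end].decode("ascii").strip()
        # S311-S313: the content starts here; its size comes from the header.
        size = int(headers.get("content-length", "0"))
        parts.append((headers, body[pos:pos + size]))
        pos += size
    return parts


# Hypothetical two-part body with tiny payloads standing in for pictures.
sample = (b"--End\r\nContent-Type:image/jpeg\r\nContent-Id:01\r\n"
          b"Content-Length:4\r\n\r\n\x00\xff\r\n"
          b"\r\n--End\r\nContent-Type:image/jpeg\r\nContent-Id:02\r\n"
          b"Content-Length:3\r\n\r\nabc"
          b"\r\n--End--")
parts = parse_parts(sample, b"End")
```

Relying on Content-Length instead of scanning for the next separator keeps the sketch correct when the binary content itself contains line breaks or bytes above 127, as the first sample part does.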
In a specific implementation, the header fields can be parsed by the following process:
for each line of header field data contained in the header fields, traverse in sequence each character of that line, and determine the characters whose matching values are the third matching value and the fourth matching value respectively;
determine the character string formed by the ASCII characters of the characters before the character with the third matching value as the header field name of that line;
determine the character string formed by the ASCII characters of the characters between the character with the third matching value and the character with the fourth matching value as the header field value of that line.
Fig. 4 is a schematic flowchart of header field parsing, which comprises the following steps:
S401: for the header field data at the current position offset, traverse in sequence each character of the header field data and judge whether the character is ":"; if so, execute step S403; otherwise, execute step S402;
Specifically, whether the character is ":" is determined from the ASCII character corresponding to the character.
S402: continue to traverse the next character, and execute step S401;
S403: record the header field name;
S404: judge whether the character is a space character; if so, execute step S405; if not, execute step S406;
S405: continue to traverse the next character, and execute step S404;
S406: mark the starting position of the header field value;
S407: judge whether the current character is a line break character; if so, execute step S409; if not, execute step S408;
S408: continue to traverse the next character, and execute step S407;
S409: mark the end position of the header field value, and obtain the header field value.
In the above process, steps S401 to S403 locate the position of ":" and obtain the header field name; steps S404 to S406 skip the space characters in front of the header field value; and steps S407 to S409 locate the starting and end positions of the header field value and obtain the header field value.
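Steps S401 to S409 can be sketched as three short loops over one line of header field data. The Python function below is a minimal illustration; the name `parse_header_line`, the returned tuple, and the decision to raise an error when a line break is reached before any ":" are assumptions that the patent does not specify.

```python
def parse_header_line(data: bytes, offset: int = 0):
    """Return (name, value, end) for the header line that starts at offset."""
    pos = offset
    # S401-S403: advance to the colon; the text before it is the field name.
    while pos < len(data) and data[pos] != ord(":"):
        if data[pos] in (10, 13):
            raise ValueError("line break reached before ':'")
        pos += 1
    if pos == len(data):
        raise ValueError("no ':' found")
    name = data[offset:pos].decode("ascii")
    pos += 1
    # S404-S406: skip spaces, then mark where the field value begins.
    while pos < len(data) and data[pos] == ord(" "):
        pos += 1
    value_start = pos
    # S407-S409: advance to the line break, which ends the field value.
    while pos < len(data) and data[pos] not in (10, 13):
        pos += 1
    value = data[value_start:pos].decode("ascii")
    return name, value, pos


name, value, end = parse_header_line(b"Content-Length: 1764\r\n")
```

Returning the end offset lets a caller continue with the next line from where this one stopped, so no byte of the header is visited twice.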
Based on same inventive concept, a kind of data content resolver also is provided in the embodiment of the present invention, because the principle that this data content resolver is dealt with problems is similar to above-mentioned data content analytic method, therefore the enforcement of this data content resolver can referring to the enforcement of above-mentioned data content analytic method, repeat part and repeat no more.
As shown in Figure 5, the structural representation of the data content resolver that provides for the embodiment of the present invention comprises:
The first determining unit 501 is used for traveling through successively each character that described data content comprises when the data content in the data inclusion is resolved, and determines ASCII value and ascii character that each character is corresponding;
The second determining unit 502, be used for determining according to ASCII value, ascii character and the default coupling array determined the matching value that each character is corresponding;
The first resolution unit 503 is used for matching value and the ascii character corresponding according to each character, determines the starting position of a field, and resolves a field;
The second resolution unit 504 is used for determining the starting position of binary content and the size of binary content according to the field after resolving, and resolves binary content.
In concrete enforcement, the data content resolver can also comprise:
The coupling array is set up the unit, be used for setting up as follows the coupling array: the matching value corresponding to NULL character of definition ASCII character table is the first matching value, matching value corresponding to definition null character (NUL) is the second matching value, matching value corresponding to definition colon is the 3rd matching value, matching value corresponding to definition new line symbol is the 4th matching value, and matching value corresponding to ascii character beyond definite division NULL, null character (NUL), colon and new line symbol is the 5th matching value.
Individual in concrete enforcement, the second determining unit 502 can comprise:
Judge module is used for for each character, judges whether ASCII value corresponding to this character surpasses preset value;
The first determination module is used for determining that matching value corresponding to this character is the 5th matching value when judgment result is that of described judge module is;
The second determination module is used at described judge module when the determination result is NO, and the ascii character that this character is corresponding corresponding matching value in default coupling array is defined as matching value corresponding to this character.
In concrete enforcement, the first resolution unit 503 can comprise:
The separator determination module is used for for data content corresponding to current location side-play amount, determines that this data content is default beginning separator, and the data content that described position offset is corresponding cuts apart for according with by new line the arbitrary line character that obtains;
The 3rd determination module is used for determining that matching value corresponding to character after the beginning separator is the 4th matching value;
The starting position determination module is for this character string being defined as the starting position of a field.
In concrete enforcement, the first resolution unit 503 can comprise:
The character determination module, the every wardrobe field data for comprising for a field travels through each character that this wardrobe field data comprises successively, determines that respectively corresponding matching value is the character of the 3rd matching value and the 4th matching value;
Field name determination module is used for the field name that character string that ascii character corresponding to character before the character that the 3rd matching value is corresponding form is defined as this wardrobe field data;
Field value determination module is used for the field value that character string that ascii character corresponding to character between character corresponding to the character that the 3rd matching value is corresponding and the 4th matching value form is defined as this wardrobe field data.
Those skilled in the art should understand, embodiments of the invention can be provided as method, system or computer program.Therefore, the present invention can adopt complete hardware implementation example, implement software example or in conjunction with the form of the embodiment of software and hardware aspect fully.And the present invention can adopt the form that wherein includes the upper computer program of implementing of computer-usable storage medium (including but not limited to magnetic disc store, CD-ROM, optical memory etc.) of computer usable program code one or more.
The present invention is that reference is described according to flow chart and/or the block diagram of method, equipment (system) and the computer program of the embodiment of the present invention.Should understand can be by the flow process in each flow process in computer program instructions realization flow figure and/or block diagram and/or square frame and flow chart and/or block diagram and/or the combination of square frame.Can provide these computer program instructions to the processor of all-purpose computer, special-purpose computer, Embedded Processor or other programmable data processing device to produce a machine, make the instruction of carrying out by the processor of computer or other programmable data processing device produce to be used for the device of realizing in the function of flow process of flow chart or a plurality of flow process and/or square frame of block diagram or a plurality of square frame appointments.
These computer program instructions also can be stored in energy vectoring computer or the computer-readable memory of other programmable data processing device with ad hoc fashion work, make the instruction that is stored in this computer-readable memory produce the manufacture that comprises command device, this command device is realized the function of appointment in flow process of flow chart or a plurality of flow process and/or square frame of block diagram or a plurality of square frame.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
With the data content parsing method and device provided by the embodiments of the present invention, when the data content in a data packet body is parsed, each character of the data content is traversed in turn, and the ASCII value and ASCII character corresponding to each character in the ASCII table are determined. The ASCII value of each character is matched against a preset matching array to obtain the matching value corresponding to that character. Once the starting position of the header field has been determined from each character's ASCII character and matching value, the header field is parsed. From the parsed header field, the starting position and the size of the binary content can be determined, so that the binary content can be parsed. Because the characters of the data content need to be traversed only once, from beginning to end, the time complexity of parsing the data content is reduced and the parsing time is shortened.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A data content parsing method, characterized by comprising:
when parsing the data content in a data packet body, traversing in turn each character contained in the data content, and determining the ASCII value and the ASCII character corresponding to each character;
determining the matching value corresponding to each character according to the determined ASCII value, the ASCII character, and a preset matching array;
determining the starting position of a header field according to the matching value and the ASCII character corresponding to each character, and parsing the header field;
determining the starting position of binary content and the size of the binary content according to the parsed header field, and parsing the binary content.
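The last step of claim 1 can be illustrated with a minimal sketch. The claim does not name the header field that carries the size, so the field name `Content-Length`, the function name `extract_binary`, and the idea that the binary content begins at a known offset right after the header are all assumptions made for illustration only.

```python
def extract_binary(data: bytes, fields: dict, body_start: int,
                   size_field: str = "Content-Length") -> bytes:
    """Slice the binary content out of `data`.

    `fields` holds the already parsed header fields, `body_start` is the
    offset of the first binary byte, and `size_field` is a hypothetical
    name for the header field that states the binary content size.
    """
    size = int(fields[size_field])      # size taken from the parsed header
    end = body_start + size
    if end > len(data):
        # The header promises more bytes than the packet body contains.
        raise ValueError("declared size exceeds available data")
    return data[body_start:end]
```

Because both the start offset and the size come from the parsed header, the binary bytes themselves never have to be scanned character by character.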
2. the method for claim 1, is characterized in that, sets up as follows the coupling array:
Matching value corresponding to NULL character in definition ASCII character table is the first matching value, matching value corresponding to definition null character (NUL) is the second matching value, matching value corresponding to definition colon is the 3rd matching value, matching value corresponding to definition new line symbol is the 4th matching value, and matching value corresponding to ascii character beyond definite division NULL, null character (NUL), colon and new line symbol is the 5th matching value.
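A matching array of this kind might be built as a lookup table indexed by ASCII value. The sketch below is only illustrative: the constant names and the numeric values 1 to 5 are hypothetical, and because the translated claim lists both a "NULL character" and a "null character", the code assumes (without support from the text) that the second one is the space character 0x20.

```python
# Hypothetical matching values; the claim only requires five distinct ones.
M_NULL, M_BLANK, M_COLON, M_NEWLINE, M_OTHER = 1, 2, 3, 4, 5


def build_match_table(size: int = 128) -> list:
    """Return a list indexed by ASCII value that gives each matching value."""
    table = [M_OTHER] * size            # default: fifth matching value
    table[0x00] = M_NULL                # NULL character
    table[0x20] = M_BLANK               # ASSUMPTION: second special character is the space
    table[ord(":")] = M_COLON           # colon separates field name from value
    table[ord("\n")] = M_NEWLINE        # newline terminates a line
    return table


MATCH_TABLE = build_match_table()
```

Building the table once up front means each later classification is a single indexed read rather than a chain of comparisons.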
3. The method of claim 2, characterized in that determining the matching value corresponding to each character according to the determined ASCII value, the ASCII character, and the preset matching array specifically comprises:
for each character, judging whether the ASCII value corresponding to the character exceeds a preset value;
when the judgment result is yes, determining that the matching value corresponding to the character is the fifth matching value;
when the judgment result is no, determining the matching value that corresponds, in the preset matching array, to the ASCII character of the character as the matching value corresponding to the character.
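The threshold test in claim 3 can be sketched as a guarded table lookup. The claim does not state the preset value; 127 (the highest 7-bit ASCII code) is assumed here, and all names and numeric matching values are hypothetical.

```python
M_NULL, M_BLANK, M_COLON, M_NEWLINE, M_OTHER = 1, 2, 3, 4, 5
ASCII_LIMIT = 127  # ASSUMPTION: the "preset value" is the top of the ASCII range

# Small matching array covering values 0..ASCII_LIMIT.
MATCH_TABLE = [M_OTHER] * (ASCII_LIMIT + 1)
MATCH_TABLE[0x00] = M_NULL
MATCH_TABLE[0x20] = M_BLANK            # ASSUMPTION: space as the second special character
MATCH_TABLE[ord(":")] = M_COLON
MATCH_TABLE[ord("\n")] = M_NEWLINE


def match_value(byte: int) -> int:
    """Return the matching value for one byte of the data content."""
    if byte > ASCII_LIMIT:
        # Outside the table: treat as an ordinary character.
        return M_OTHER
    return MATCH_TABLE[byte]
```

The guard keeps bytes that belong to binary content or multi-byte text from indexing past the end of the array.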
4. The method of claim 2, characterized in that determining the starting position of the header field according to the matching value and the ASCII character corresponding to each character specifically comprises:
for the data content corresponding to the current position offset, determining that this data content is a preset start delimiter, the data content corresponding to the position offset being any line of characters obtained by splitting on the newline character; and
determining that the matching value corresponding to the character following the start delimiter is the fourth matching value;
determining said character string to be the starting position of the header field.
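One possible reading of claim 4 is sketched below. It assumes lines end with a bare line feed, that the header field begins on the line immediately after the start delimiter, and that the delimiter is supplied by the caller; the function name and the example delimiter are hypothetical, and the newline test is written as a direct comparison in place of a matching-array lookup to keep the sketch short.

```python
NEWLINE = 0x0A  # the character that carries the fourth matching value


def find_header_start(data: bytes, delimiter: bytes) -> int:
    """Return the offset of the first header byte, or -1 if none is found."""
    offset = 0
    n = len(data)
    while offset < n:
        end = offset + len(delimiter)
        # The line must be exactly the delimiter, followed by a newline.
        if data.startswith(delimiter, offset) and end < n and data[end] == NEWLINE:
            return end + 1
        # Otherwise move the position offset to the start of the next line.
        nl = data.find(b"\n", offset)
        if nl == -1:
            break
        offset = nl + 1
    return -1
```

For example, with the hypothetical delimiter `b"--sep"`, the data `b"preamble\n--sep\nName: v\n\n"` yields the offset at which `Name` begins.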
5. The method of any one of claims 1 to 4, characterized in that parsing the header field specifically comprises:
for each line of header field data contained in the header field, traversing in turn each character contained in that line of header field data, and determining the character whose matching value is the third matching value and the character whose matching value is the fourth matching value;
determining the character string formed by the ASCII characters corresponding to the characters that precede the character having the third matching value as the header field name of that line of header field data;
determining the character string formed by the ASCII characters corresponding to the characters between the character having the third matching value and the character having the fourth matching value as the header field value of that line of header field data.
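The name/value split of claim 5 can be sketched as a single left-to-right pass. The matching values, the function name, the trimming of surrounding whitespace, and the rule that a line without a colon ends the header are assumptions added for illustration; the claim itself only specifies splitting at the colon and the newline.

```python
M_COLON, M_NEWLINE, M_OTHER = 3, 4, 5          # hypothetical matching values
TABLE = [M_OTHER] * 128
TABLE[ord(":")] = M_COLON
TABLE[ord("\n")] = M_NEWLINE


def parse_header_fields(data: bytes, start: int = 0):
    """Parse 'name: value' lines in one pass; return (fields, next_offset)."""
    fields = {}
    line_start, colon_at = start, -1
    i = start
    while i < len(data):
        b = data[i]
        m = TABLE[b] if b < 128 else M_OTHER
        if m == M_COLON and colon_at == -1:
            colon_at = i                        # first colon on this line
        elif m == M_NEWLINE:
            if colon_at == -1:
                return fields, i + 1            # ASSUMPTION: colon-free line ends the header
            name = data[line_start:colon_at].decode("ascii").strip()
            value = data[colon_at + 1:i].decode("ascii").strip()
            fields[name] = value
            line_start, colon_at = i + 1, -1    # reset for the next line
        i += 1
    return fields, i
```

Each byte is inspected once, and the returned offset points just past the header, which is where a caller would expect any following content to begin.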
6. A data content parsing device, characterized by comprising:
a first determining unit, configured to, when the data content in a data packet body is parsed, traverse in turn each character contained in the data content and determine the ASCII value and the ASCII character corresponding to each character;
a second determining unit, configured to determine the matching value corresponding to each character according to the determined ASCII value, the ASCII character, and a preset matching array;
a first parsing unit, configured to determine the starting position of a header field according to the matching value and the ASCII character corresponding to each character, and to parse the header field;
a second parsing unit, configured to determine the starting position of binary content and the size of the binary content according to the parsed header field, and to parse the binary content.
7. The device of claim 6, characterized by further comprising:
a matching array establishing unit, configured to establish the matching array as follows: defining the matching value corresponding to the NULL character in the ASCII table as a first matching value, defining the matching value corresponding to the null character (NUL) as a second matching value, defining the matching value corresponding to the colon as a third matching value, defining the matching value corresponding to the newline character as a fourth matching value, and defining the matching value corresponding to every ASCII character other than NULL, the null character, the colon, and the newline character as a fifth matching value.
8. The device of claim 7, characterized in that the second determining unit comprises:
a judging module, configured to judge, for each character, whether the ASCII value corresponding to the character exceeds a preset value;
a first determining module, configured to determine that the matching value corresponding to the character is the fifth matching value when the judgment result of the judging module is yes;
a second determining module, configured to determine, when the judgment result of the judging module is no, the matching value that corresponds, in the preset matching array, to the ASCII character of the character as the matching value corresponding to the character.
9. The device of claim 7, characterized in that the first parsing unit comprises:
a delimiter determining module, configured to determine, for the data content corresponding to the current position offset, that this data content is a preset start delimiter, the data content corresponding to the position offset being any line of characters obtained by splitting on the newline character;
a third determining module, configured to determine that the matching value corresponding to the character following the start delimiter is the fourth matching value;
a starting position determining module, configured to determine said character string to be the starting position of the header field.
10. The device of any one of claims 6 to 9, characterized in that the first parsing unit comprises:
a character determining module, configured to traverse in turn, for each line of header field data contained in the header field, each character contained in that line of header field data, and to determine the character whose matching value is the third matching value and the character whose matching value is the fourth matching value;
a header field name determining module, configured to determine the character string formed by the ASCII characters corresponding to the characters that precede the character having the third matching value as the header field name of that line of header field data;
a header field value determining module, configured to determine the character string formed by the ASCII characters corresponding to the characters between the character having the third matching value and the character having the fourth matching value as the header field value of that line of header field data.
CN201110334808.XA 2011-10-28 2011-10-28 A kind of data content analytic method and device Active CN103095644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110334808.XA CN103095644B (en) 2011-10-28 2011-10-28 A kind of data content analytic method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110334808.XA CN103095644B (en) 2011-10-28 2011-10-28 A kind of data content analytic method and device

Publications (2)

Publication Number Publication Date
CN103095644A true CN103095644A (en) 2013-05-08
CN103095644B CN103095644B (en) 2015-10-07

Family

ID=48207788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110334808.XA Active CN103095644B (en) 2011-10-28 2011-10-28 A kind of data content analytic method and device

Country Status (1)

Country Link
CN (1) CN103095644B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572898A (en) * 2014-12-22 2015-04-29 上海钢富电子商务有限公司 Data analysis method and data analysis system for steel trade industry spot commodity resource
CN104767710A (en) * 2014-01-02 2015-07-08 中国科学院声学研究所 DFA (Determine Finite Automaton)-based transmission load extraction method for HTTP (Hyper Text Transfer Protocol) chunked transfer encoding
CN108021540A (en) * 2017-11-09 2018-05-11 中国科学院信息工程研究所 The analytic method and instrument of a kind of generic text form towards Hadoop
CN108055266A (en) * 2017-12-15 2018-05-18 南京邮电大学盐城大数据研究院有限公司 A kind of method and system of 8583 data message of parsing based on position offset

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004054275A2 (en) * 2002-12-11 2004-06-24 Nokia Corporation Downloading software applications
CN1852320A (en) * 2006-01-26 2006-10-25 华为技术有限公司 Signaling message detecting method and system based on text coding
CN101179769A (en) * 2007-12-04 2008-05-14 南京吉美思***集成有限公司 LBS position service based community rectification work management method
CN101227435A (en) * 2008-01-28 2008-07-23 浙江大学 Method for filtering Chinese junk mail based on Logistic regression
CN101345720A (en) * 2008-08-15 2009-01-14 浙江大学 Junk mail classification method based on partial match estimation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004054275A2 (en) * 2002-12-11 2004-06-24 Nokia Corporation Downloading software applications
WO2004054275A3 (en) * 2002-12-11 2004-08-12 Nokia Corp Downloading software applications
CN1852320A (en) * 2006-01-26 2006-10-25 华为技术有限公司 Signaling message detecting method and system based on text coding
CN101179769A (en) * 2007-12-04 2008-05-14 南京吉美思***集成有限公司 LBS position service based community rectification work management method
CN101227435A (en) * 2008-01-28 2008-07-23 浙江大学 Method for filtering Chinese junk mail based on Logistic regression
CN101345720A (en) * 2008-08-15 2009-01-14 浙江大学 Junk mail classification method based on partial match estimation

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104767710A (en) * 2014-01-02 2015-07-08 中国科学院声学研究所 DFA (Determine Finite Automaton)-based transmission load extraction method for HTTP (Hyper Text Transfer Protocol) chunked transfer encoding
CN104767710B (en) * 2014-01-02 2018-08-07 中国科学院声学研究所 The transmission payload extracting method of HTTP block transmissions coding based on DFA
CN104572898A (en) * 2014-12-22 2015-04-29 上海钢富电子商务有限公司 Data analysis method and data analysis system for steel trade industry spot commodity resource
CN104572898B (en) * 2014-12-22 2017-09-22 上海找钢网信息科技股份有限公司 The data analysis method and system of a kind of steel trade industry stock resource
CN108021540A (en) * 2017-11-09 2018-05-11 中国科学院信息工程研究所 The analytic method and instrument of a kind of generic text form towards Hadoop
CN108021540B (en) * 2017-11-09 2023-05-02 中国科学院信息工程研究所 Hadoop-oriented general text format analysis method and tool
CN108055266A (en) * 2017-12-15 2018-05-18 南京邮电大学盐城大数据研究院有限公司 A kind of method and system of 8583 data message of parsing based on position offset

Also Published As

Publication number Publication date
CN103095644B (en) 2015-10-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant