CN108763413B

CN108763413B - Data searching and positioning method based on data storage format

Info

Publication number: CN108763413B
Application number: CN201810500445.4A
Authority: CN
Inventors: 张昭
Original assignee: Tangshan High Tech Industrial Park Xingrong Technology Co ltd
Current assignee: Tangshan High Tech Industrial Park Xingrong Technology Co ltd
Priority date: 2018-05-23
Filing date: 2018-05-23
Publication date: 2021-07-23
Anticipated expiration: 2038-05-23
Also published as: CN108763413A

Abstract

The invention provides a data searching and positioning method based on a data storage format. The data storage format comprises an operation function library, a console instruction set, a tree structure description area and a data table area. The tree structure description area is located before the data table area, and both areas are composed of data points: the first data point of the tree structure description area represents the folder level and the second represents the folder name, while the data table area stores data points in table form. Each data point consists of a jump help area and a valid data area; the valid data area holds English text, Chinese text, binary data, or a combination thereof, and the jump help area records the width of the valid data area. When data is located, the reader reads the value of the jump help area and jumps over the corresponding number of bytes to extract the data.

Description

Data searching and positioning method based on data storage format

Technical Field

The invention belongs to the field of data storage technology, and in particular relates to a data searching and positioning method based on a data storage format.

Background

Data storage formats are an important component of Internet-connected industrial control systems. The current mainstream markup language is XML, whose code volume is large, so searching is slow at run time and CPU, disk storage and network resources are heavily consumed. As demands on automation control grow, code volume grows further and programs become hard to maintain; low-performance single-chip microcomputers can hardly load so much information, which is a serious obstacle in production, and searching existing XML code is slow.

Because HTML and XML programs are large, they are difficult to maintain and require a relatively high hardware configuration on single-chip microcomputers, yet replacing existing equipment with higher-end hardware requires a very large capital investment. The mainstream languages therefore make it hard for existing equipment to achieve efficient, multifunctional control, and real-time requirements can hardly be met on lower-end devices. XML processing is also complex: drivers for various digital devices must handle escape substitution on write and escape recovery on read. Custom communication-layer protocols hinder interoperability, driver implementations on the device side and the PC side are not standardized, and connecting a single-chip microcomputer to PC equipment is seriously difficult.

Disclosure of Invention

The invention aims to solve these problems by providing a new data storage format that reduces the code volume of files, makes files easier to read, and lets a single-chip microcomputer directly locate, read and parse each data point, avoiding search-style reading.

The invention adopts the following technical solution to solve these problems:

a data searching and positioning method based on a data storage format is provided, wherein the data storage format comprises an operation function library, a console instruction set, a tree structure description area and a data table area;

the tree structure description area is located before the data table area and has the structure: start symbol / first data point / second data point;

the start symbol marks the start of a folder in the tree structure description area and is a double backslash \\ encoded in ANSI GBK;

the first data point represents a folder level; the second data point expresses the name of the folder;

the data table area follows the second data point of the tree structure description area and stores data points in table form; the first data point of the data table area gives the number of columns of the data points under that folder;

each data point consists of a jump help area followed by a valid data area, and the valid data area holds English text, Chinese text, binary data, or a combination thereof; the jump help area describes the width of the valid data that immediately follows it and is written as a decimal integer string whose characters are encoded according to GBK: ordinary widths are written as two digits, 00 to 99, and over-long widths are written in square brackets as [x], where x is a decimal number of two or more digits.

When a folder is located, the reader skips the start symbol \\ and then the first and second data points; the data table begins immediately after the second data point.

If the folder has a lower-level folder, the data table ends at the storage byte immediately before the start symbol of that lower-level folder.

If the folder has no lower-level folder, the data table ends at the storage byte immediately before the start symbol of the next peer folder after it;

if the folder has no lower-level folder, no following peer folder, and no following upper-level folder, the data table ends at the last stored byte of the file;

a data table is queried by traversing its rows: if the first data point of a row equals the data being searched for, the search is complete and all data of that row is extracted.
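The data point rule above can be illustrated with a minimal Python sketch that reads one data point from a GBK byte buffer. The function name `read_data_point` and its `(payload, next_position)` return value are illustrative assumptions, not part of the patent; the only rules used are the two-digit 00 to 99 jump help area and the bracketed form for longer widths.

```python
from __future__ import annotations


def read_data_point(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Read one data point starting at buf[pos] (hypothetical helper).

    Returns (payload, next_pos). The jump help area is either two ASCII
    digits (00-99) or a bracketed decimal string such as b"[100]".
    """
    if buf[pos:pos + 1] == b"[":
        close = buf.index(b"]", pos)           # end of the bracketed width
        width = int(buf[pos + 1:close])        # int() accepts ASCII digit bytes
        start = close + 1
    else:
        width = int(buf[pos:pos + 2])          # conventional 00-99 form
        start = pos + 2
    end = start + width
    if end > len(buf):
        raise ValueError("data point runs past the end of the buffer")
    return buf[start:end], end                 # jump straight over the payload


# Three data points back to back; payload bytes are sliced, never scanned.
stream = b"04name06button[100]" + b"x" * 100
pos, points = 0, []
while pos < len(stream):
    payload, pos = read_data_point(stream, pos)
    points.append(payload)
```

The reader never inspects payload bytes: it only decodes the jump help area and slices, which is the "jump" the method relies on.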

Compared with the prior art, the invention has the following beneficial effects: it records the width of each name or value in a jump help area (jumphelper), which lets a single-chip microcomputer locate, read and parse data directly, avoiding search-style reading, reducing difficulty and improving speed. It effectively saves electric energy by reducing the CPU power spent on parsing, making it a more environmentally friendly data protocol. It can help a single-chip microcomputer access the Internet and PC equipment through the xrb-based-WDM-driver, so that a single-chip microcomputer without an operating system can access large databases and the Internet.

Further, the folder at level 0 is the root folder, which is a simplified folder whose beginning is partially hidden.

When traversing the data points, each data point is read as follows:

a. Read the description area and recognize the double backslash \\ as the start of a folder.

b. Read the first data point after \\, which expresses the folder level: first read its jump help area, jump forward by the width given by its value, read the characters within that width, and identify the folder level.

c. Read the second data point, which expresses the folder name: first read its jump help area, jump forward by the width given by its value, read the characters within that range, and identify the name of the folder to be searched.

d. Read the data table area: its first data point expresses the number of columns of the data table. First read the jump help area of that data point, jump forward by the width given by its value, read the value within that width, and record it as the column count of the data table.

e. Read the data points that follow, up to the next double backslash \\ mark, reading each one with the method above to obtain all valid contents of the folder; the number of data points read divided by the number of columns is the number of rows of the data table.
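Steps a to e can be sketched in Python as below. This is one illustrative reading of the steps, not the patent's implementation: `read_folder`, the returned dictionary, and the sample buffer (the folder header of example one joined to the table of example two) are hypothetical. Checking for the next `\\` only at data point boundaries is what keeps step e unambiguous, since a jump help area always begins with a digit or `[`.

```python
FOLDER_START = b"\\\\"  # the two-byte double backslash


def read_data_point(buf: bytes, pos: int):
    """Return (payload, next_pos); the jump help area is 00-99 or [n]."""
    if buf[pos:pos + 1] == b"[":
        close = buf.index(b"]", pos)
        width, start = int(buf[pos + 1:close]), close + 1
    else:
        width, start = int(buf[pos:pos + 2]), pos + 2
    return buf[start:start + width], start + width


def read_folder(buf: bytes, pos: int):
    """Steps a-e for the folder whose start symbol is at buf[pos]."""
    if buf[pos:pos + 2] != FOLDER_START:                       # step a
        raise ValueError("expected the folder start symbol")
    level, pos = read_data_point(buf, pos + 2)                 # step b: level
    name, pos = read_data_point(buf, pos)                      # step c: name
    cols_raw, pos = read_data_point(buf, pos)                  # step d: columns
    cols = int(cols_raw)
    points = []
    while pos < len(buf) and buf[pos:pos + 2] != FOLDER_START:  # step e
        value, pos = read_data_point(buf, pos)
        points.append(value)
    return {
        "level": int(level),
        "name": name.decode("gbk"),
        "columns": cols,
        "rows": len(points) // cols if cols else 0,
        "points": points,
    }, pos


folder, end = read_folder(b"\\\\01207windows01204name06button05color03red", 0)
```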

Detailed Description

The invention relates to a file system formed from data points that carry jump information, covering the design of the data points, of the hierarchical folders, and of the data tables.

The invention provides a data storage format comprising an operation function library (function lib), a console instruction set (query language), a tree structure description area (folder description) and a data table area (data table). This new data storage format may be called "xing rong byte", abbreviated below as XRB.

The tree structure description area (folder description) is located before the data table area and describes it. It contains a folder start symbol followed by two data points.

The structure of the tree structure description area is: start symbol / first data point / second data point, where the start symbol is the folder start marker, a double backslash \\ encoded in ANSI GBK; the first data point represents the folder level; the second data point represents the folder name.

The data table area follows the second data point of the tree structure description area and stores data points in table form; the first data point of the data table area gives the number of columns of the data points under the folder.

The essential element of both the tree structure description area (folder description) and the data table (data) is the data point (xrb data).

The structure of a data point is: a human-readable jump help area in front, followed by a valid data area that may contain arbitrary bytes in any encoding and of unlimited length. Each combination of a jump help area and a valid data area is one data point.

The jump help area describes the width of the data in the valid data area immediately after it. It is a decimal integer string whose characters are encoded according to GBK: ordinary widths are written as 00 to 99, and over-long widths are written in square brackets as [x], where x is a decimal number of two or more digits. By default the jump help area is two digits in the range 00 to 99; when the valid data occupies a single-digit number of bytes, a leading 0 is added, and two decimal digits are enough for common data. Widths of 100 or more (three or more decimal digits) are recorded in the bracketed form. Bytes are converted to strings with ANSI GBK encoding, and the string is read as a decimal integer, for example:

00 to 99, then [100] and beyond, such as [16777216] and [4294967296], with no upper limit. The jump help area is at least two characters long and, in theory, unbounded. If the numeric width has fewer than two digits, a 0 is added in front.
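The writing side of this rule can be sketched as follows, assuming plain decimal digits inside the brackets with no leading zeros. `jump_help` and `data_point` are hypothetical names. The worked example later in the document writes a 10000-byte width as [010000], with a leading zero; a decoder that reads the bracket contents as a decimal integer accepts both forms.

```python
def jump_help(width: int) -> bytes:
    """Encode a payload width as a jump help area (illustrative)."""
    if width < 0:
        raise ValueError("width must be non-negative")
    if width < 100:
        return b"%02d" % width      # 0-99: two digits; single digits get a leading 0
    return b"[%d]" % width          # 100 and above: bracketed decimal form


def data_point(payload: bytes) -> bytes:
    """Prefix a payload with its jump help area."""
    return jump_help(len(payload)) + payload
```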

Under the GBK specification, a Chinese character occupies at least 2 bytes, while an English letter or an Arabic numeral occupies 1 byte. Therefore "04中国05china" conforms to the xrb data point syntax: "中国" ("China") is two Chinese characters and occupies 4 bytes, and "china" is five letters and occupies 5 bytes.
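The byte-width rule can be checked with Python's built-in `gbk` codec. The helper `gbk_data_point` is hypothetical and, for brevity, handles only the two-digit jump help form; the point it shows is that the width counts GBK bytes, not characters.

```python
def gbk_data_point(text: str) -> bytes:
    """Encode text as GBK and prefix its byte width (two-digit form only)."""
    payload = text.encode("gbk")        # width is counted in GBK bytes, not characters
    if len(payload) > 99:
        raise ValueError("use the bracketed jump help form for widths over 99")
    return b"%02d" % len(payload) + payload


pair = gbk_data_point("中国") + gbk_data_point("china")
```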

(Note: 65536 = 64K = 256 × 256 is the overflow value of a 16-bit short int in C; 4294967296 = 4G = 256 × 256 × 256 × 256 is the overflow value of a 32-bit int in C; 18446744073709551616 = 2^64, i.e. 16M T, is the overflow value of a 64-bit long int in C. The number 18446744073709551616 is about 1.8 × 10^19; a single xrb data point can hold that many bytes, or even more, which equals the capacity of roughly 16.8 million 1 TB (2^40-byte) disks.)

The valid data area may contain English text, Chinese text, binary byte data, or a combination thereof.

The structure of the tree structure description area is: "start marker / first data point (jump help area + valid data area) / second data point (jump help area + valid data area)". The valid data of the first data point is the folder level, which may be 1, 2, 3, and so on without limit; the valid data of the second data point is the folder name, a user-defined string of digits, letters, characters, or a combination thereof.

The first data point in the data table area gives the number of columns of the data table, so its valid data is normally a number. The other data points hold the contents of the data table; their valid data may be text, documents or pictures, and may also be functions, instructions and the like.

The width of a data point is the number of bytes its data occupies: each letter occupies one byte and each Chinese character occupies two bytes.

A data point may be empty. An empty data point appears as a jump help area of 00 with no valid data area behind it: the "00" is followed not by any meaningful byte belonging to that data point, but directly by the jump help area of the next data point or by a folder start symbol \\.
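A minimal sketch of how an empty data point reads in practice, assuming two-digit jump help areas only. The helper `read_points` and the three-cell row are hypothetical; after a 00 jump help area the reader lands directly on the next jump help area.

```python
from __future__ import annotations


def read_points(buf: bytes) -> list[bytes]:
    """Split a run of two-digit data points; a 00 jump help yields b""."""
    points, pos = [], 0
    while pos < len(buf):
        width = int(buf[pos:pos + 2])
        points.append(buf[pos + 2:pos + 2 + width])   # empty slice when width is 0
        pos += 2 + width                               # lands on the next jump help area
    return points


row = read_points(b"03red0004blue")   # middle cell is an empty data point
```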

The scope of a folder F at level L runs from the FSI (the folder start symbol, the double backslash \\) of F to the FSI of the next neighboring folder at the same level. Within this scope there is first a data table (level L + 1) and then any lower-level folders (level L + 1). The data table is assigned a level so that programming of the xrb function library is deterministic.

Likewise, each lower-level (level L + 1) folder may in turn own its own data table (level L + 2) and its own lower-level (level L + 2) folders.

The end position of the level L + 1 data table directly under the level L folder F is obtained as follows: if F contains no lower-level folder, the table ends just before the FSI of the next sibling (level L) folder of F; if F contains a lower-level folder, the table ends just before the FSI of the first lower-level (level L + 1) folder of F.

The start position of the level L + 1 data table directly under the level L folder F is obtained as follows: skip the FSI of F and then two data points (DP); what follows is the data table.
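The start and end rules can be sketched as a byte-range computation. Because folders are stored in order, this sketch reads all of the end-of-table rules as "stop at the next FSI reached by jumping, or at the end of the file"; that reduction is an interpretation, not an explicit statement in the patent. `skip_data_point`, `table_span` and the three-folder buffer are hypothetical, and the returned end is exclusive (the last table byte is `end - 1`).

```python
from __future__ import annotations

FOLDER_START = b"\\\\"


def skip_data_point(buf: bytes, pos: int) -> int:
    """Return the position just past the data point that starts at buf[pos]."""
    if buf[pos:pos + 1] == b"[":
        close = buf.index(b"]", pos)
        return close + 1 + int(buf[pos + 1:close])
    return pos + 2 + int(buf[pos:pos + 2])


def table_span(buf: bytes, fsi: int) -> tuple[int, int]:
    """Return (start, end) of the table directly under the folder whose FSI is at fsi.

    start: skip the FSI and two data points (level and name).
    end:   exclusive; the next FSI reached by jumping, or the end of the file.
    """
    start = skip_data_point(buf, skip_data_point(buf, fsi + 2))
    pos = start
    while pos < len(buf) and buf[pos:pos + 2] != FOLDER_START:
        pos = skip_data_point(buf, pos)
    return start, pos


# Hypothetical file: level 1 folder A, its lower folder B, then A's sibling C.
doc = (b"\\\\01101A" b"01201x01y"      # A: table with 2 columns, one row
       b"\\\\01201B" b"01101z"         # B: level 2, lower folder of A
       b"\\\\01101C" b"01101w")        # C: level 1, sibling of A, last in the file
```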

The data searching and positioning method of the data storage format comprises the following:

If the folder has no lower-level folder, the data table ends at the storage byte immediately before the start symbol of the next peer folder after it.

If the folder has no lower-level folder and no peer folder after it, the data table ends at the last stored byte at the tail of the file.

A row search of a data table traverses each row of the table: if the first data point of a row is the same as the data being searched for, the search is judged complete and all data of that row is extracted. The values are stored in the array elements seekednamevalue[1], seekednamevalue[2], …, seekednamevalue[N], where N is the number of columns of the table.
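A minimal sketch of the row search on a single data table, assuming the table bytes begin with the column-count data point. `seek_row` is a hypothetical name; it returns a 0-based Python list, so the patent's seekednamevalue[1] corresponds to index 0. The sample table is the myguiinitial table from the web page embodiment below.

```python
from __future__ import annotations


def read_data_point(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Return (payload, next_pos); the jump help area is 00-99 or [n]."""
    if buf[pos:pos + 1] == b"[":
        close = buf.index(b"]", pos)
        width, start = int(buf[pos + 1:close]), close + 1
    else:
        width, start = int(buf[pos:pos + 2]), pos + 2
    return buf[start:start + width], start + width


def seek_row(table: bytes, key: bytes) -> list[bytes] | None:
    """Return the first row whose first data point equals key, else None."""
    cols_raw, pos = read_data_point(table, 0)   # first data point: column count
    cols = int(cols_raw)
    if cols <= 0:
        return None
    while pos < len(table):
        row = []
        for _ in range(cols):                   # read one full row
            value, pos = read_data_point(table, pos)
            row.append(value)
        if row[0] == key:
            return row
    return None


table = b"01310MYTEXTBOX101110helloworld08MYLABEL101104home"
seekednamevalue = seek_row(table, b"MYLABEL1")
```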

When traversing the data points, each data point is read as follows:

a. read the description area and recognize the double backslash \\ as the start of a folder;

b. read the first data point after \\, which expresses the folder level: first read its jump help area, jump forward by the width given by its value, read the characters within that width, and identify the folder level;

c. read the second data point, which expresses the folder name: first read its jump help area, jump forward by the width given by its value, read the characters within that range, and identify the name of the folder to be searched;

d. continue reading the subsequent data table area: its first data point expresses the column count of the data table; first read the jump help area of that data point, jump forward by the width given by its value, read the value within that width, and record it as the column count;

e. read the data points that follow, up to the next double backslash \\ mark; the number of data points read divided by the number of columns is the number of rows of the data table;

all valid contents of the folder are read out by this method of reading data points.

Example one, a folder description statement:

\\01207windows

where \\ is the folder start marker. In the first data point "012", 01 is the jump help area and 2 is the valid data: the folder is at level 2, and because the valid data is 1 byte wide, the jump help value is 01. The second data point "07windows" gives the folder name windows; its valid data is 7 bytes wide, so its jump help value is 07.

The first folder you see is at level 1; the root folder is at level 0 and hidden from view, and everything you see is inside it. It can be regarded as a hidden header \\01004root.
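Both headers in this example can be decoded with a short sketch. `read_header` is a hypothetical helper that handles only two-digit jump help areas; it returns the folder level and name.

```python
from __future__ import annotations


def read_header(buf: bytes) -> tuple[int, str]:
    """Parse the start symbol, then the level and name data points."""
    if not buf.startswith(b"\\\\"):
        raise ValueError("missing folder start symbol")
    pos, fields = 2, []
    for _ in range(2):                       # level data point, then name data point
        width = int(buf[pos:pos + 2])
        fields.append(buf[pos + 2:pos + 2 + width])
        pos += 2 + width
    return int(fields[0]), fields[1].decode("gbk")


windows = read_header(b"\\\\01207windows")
root = read_header(b"\\\\01004root")        # the hidden level 0 root header
```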

Example two, a table described in xrb:

01204name06button05color03red

the meaning of this description is: the data point 012 states that the width of the table is 2 (2 data points per row);

the first row of the table, 04name06button, means (name, button);

the second row of the table, 05color03red, means (color, red).
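Splitting example two into rows can be sketched as follows, assuming two-digit jump help areas. `table_rows` is a hypothetical helper: the first decoded data point is taken as the column count and the remaining points are grouped by that count.

```python
from __future__ import annotations


def table_rows(table: bytes) -> list[list[str]]:
    """Decode all data points, then group them by the leading column count."""
    pos, points = 0, []
    while pos < len(table):
        width = int(table[pos:pos + 2])
        points.append(table[pos + 2:pos + 2 + width].decode("gbk"))
        pos += 2 + width
    cols, body = int(points[0]), points[1:]
    return [body[i:i + cols] for i in range(0, len(body), cols)]


rows = table_rows(b"01204name06button05color03red")
```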

Example three, data table description:

As a design comparison, consider a data table that is a 3-row by 4-column array named myarray:

1 2 3 4

5 6 7 8

9 10 11 12

The above array is described in XML as follows:

</pb>

</myarray>

</root>

The same array is described in xrb as follows:

012\\01107myarray012

\\0202pb

014

011012013014

015016017018

019021002110212

The xrb system hides the root name, which is more efficient. The folder description area is followed by the data table area, whose first data point "014" gives the column count M of the table. This is followed by a sequence of N × M data points for N rows and M columns.

Comparing the XML description with the xrb description gives:

Number of valid (payload) bytes:

(A) xrb: 1, myarray, 2, pb, 123456789101112 and the column counts 2, 2, 4, for a total of 29 bytes.

(B) The XML expression of the same data table uses 164 bytes. In total, the xrb description uses 70 bytes and the XML description uses 164 bytes.
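The whole of example three can be walked with a small parser sketch. Two assumptions are labeled here: the header printed above as \\0202pb is taken to be \\01202pb (a level data point 012 followed by 02pb), as the data point syntax requires, and the leading 012 is read as the column count of the hidden root folder's table. `parse_xrb` and `payload_bytes` are hypothetical helpers. Under these assumptions the payload totals 29 bytes, matching the count given for (A), while the string itself is 71 bytes, one more than the 70 quoted; 29/71 also rounds to 41%.

```python
from __future__ import annotations

FOLDER_START = b"\\\\"


def read_data_point(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Return (payload, next_pos); the jump help area is 00-99 or [n]."""
    if buf[pos:pos + 1] == b"[":
        close = buf.index(b"]", pos)
        width, start = int(buf[pos + 1:close]), close + 1
    else:
        width, start = int(buf[pos:pos + 2]), pos + 2
    return buf[start:start + width], start + width


def parse_xrb(buf: bytes) -> list[tuple[int, str, int, list[list[bytes]]]]:
    """Return (level, name, columns, rows) per folder; the first table is the root's."""
    folders, level, name, pos = [], 0, "root", 0   # hidden root, assumed level 0
    while True:
        cols_raw, pos = read_data_point(buf, pos)  # column count of this table
        cols, points = int(cols_raw), []
        while pos < len(buf) and buf[pos:pos + 2] != FOLDER_START:
            value, pos = read_data_point(buf, pos)
            points.append(value)
        rows = [points[i:i + cols] for i in range(0, len(points), cols)] if cols else []
        folders.append((level, name, cols, rows))
        if pos >= len(buf):
            return folders
        raw_level, pos = read_data_point(buf, pos + 2)   # skip the start symbol
        raw_name, pos = read_data_point(buf, pos)
        level, name = int(raw_level), raw_name.decode("gbk")


def payload_bytes(buf: bytes) -> int:
    """Sum the widths of all valid data areas, skipping start symbols."""
    pos, total = 0, 0
    while pos < len(buf):
        if buf[pos:pos + 2] == FOLDER_START:
            pos += 2
            continue
        value, pos = read_data_point(buf, pos)
        total += len(value)
    return total


# Example three, assuming the pb header is \\01202pb.
example3 = (b"012" b"\\\\01107myarray012" b"\\\\01202pb" b"014"
            b"011012013014" b"015016017018" b"019021002110212")
folders = parse_xrb(example3)
```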

Packing efficiency (payload bytes / total bytes):

XML: 29/164 ≈ 18%;

xrb: 29/70 ≈ 41%;

the gap is 41% - 18% = 23%, i.e. the packing efficiency of xrb is 23 percentage points higher than that of XML.
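The same arithmetic written out with the byte counts stated above; the symbol \(\eta\) for packing efficiency and the final ratio line are additions for clarity, not figures from the source.

```latex
\begin{aligned}
\eta &= \frac{\text{payload bytes}}{\text{total bytes}} \\
\eta_{\text{XML}} &= \frac{29}{164} \approx 17.7\% \\
\eta_{\text{xrb}} &= \frac{29}{70} \approx 41.4\% \\
\eta_{\text{xrb}} - \eta_{\text{XML}} &\approx 23.7 \text{ percentage points} \\
\eta_{\text{xrb}} / \eta_{\text{XML}} &= 164 / 70 \approx 2.34
\end{aligned}
```

Since both descriptions carry the same 29 payload bytes, the relative gain reduces to the ratio of total sizes, 164/70.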

The significance is that if speed is increased by 10%, electricity cost is reduced by 10% and disk storage by 10%. If a supercomputing center pays 300,000 to 600,000 yuan of electricity per day (about 17.8 MW of power), that is about 100 to 200 million yuan per year, and a 10% saving means 10 to 20 million yuan.

In XML, obtaining a data point requires <a> and </a> to mark the start and end of a piece of code. The parser reads three characters and checks them to recognize <a> as the start of the data point. If the content between the start and end marks of a data point is 10000 characters long, 10000 character reads and 10000 comparisons are needed. In aggregate these reads are very heavy and make the processor very inefficient. After the data point is preliminarily obtained, escape recovery is performed: another 10000 character reads check whether each is an escape symbol, and any escape found is restored.

In xrb, obtaining a data point of the same 10000 characters works as follows. The jump help area is [010000], and parsing it takes only 5 reads, yielding "[", "01", "00", "00" and "]"; the "]" indicates that the jump help area is finished. The string between "[" and "]" is cut out to give 010000, its 6 characters are copied, and the decimal string is converted to a machine integer: for each of the 6 characters, its ASCII value minus 48 is multiplied by its weight (100000, 10000, 1000, 100, 10, 1) and the products are summed, for a total of 6 subtractions, 6 multiplications and 6 additions.

Comparing existing XML with the xrb provided by the invention: XML reads 3 characters and makes 3 comparisons to find the start mark, reads 10000 characters, and performs escape recovery over another 10000 characters. xrb reads 5 times, makes 6 comparisons, copies 6 characters, and performs 6 subtractions, 6 multiplications and 6 additions. For the same data point, xrb processing is significantly faster than XML.
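The string-to-integer conversion described above (ASCII value minus 48, times the positional weight, summed) can be sketched as follows. `jump_width` is a hypothetical name; it walks the digits from the right and multiplies the weight by 10 at each step instead of precomputing the weights.

```python
def jump_width(helper: bytes) -> int:
    """Convert a jump help area such as b"07" or b"[010000]" to an integer width."""
    digits = helper[1:-1] if helper.startswith(b"[") else helper
    total, weight = 0, 1
    for code in reversed(digits):          # iterating bytes yields integer byte values
        if not 48 <= code <= 57:
            raise ValueError("jump help must contain decimal digits only")
        total += (code - 48) * weight      # ASCII '0' is 48
        weight *= 10
    return total
```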

Embodiment of the xrb data storage format in a web page application:

012

\\01112myguideclare

013

10MYTEXTBOX102*107textbox

\\01112myguiinitial

013

10MYTEXTBOX101110helloworld

08MYLABEL101104home

\\01111myguilayout

017

10MYTEXTBOX101104504004196004197504149508bymyself

08MYLABEL1011045040042160039750349513newlinecenter

\\01112myguimission

015

08MYLABEL101115inside:web1.xrb02no02no
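Locating a value in this embodiment combines folder positioning with the row search. In the sketch below, `locate` is a hypothetical helper that walks the file, jumps over the tables of non-matching folders without decoding them, and returns the row whose first data point matches the key inside the named folder. The leading 012 is again read as the hidden root folder's table.

```python
from __future__ import annotations

FOLDER_START = b"\\\\"


def read_data_point(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Return (payload, next_pos); the jump help area is 00-99 or [n]."""
    if buf[pos:pos + 1] == b"[":
        close = buf.index(b"]", pos)
        width, start = int(buf[pos + 1:close]), close + 1
    else:
        width, start = int(buf[pos:pos + 2]), pos + 2
    return buf[start:start + width], start + width


def locate(buf: bytes, folder: str, key: bytes) -> list[bytes] | None:
    """Find the row whose first data point is key inside the named folder."""
    pos, current = 0, None                     # None stands for the hidden root
    while pos < len(buf):
        if buf[pos:pos + 2] == FOLDER_START:
            _level, pos = read_data_point(buf, pos + 2)
            name, pos = read_data_point(buf, pos)
            current = name.decode("gbk")
        cols_raw, pos = read_data_point(buf, pos)
        cols, row = int(cols_raw), []
        while pos < len(buf) and buf[pos:pos + 2] != FOLDER_START:
            value, pos = read_data_point(buf, pos)   # other folders are only jumped over
            if current != folder or cols == 0:
                continue
            row.append(value)
            if len(row) == cols:
                if row[0] == key:
                    return row
                row = []
    return None


web = (b"012"
       b"\\\\01112myguideclare" b"013" b"10MYTEXTBOX102*107textbox"
       b"\\\\01112myguiinitial" b"013"
       b"10MYTEXTBOX101110helloworld" b"08MYLABEL101104home"
       b"\\\\01111myguilayout" b"017"
       b"10MYTEXTBOX101104504004196004197504149508bymyself"
       b"08MYLABEL1011045040042160039750349513newlinecenter"
       b"\\\\01112myguimission" b"015" b"08MYLABEL101115inside:web1.xrb02no02no")
```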

An example of storing a 4-dimensional matrix in the xrb data storage format, dim myarray(2,3,4,5) as string:

012

\\01107myarray

013

05array072,3,4,528comment:to time add reverse

\\012041,1,

015

011011011011011

\\012041,2,

015

011011011011011

\\012041,3,

015

011011011011011

\\012042,1,

015

011011011011011

\\012042,2,

015

011011011011011

\\012042,3,

015

011011011011011

The data storage format provided by the invention includes a full system, whose storage files use the suffix xrb and contain folders and data tables, and a simplified system, whose storage files use the suffix bxr and contain a number of user-defined data points without folders or data tables.

The prominent features of the technical solution of the invention are:

first, it avoids the need to escape protocol-specific function characters that appear in the valid data string. The servo area uses only the characters 0123456789, the special function symbols [ and ], and the folder start symbol \\; it is used either to read the number of bytes to jump (a decimal string converted to an integer) or to recognize the start of a folder. The jump mode guarantees that valid data is obtained without entering the valid data area, so data is transmitted directly with zero dependence on escaping;

second, files can be read and written by people without special tools: an xrb file is a readable and writable text file that can be edited in Notepad;

third, packing efficiency: the xrb format carries very little servo data, so the ratio of payload bytes to total bytes is high, 23 percentage points higher than XML.

Tested packing efficiency (payload/total) of the xrb format:
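The escape-free property can be demonstrated with a payload that contains \\, [ and ] unescaped. In this sketch `read_table_after_header` and the one-folder stream are hypothetical; because the reader jumps by width, the bytes inside the payload are never tested against the start symbol, whereas a naive scan for \\ would see two folders.

```python
from __future__ import annotations

FOLDER_START = b"\\\\"


def read_data_point(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Return (payload, next_pos); the jump help area is 00-99 or [n]."""
    if buf[pos:pos + 1] == b"[":
        close = buf.index(b"]", pos)
        width, start = int(buf[pos + 1:close]), close + 1
    else:
        width, start = int(buf[pos:pos + 2]), pos + 2
    return buf[start:start + width], start + width


def read_table_after_header(buf: bytes) -> list[bytes]:
    """Skip the folder header and column count, then return the table data points."""
    _level, pos = read_data_point(buf, 2)      # 2 skips the start symbol
    _name, pos = read_data_point(buf, pos)
    _cols, pos = read_data_point(buf, pos)
    points = []
    while pos < len(buf) and buf[pos:pos + 2] != FOLDER_START:
        value, pos = read_data_point(buf, pos)
        points.append(value)
    return points


tricky = b"a\\\\b[9]"                          # payload with \\ , [ and ] left unescaped
stream = b"\\\\01101A012" + b"%02d" % len(tricky) + tricky + b"02ok"
points = read_table_after_header(stream)
```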

Very small data points (1 byte per data point, 30 data points): 20% to 32% packing efficiency.

Small data points (3 bytes per data point, 20 data points): 67% to 73% packing efficiency.

Medium data points (one data point of hundreds of bytes): 96% packing efficiency.

Large data points (one data point of kilobytes): 99% packing efficiency.

This means that if speed is increased by 10%, electricity cost is reduced by 10% and disk storage by 10%. If a supercomputing center pays 300,000 to 600,000 yuan of electricity per day (about 17.8 MW of power), that is about 100 to 200 million yuan per year, and a 10% saving means 10 to 20 million yuan.

Claims

1. A data searching and positioning method based on a data storage format, characterized in that: the data storage format comprises an operation function library, a console instruction set, a tree structure description area and a data table area;

each data point consists of a jump help area and a valid data area, the valid data area holding English text, Chinese text, binary data, or a combination thereof; the jump help area describes the width of the valid data that immediately follows it and is written as a decimal integer string whose characters are encoded according to GBK, ordinary widths being written as 00 to 99 and over-long widths being written as [x], where x is a decimal number of two or more digits;

when a folder is located, the reader skips the start symbol \\ and then the first and second data points, and the data table begins after the second data point;

if the folder has a lower-level folder, the data table ends at the storage byte immediately before the start symbol of the lower-level folder;

2. The data searching and positioning method based on a data storage format according to claim 1, wherein: the folder at level 0 is the root folder, and the root folder is a simplified folder whose beginning is hidden.

3. The data searching and positioning method based on a data storage format according to claim 2, wherein: when traversing the data points, the data points are read as follows:

d. continue reading the subsequent data table area: its first data point gives the column count of the data table; first read the jump help area of that data point, jump forward by the width given by its value, read the value within that width, and record it as the column count;

4. The data searching and positioning method based on a data storage format according to claim 2 or 3, characterized in that: the tree structure description area at the beginning of the level 0 root folder is hidden and not read, and folder searching starts from the level 1 folders.