CN109753285B - XML (Extensible Markup Language) parser and reconfigurable computing system based on FPGA (field-programmable gate array) - Google Patents


Info

Publication number
CN109753285B
Authority
CN
China
Prior art keywords
xml
dom tree
analyzed
memory
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811600605.9A
Other languages
Chinese (zh)
Other versions
CN109753285A (en)
Inventor
姜晓红
潘哲
吴健
尹建伟
邓水光
李莹
吴朝晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-12-26
Filing date
2018-12-26
Publication date
2023-07-04
Application filed by Zhejiang University ZJU
Priority to CN201811600605.9A
Publication of CN109753285A
Application granted
Publication of CN109753285B
Active (current legal status)
Anticipated expiration

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses an FPGA-based XML parser comprising an FPGA-based software and hardware system for parsing XML files. The system receives, over a transmission network, the XML file to be parsed together with the head (start) address of a contiguous memory block. It then parses the received XML file, encapsulates the parsed data, and writes the encapsulated data into the contiguous memory specified by the PC, starting at that head address. The parser achieves efficient parsing performance while remaining practical for real applications.

Description

XML (Extensible Markup Language) parser and reconfigurable computing system based on FPGA (field-programmable gate array)
Technical Field
The invention relates to the field of reconfigurable computing acceleration, and in particular to an FPGA-based XML parser and a reconfigurable computing system.
Background
Reconfigurable computing is a newer computing model that is popular for acceleration. Compared with general-purpose processors such as CPUs, reconfigurable computing is customizable: a circuit can be tailored to an application to accelerate it. Compared with an application-specific integrated circuit (ASIC), it is more flexible, because the internal circuitry can be reconfigured as needed. The FPGA (field-programmable gate array) is a typical reconfigurable device. It is essentially an architecture without instructions or shared memory, which makes it well suited to accelerating computation-intensive tasks.
In network transport, XML is increasingly becoming a common language for web services, and XML parsing is a core task of XML-based applications. The tree-based DOM (Document Object Model) is the standard model recommended by the W3C. DOM parsing requires the entire document to be read into memory, so it slows down on large XML documents.
CPB (cycles per byte), the number of clock cycles needed to parse each byte, is an important metric for evaluating XML parser performance. Existing software-accelerated XML parsers all run on general-purpose CPUs, so their performance is limited by the CPU's processing capability, the operating system's scheduling policy, and similar factors; and because XML parsing is inherently serial, they fall far short of hardware parsing.
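As a worked illustration of the CPB metric, the Python sketch below computes CPB from a cycle count and a byte count and converts it to throughput at a given clock frequency (throughput = clock frequency / CPB). The cycle count, byte count, and 200 MHz clock are made-up inputs, not figures from this patent. Because CPB counts cycles of whichever clock the parser runs on, comparing a CPU parser with an FPGA parser by CPB alone ignores their very different clock rates; throughput in bytes per second is the fairer comparison.

```python
def cycles_per_byte(cycles: int, num_bytes: int) -> float:
    """CPB = clock cycles spent parsing / number of input bytes."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    return cycles / num_bytes


def throughput_bytes_per_s(clock_hz: float, cpb: float) -> float:
    """A parser sustaining `cpb` cycles per byte at `clock_hz` handles clock_hz / cpb bytes/s."""
    return clock_hz / cpb


# Hypothetical inputs for illustration only (not measurements from the patent).
cpb = cycles_per_byte(cycles=1_000_000, num_bytes=1_250_000)  # 0.8 CPB, i.e. below 1 CPB
mb_per_s = throughput_bytes_per_s(200e6, cpb) / 1e6           # assumed 200 MHz clock
print(f"CPB = {cpb:.2f}, throughput = {mb_per_s:.0f} MB/s")   # prints: CPB = 0.80, throughput = 250 MB/s
```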
Hardware-accelerated XML parsers are mostly implemented on FPGAs. In current research, a high-performance XML parser that integrates syntax checking, semantic checking, and DOM tree construction and breaks the 1 CPB barrier has already been implemented. However, in that implementation all of the text to be parsed, including the parsed content, is stored on the FPGA, which is isolated from the outside world, whereas real XML applications live in the software layer. This makes it difficult to put into practical use.
Application publication CN104267998A discloses a hardware XML parser based on a sliding window. It comprises an initialization storage unit, inter-stage registers, a data transmitting module that generates the sliding window, a lexical analysis module that extracts tokens from the character stream of the XML document, a format checking module that checks the tokens against XML grammar rules, and an XML document tree construction module. The initialization storage unit is connected to the data transmitting module and passes it the XML document's initialization information; the data transmitting module is connected to the lexical analysis module and passes it sliding-window information; the lexical analysis module is connected to the format checking module and the document tree construction module and passes them token information; the document tree construction module is connected to memory and writes the XML document tree to it; and inter-stage registers are placed between connected modules to improve throughput. Because the content parsed by this hardware parser is kept in hardware, it is likewise difficult to put into practical application.
Disclosure of Invention
The invention aims to provide an FPGA-based XML parser built on hardware/software co-design. It provides parsing interfaces on both the software and the hardware side: the XML text is sent over the network to the FPGA for processing, and the FPGA returns data that software can use directly. Parsing the XML text and using the result thus form one continuous flow, so the parser is practical while still delivering efficient XML parsing.
Another object of the present invention is to provide a reconfigurable computing system that includes the FPGA-based XML parser and can therefore perform computing tasks quickly.
In order to achieve the above object, the present invention provides the following technical solutions:
an FPGA-based XML parser comprises an FPGA-based software and hardware system for parsing XML files, wherein the software and hardware system receives, over a transmission network, the XML file to be parsed and the head address of a contiguous memory block;
the software and hardware system also parses the received XML file, encapsulates the parsed data, and writes the encapsulated data into the contiguous memory specified by the PC, according to the head address of that memory.
Preferably, the software and hardware system includes:
the input/output module receives the XML file to be parsed and the head address of the contiguous memory, forwards the XML file to the grammar parsing module, encapsulates the constructed DOM tree, and controls writing of the encapsulated DOM tree into the contiguous memory specified by the PC, according to its head address;
the grammar parsing module performs lexical checking, element nesting checking, and attribute name uniqueness checking on the received XML file, and generates state information that serves as the control signal of the DOM tree construction module;
the DOM tree construction module parses the received XML file under the control of that signal and builds the DOM tree of the file from the parsing result.
Wherein, the input/output module includes:
the input buffer passes the received XML file to the grammar parsing module as an AXI4-Stream;
and the output buffer encapsulates the constructed DOM tree and controls its writing into the contiguous memory specified by the PC.
The grammar parsing module comprises:
the lexical checking module extracts the tokens of the XML text and contains a state machine for XML lexical rules; the state machine is driven by the XML text data stream, and the state it produces serves as the control signal of the DOM tree construction module;
the element nesting checking module checks whether the start and end tags of the XML file match strictly;
and the attribute name uniqueness checking module checks whether all attribute names within each element of the XML file are distinct.
Further, a Bloom filter is used to implement the attribute name uniqueness check.
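A minimal Python sketch of a per-element Bloom filter for the attribute name uniqueness check follows. The patent does not give the filter size, the hash functions, or how false positives are handled, so the 256-bit array, the hash positions taken from slices of a SHA-256 digest, and the exact re-check on a filter hit are all assumptions. A Bloom filter never misses a name it has already seen, so a "not present" answer proves the name is new; a "present" answer may be a false positive and is confirmed here against the names already stored for the element (in hardware, these would be the entries in attribute memory).

```python
import hashlib


class AttributeBloomFilter:
    """Bloom filter cleared at every start tag (sizes are illustrative assumptions)."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        if not 1 <= num_hashes <= 8:
            raise ValueError("one SHA-256 digest supplies at most 8 four-byte hashes")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, stored in a Python int

    def _positions(self, name: str):
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        for i in range(self.num_hashes):
            # Each hash function uses a different 4-byte slice of the digest.
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.num_bits

    def might_contain(self, name: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(name))

    def add(self, name: str) -> None:
        for p in self._positions(name):
            self.bits |= 1 << p

    def clear(self) -> None:
        self.bits = 0


def attributes_unique(names, bloom=None) -> bool:
    """Return True if no attribute name repeats within one element."""
    if bloom is None:
        bloom = AttributeBloomFilter()
    bloom.clear()
    stored = []  # stands in for the names already written to attribute memory
    for name in names:
        # Only a filter hit needs the exact (slower) comparison.
        if bloom.might_contain(name) and name in stored:
            return False
        bloom.add(name)
        stored.append(name)
    return True


print(attributes_unique(["id", "class", "href"]))  # True
print(attributes_unique(["id", "class", "id"]))    # False
```

The filter is cleared at each start tag because uniqueness only has to hold within a single element.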
Specifically, the DOM tree construction module includes:
a controller, driven by the control signal generated by the grammar parsing module, that parses the XML file into element nodes, attribute nodes, and content nodes;
a content node memory that stores content nodes automatically under the control of the controller;
an attribute node memory that stores attribute nodes automatically under the control of the controller;
an element node memory that stores element nodes automatically under the control of the controller;
a node stack that ensures the DOM tree is constructed correctly;
under the control signal generated by the grammar parsing module, the controller automatically arranges the element nodes, attribute nodes, and content nodes in the element node memory, the attribute node memory, and the content node memory according to a fixed format, forming the DOM tree.
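The controller and node stack described above can be modelled in software to show how parser events become three node tables. The Python sketch below is an illustration under assumptions, not the patent's circuit: the event tuples (`"start"`, `"text"`, `"end"`), the record fields, and the use of table indices as links are all hypothetical, since the fixed format shown in FIG. 5 is not reproduced in the text. The node stack holds the indices of the currently open elements, so its top is always the parent of the next node.

```python
from dataclasses import dataclass, field


@dataclass
class ElementNode:
    name: str
    parent: int        # index into the element table, -1 for the root
    first_attr: int    # index of this element's first attribute record
    attr_count: int
    children: list = field(default_factory=list)  # ("element" | "content", index)


@dataclass
class AttributeNode:
    owner: int         # index of the owning element
    name: str
    value: str


@dataclass
class ContentNode:
    owner: int
    text: str


def build_dom(events):
    """Turn start/text/end events into element, attribute and content tables."""
    elements, attributes, contents = [], [], []
    stack = []  # node stack: indices of the currently open elements
    for event in events:
        kind = event[0]
        if kind == "start":
            _, name, attrs = event
            index = len(elements)
            parent = stack[-1] if stack else -1
            elements.append(ElementNode(name, parent, len(attributes), len(attrs)))
            for attr_name, attr_value in attrs:
                attributes.append(AttributeNode(index, attr_name, attr_value))
            if parent >= 0:
                elements[parent].children.append(("element", index))
            stack.append(index)
        elif kind == "text":
            if not stack:
                raise ValueError("content outside the root element")
            contents.append(ContentNode(stack[-1], event[1]))
            elements[stack[-1]].children.append(("content", len(contents) - 1))
        elif kind == "end":
            if not stack or elements[stack[-1]].name != event[1]:
                raise ValueError(f"unexpected end tag </{event[1]}>")
            stack.pop()
    if stack:
        raise ValueError("unclosed elements at end of input")
    return elements, attributes, contents


# Events for <book id="1"><title>XML</title></book>
events = [("start", "book", [("id", "1")]), ("start", "title", []),
          ("text", "XML"), ("end", "title"), ("end", "book")]
elements, attributes, contents = build_dom(events)
```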
A reconfigurable computing system, comprising:
at least one FPGA-based XML parser, which obtains the XML file to be parsed from the network, parses it, generates a DOM tree, and writes the DOM tree into a contiguous memory specified by the PC;
and the PC, which is communicatively connected to the XML parser and loads and uses the data contained in the DOM tree from that contiguous memory.
In another embodiment, the reconfigurable computing system further comprises a router: XML files to be parsed are routed from the network to the XML parser through the router, and the DOM tree produced by the parser is routed to the PC through the router.
Compared with the prior art, the invention has the following beneficial effects:
compared with other XML parsers of the same type, this parser combines the efficiency of hardware parsing with a direct connection to client software applications: its output can be consumed by software directly, giving it good practicality and extensibility. In addition, the hardware/software co-design approach can be applied to other areas of reconfigurable computing, helping to close the gap between hardware acceleration research and practical applications.
The reconfigurable computing system provided by the invention can perform computing tasks quickly because it includes an XML parser that parses XML files quickly.
Drawings
FIG. 1 is a schematic diagram of the hardware architecture of an FPGA-based XML parser provided by an embodiment;
FIG. 2 is a schematic diagram of a reconfigurable computing system provided by an embodiment;
FIG. 3 is a schematic diagram of another reconfigurable computing system provided by an embodiment;
FIG. 4 is a schematic diagram of a reconfigurable computing system under multitasking conditions provided by an embodiment;
FIG. 5 is a schematic diagram of the storage structure of element nodes, attribute nodes, and content nodes provided by an embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
As shown in FIG. 1, an embodiment provides an FPGA-based XML parser consisting of a hardware part and a software part. The hardware part runs on FPGA hardware and performs the XML parsing task; the software part connects the parsing result to ordinary XML-based applications. Hardware and software communicate over a high-speed network channel. Software sends the XML text file to the FPGA, which parses it into a form that the PC-side XML application can recognize; once parsing is complete, the result is sent back to the PC over the network and written into memory, where software can use it directly.
Specifically, the software part sends the XML text to be parsed, together with the head address of a contiguous memory block, to the FPGA over the network. After hardware parsing finishes, the data returned by the hardware over the network are written sequentially into that memory, and the parsed data can be used directly by XML-based applications on the PC.
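The software-side request described above (XML text plus the head address of a contiguous buffer) could be framed as in the Python sketch below. The 12-byte header (an 8-byte little-endian head address followed by a 4-byte length) is a hypothetical wire format; the patent does not specify one. `ctypes.create_string_buffer` and `ctypes.addressof` are real standard-library calls, but the address they return is a virtual address in this process: in a real system the buffer would have to be pinned and registered with whatever driver or network interface actually performs the write-back, which this sketch does not attempt.

```python
import ctypes
import struct

# Hypothetical header: head address (8 bytes) + XML length (4 bytes), little-endian.
REQUEST_HEADER = struct.Struct("<QI")


def allocate_result_buffer(size: int):
    """Reserve a zero-filled contiguous buffer and return it with its head address.

    The caller must keep `buf` alive; otherwise the memory is freed.
    """
    buf = ctypes.create_string_buffer(size)
    return buf, ctypes.addressof(buf)


def build_parse_request(xml_bytes: bytes, head_address: int) -> bytes:
    """Host side: prepend the header to the XML text."""
    return REQUEST_HEADER.pack(head_address, len(xml_bytes)) + xml_bytes


def parse_request(packet: bytes):
    """Receiver side: recover the head address and the XML text."""
    head_address, length = REQUEST_HEADER.unpack_from(packet, 0)
    xml = packet[REQUEST_HEADER.size:REQUEST_HEADER.size + length]
    if len(xml) != length:
        raise ValueError("truncated request")
    return head_address, xml


buf, head = allocate_result_buffer(64 * 1024)
request = build_parse_request(b"<book id='1'><title>XML</title></book>", head)
```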
Specifically, the FPGA in the hardware part comprises an input/output module (IO module), a grammar parsing module (well-formed stacking module), and a DOM tree construction module (DOM builder module). The IO module handles data exchange with the PC. The grammar parsing module performs a preliminary analysis of the incoming XML text by running it through a state machine whose states drive the subsequent DOM tree construction. DOM tree construction turns the XML text into in-memory data that the computer can use directly.
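To make the idea of a character-driven lexical state machine concrete, here is a simplified Python model in which every input character advances the machine by exactly one state, and the resulting state sequence is what a downstream stage would consume as its control signal. The state set is an assumption for illustration (the patent does not list its states). It covers only start tags, end tags, empty-element tags, quoted attributes, and character data; comments, CDATA sections, processing instructions, DOCTYPE declarations, and entity references are not handled.

```python
from enum import Enum, auto


class State(Enum):
    CONTENT = auto()      # character data between tags
    LT = auto()           # just saw '<'
    START_NAME = auto()   # inside a start-tag name
    IN_TAG = auto()       # whitespace inside a start tag
    ATTR_NAME = auto()
    ATTR_EQ = auto()      # saw '=', expecting a quote
    ATTR_VALUE = auto()
    AFTER_VALUE = auto()  # closing quote seen
    EMPTY_SLASH = auto()  # '/' of an empty-element tag
    END_SLASH = auto()    # '</'
    END_NAME = auto()
    END_TRAIL = auto()    # whitespace after an end-tag name
    ERROR = auto()        # absorbing error state


WS = " \t\r\n"


def is_name_start(c: str) -> bool:
    return c.isalpha() or c in "_:"


def is_name_char(c: str) -> bool:
    return c.isalnum() or c in "_:.-"


def lex_states(text: str):
    """Yield the machine's state after each character of `text`."""
    state, quote = State.CONTENT, ""
    for c in text:
        if state is State.CONTENT:
            state = State.LT if c == "<" else State.CONTENT
        elif state is State.LT:
            state = State.END_SLASH if c == "/" else State.START_NAME if is_name_start(c) else State.ERROR
        elif state in (State.START_NAME, State.IN_TAG, State.AFTER_VALUE):
            if state is State.START_NAME and is_name_char(c):
                state = State.START_NAME
            elif state is State.IN_TAG and is_name_start(c):
                state = State.ATTR_NAME
            else:
                state = (State.IN_TAG if c in WS else State.CONTENT if c == ">"
                         else State.EMPTY_SLASH if c == "/" else State.ERROR)
        elif state is State.ATTR_NAME:
            state = State.ATTR_NAME if is_name_char(c) else State.ATTR_EQ if c == "=" else State.ERROR
        elif state is State.ATTR_EQ:
            state, quote = (State.ATTR_VALUE, c) if c in "\"'" else (State.ERROR, quote)
        elif state is State.ATTR_VALUE:
            state = State.AFTER_VALUE if c == quote else State.ERROR if c == "<" else State.ATTR_VALUE
        elif state is State.EMPTY_SLASH:
            state = State.CONTENT if c == ">" else State.ERROR
        elif state is State.END_SLASH:
            state = State.END_NAME if is_name_start(c) else State.ERROR
        elif state is State.END_NAME:
            state = (State.END_NAME if is_name_char(c) else State.CONTENT if c == ">"
                     else State.END_TRAIL if c in WS else State.ERROR)
        elif state is State.END_TRAIL:
            state = State.END_TRAIL if c in WS else State.CONTENT if c == ">" else State.ERROR
        yield state


print([s.name for s in lex_states("<a/>")])  # ['LT', 'START_NAME', 'EMPTY_SLASH', 'CONTENT']
```

Note that `AFTER_VALUE` rejects a name character, which enforces the XML rule that attributes must be separated by whitespace.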
The IO module is the interface to the PC and communicates at high speed through an optical fiber network interface. It contains buffers: the XML text file received from the PC is passed to the grammar parsing module as an AXI4-Stream. After parsing, the contents of the corresponding on-chip memories of the FPGA are scanned out in turn and sent to the IO module as an AXI4-Stream data stream, where they are packed into data packets and returned to the PC over the optical fiber network.
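The return path can be modelled as follows: the result stream scanned out of on-chip memory is cut into packets, each tagged with its byte offset, and the host copies each payload to head address + offset. In this Python sketch a `bytearray` stands in for the contiguous host buffer; the 10-byte header (8-byte offset, 2-byte payload length) and the 1024-byte default payload size are assumptions, not values from the patent. Carrying the offset in every packet means the host does not depend on packets arriving in order.

```python
import struct

# Hypothetical packet header: byte offset (8 bytes) + payload length (2 bytes).
PACKET_HEADER = struct.Struct("<QH")
MAX_PAYLOAD = 1024  # assumed payload size per packet


def packetize(stream: bytes, max_payload: int = MAX_PAYLOAD):
    """FPGA side, modelled in software: cut the result stream into offset-tagged packets."""
    if not 0 < max_payload <= 0xFFFF:
        raise ValueError("payload length must fit in the 2-byte length field")
    for offset in range(0, len(stream), max_payload):
        chunk = stream[offset:offset + max_payload]
        yield PACKET_HEADER.pack(offset, len(chunk)) + chunk


def write_packet(buffer: bytearray, packet: bytes) -> None:
    """Host side: copy the payload to head address + offset (an index into `buffer` here)."""
    offset, length = PACKET_HEADER.unpack_from(packet, 0)
    payload = packet[PACKET_HEADER.size:PACKET_HEADER.size + length]
    if len(payload) != length or offset + length > len(buffer):
        raise ValueError("malformed packet or write outside the reserved region")
    buffer[offset:offset + length] = payload


result = bytes(range(256)) * 10          # stand-in for the scanned-out DOM data
packets = list(packetize(result, 300))
host_buffer = bytearray(len(result))
for p in reversed(packets):              # arrival order does not matter
    write_packet(host_buffer, p)
```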
The grammar parsing module mainly performs lexical checking, element nesting checking, and attribute name uniqueness checking. Lexical checking processes the input stream of XML text and extracts its tokens: the XML data stream drives a state machine (scanner) that encodes the XML lexical rules, and the resulting state (state) is used as the control signal for the next processing unit. Element nesting checking verifies that XML start and end tags match strictly; because elements can be nested, this requires a stack (stack). Attribute name uniqueness checking verifies that all attribute names within each element are distinct, which is implemented here with a Bloom filter.
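The element nesting check described above reduces to a stack: push each start tag's name, and on every end tag pop and compare. The Python sketch below applies that rule to a regular-expression scan of the tags. The regex is a simplification for illustration (it does not handle comments, CDATA sections, or `>` inside attribute values; processing instructions such as `<?xml ...?>` simply do not match it) and is not how the FPGA extracts tags.

```python
import re

# Start tags, end tags and empty-element tags (simplified; see the note above).
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.\-:]*)[^>]*?(/?)>")


def check_nesting(xml: str) -> bool:
    """Return True if every end tag closes the most recently opened element."""
    stack = []
    for m in TAG_RE.finditer(xml):
        is_end, name, self_closing = m.group(1) == "/", m.group(2), m.group(3) == "/"
        if is_end:
            # An end tag must match the element on top of the stack.
            if not stack or stack.pop() != name:
                return False
        elif not self_closing:
            stack.append(name)
    # Any element still open at the end is unclosed.
    return not stack


print(check_nesting("<a><b></b><c/></a>"))  # True
print(check_nesting("<a><b></a></b>"))      # False
```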
The DOM tree construction module mainly records data into designated memories, which involves the design of the storage structure. Analysis of the parsed XML content shows that the result can be abstracted into three data structures, which are stored separately: element nodes, attribute nodes, and content nodes. Each type of node is kept in its own contiguous storage structure, implemented on the FPGA by three hardware storage modules: Content memory, attribute memory, and Node memory. All three node types store data and address information that PC software can recognize directly; the resulting storage structure is shown in FIG. 5. On top of this storage structure, a controller driven by the states produced by the lexical check generates the DOM tree automatically, while a node stack (Node stack) ensures that the tree is built correctly. The generated DOM tree is held temporarily in the corresponding storage modules.
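One way to give the node memories address information that PC software can recognize directly is to write absolute pointers computed from the head address supplied by the PC, so the image is usable as soon as it lands in that memory, with no relocation step. The Python sketch below shows this for element records only. The 24-byte record layout (parent address, name address, name length, padding) and the string area placed after the records are hypothetical, since the actual layout in FIG. 5 is not reproduced in the text.

```python
import struct

# Hypothetical fixed-format element record: parent address, name address,
# name length, 4 bytes of padding (24 bytes in total, little-endian).
ELEMENT_RECORD = struct.Struct("<QQI4x")


def serialize_elements(elements, head_address: int) -> bytes:
    """elements: list of (name, parent_index), parent_index -1 for the root.

    Returns one contiguous image whose internal pointers are absolute
    addresses, valid once the image is copied to `head_address`.
    """
    names_base = len(elements) * ELEMENT_RECORD.size  # string area follows the records
    records, strings = bytearray(), bytearray()
    for name, parent in elements:
        encoded = name.encode("utf-8")
        parent_addr = 0 if parent < 0 else head_address + parent * ELEMENT_RECORD.size
        name_addr = head_address + names_base + len(strings)
        records += ELEMENT_RECORD.pack(parent_addr, name_addr, len(encoded))
        strings += encoded
    return bytes(records + strings)


def read_element(image: bytes, head_address: int, index: int):
    """PC side: decode record `index` and follow its absolute name pointer."""
    parent_addr, name_addr, name_len = ELEMENT_RECORD.unpack_from(image, index * ELEMENT_RECORD.size)
    start = name_addr - head_address
    return parent_addr, image[start:start + name_len].decode("utf-8")


head = 0x1000_0000  # example head address received from the PC
image = serialize_elements([("book", -1), ("title", 0)], head)
print(read_element(image, head, 1) == (head, "title"))  # True
```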
The embodiment also provides a reconfigurable computing system, i.e. a typical application scenario of the XML parser, as shown in FIG. 2. The system comprises the FPGA-based XML parser and a PC communicatively connected to it. Before use, the FPGA board must be programmed in advance, and the FPGA and the PC are interconnected through a network interface. The XML application on the PC then obtains an XML text file from the network and forwards it to the FPGA for processing. After the FPGA has parsed it, the memory contents containing the DOM tree are written back to the designated memory location on the PC and used there, so that software applications consume XML parsed in hardware.
An embodiment also provides another reconfigurable computing system, as shown in FIG. 3, comprising the FPGA-based XML parser, a PC, and a router; here the FPGA acts as a peripheral with its own network port address. In this setting, the FPGA board is programmed in advance, and the FPGA and the PC are each connected to a router that is connected to the Internet. XML text files from XML applications on the network are sent by the router directly to the hardware parser; after parsing, the data are forwarded by the router to the designated memory location on the PC and used there, again letting software applications consume XML parsed in hardware.
The embodiment also provides a reconfigurable computing system for multitasking, as shown in FIG. 4, comprising the FPGA-based XML parser and a PC communicatively connected to it. Multitasking is achieved by instantiating several parsing units and parsing ports in the FPGA. As before, the FPGA board is programmed in advance, and the FPGA and the PC are interconnected through a network interface. The XML application on the PC obtains several XML text files from the network and forwards them to the FPGA for processing. After the FPGA has parsed them, the memory contents containing each DOM tree are written back to the designated memory locations on the PC and used there, providing multi-task hardware XML parsing for software applications.
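For the multitasking case, each file needs a parsing unit and its own destination region, so that concurrent write-backs cannot overlap. The Python sketch below assigns files to parsing units round-robin and carves non-overlapping, aligned regions out of one host buffer. The fixed per-task capacity, the 4096-byte alignment, and the `ParseTask` fields are illustrative assumptions; the patent only states that several parsing units and ports are opened in the FPGA and that results go to designated memory locations.

```python
from dataclasses import dataclass


@dataclass
class ParseTask:
    file_name: str
    unit: int          # hypothetical index of the FPGA parsing unit / port
    head_address: int  # where this task's result will be written on the PC
    capacity: int      # bytes reserved for this task's result


def plan_tasks(files, num_units, base_address, bytes_per_task, alignment=4096):
    """Assign files to parsing units round-robin and give each task its own
    aligned, non-overlapping region inside one contiguous host buffer."""
    if num_units < 1:
        raise ValueError("need at least one parsing unit")
    if base_address % alignment:
        raise ValueError("base_address must be aligned")
    stride = -(-bytes_per_task // alignment) * alignment  # round up to the alignment
    return [
        ParseTask(name, i % num_units, base_address + i * stride, stride)
        for i, name in enumerate(files)
    ]


tasks = plan_tasks(["a.xml", "b.xml", "c.xml"], num_units=2,
                   base_address=0x1000_0000, bytes_per_task=5000)
for t in tasks:
    print(t.file_name, t.unit, hex(t.head_address))
# a.xml 0 0x10000000
# b.xml 1 0x10002000
# c.xml 0 0x10004000
```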
The detailed description above explains the technical solution and advantages of the invention. It should be understood that it describes only the presently preferred embodiments and does not limit the invention; any modifications, additions, or equivalent substitutions made within the principles of the invention are intended to fall within its scope of protection.

Claims (2)

1. An FPGA-based XML parser, comprising an FPGA-based software and hardware system for parsing XML files, the software and hardware system consisting of a hardware part and a software part, wherein the hardware part is implemented in FPGA hardware and performs the XML parsing task, and the software part connects the parsing result to ordinary XML-based applications, characterized in that:
the software and hardware system comprises:
the input/output module receives, over a transmission network, an XML file to be parsed obtained from the network and the head address of a block of free contiguous memory provided by the PC, forwards the XML file to the grammar parsing module, encapsulates the constructed DOM tree, and controls writing of the encapsulated DOM tree into the contiguous memory specified by the PC according to its head address;
the grammar parsing module performs lexical checking, element nesting checking, and attribute name uniqueness checking on the received XML file, and generates state information that serves as the control signal of the DOM tree construction module;
the DOM tree construction module parses the received XML file under the control of the control signal and builds the DOM tree of the XML file from the parsing result;
the DOM tree construction module comprises:
a controller, driven by the control signal generated by the grammar parsing module, that parses the XML file into element nodes, attribute nodes, and content nodes;
a content node memory that stores content nodes automatically under the control of the controller;
an attribute node memory that stores attribute nodes automatically under the control of the controller;
an element node memory that stores element nodes automatically under the control of the controller;
a node stack that ensures the DOM tree is constructed correctly;
wherein, under the control signal generated by the grammar parsing module, the controller automatically arranges the nodes in the content node memory, the attribute node memory, and the element node memory according to a fixed format to form the DOM tree;
the input/output module includes:
an input buffer that passes the received XML file to the grammar parsing module as an AXI4-Stream;
an output buffer that encapsulates the constructed DOM tree and controls its writing into the contiguous memory specified by the PC;
the grammar parsing module comprises:
a lexical checking module that extracts the tokens of the XML text and contains a state machine for XML lexical rules, the state machine being driven by the XML text data stream to produce a state that serves as the control signal of the DOM tree construction module;
an element nesting checking module that checks whether the start and end tags of the XML file match strictly;
an attribute name uniqueness checking module that checks whether all attribute names within each element of the XML file are distinct;
wherein the attribute name uniqueness check is implemented with a Bloom filter.
2. A reconfigurable computing system, comprising:
at least one FPGA-based XML parser as claimed in claim 1, wherein the XML parser obtains the XML file to be parsed from the network, parses it, generates a DOM tree, and writes the DOM tree into a contiguous memory specified by the PC;
the PC, which is communicatively connected to the XML parser and loads and uses the data contained in the DOM tree from the contiguous memory it specified;
wherein the reconfigurable computing system further comprises a router, XML files to be parsed in the network are routed to the XML parser through the router, and the DOM tree produced by the XML parser is routed to the PC through the router.
CN201811600605.9A 2018-12-26 2018-12-26 XML (extensive markup language) parser and reconfigurable computing system based on FPGA (field programmable Gate array) Active CN109753285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811600605.9A CN109753285B (en) 2018-12-26 2018-12-26 XML (extensive markup language) parser and reconfigurable computing system based on FPGA (field programmable Gate array)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811600605.9A CN109753285B (en) 2018-12-26 2018-12-26 XML (extensive markup language) parser and reconfigurable computing system based on FPGA (field programmable Gate array)

Publications (2)

Publication Number Publication Date
CN109753285A CN109753285A (en) 2019-05-14
CN109753285B true CN109753285B (en) 2023-07-04

Family

ID=66404067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811600605.9A Active CN109753285B (en) 2018-12-26 2018-12-26 XML (extensive markup language) parser and reconfigurable computing system based on FPGA (field programmable Gate array)

Country Status (1)

Country Link
CN (1) CN109753285B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112764808A (en) * 2021-01-21 2021-05-07 西安羚控电子科技有限公司 Method for performing interface communication across systems, languages and hardware components

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102646039A (en) * 2012-02-29 2012-08-22 河海大学 Software interface generating system and method based on extensible markup language (XML) Schema
CN104267998A (en) * 2014-10-13 2015-01-07 上海交通大学 Sliding window technology based hardware XML (Extensive Markup Language) parser

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN102411602B (en) * 2011-08-15 2013-07-24 浙江大学 Extensive makeup language (XML) parallel speculation analysis method realized on basis of field programmable gate array (FPGA)
CN103049439A (en) * 2011-10-11 2013-04-17 腾讯科技(深圳)有限公司 Processing method for markup language documents, browser and network operating system
US8812870B2 (en) * 2012-10-10 2014-08-19 Xerox Corporation Confidentiality preserving document analysis system and method

Also Published As

Publication number Publication date
CN109753285A (en) 2019-05-14

Similar Documents

Publication Publication Date Title
JP7326510B2 (en) Efficient state machines for real-time dataflow programming
US9298437B2 (en) Unrolling quantifications to control in-degree and/or out-degree of automaton
EP2668575B1 (en) Method and apparatus for compiling regular expressions
US9652312B2 (en) Realtime processing of streaming data
US20130195117A1 (en) Parameter acquisition method and device for general protocol parsing and general protocol parsing method and device
JP2003518291A (en) Method and apparatus for data exchange using a runtime code generator and translator
CN102411602B (en) Extensive makeup language (XML) parallel speculation analysis method realized on basis of field programmable gate array (FPGA)
CN109753285B (en) XML (extensive markup language) parser and reconfigurable computing system based on FPGA (field programmable Gate array)
Ritter et al. Hardware accelerated application integration processing: Industry paper
CN107800552A (en) A kind of data interactive method and device
US8312429B2 (en) Cell based data processing
CN114257560A (en) KNI-based switch network data caching implementation method
Patetta et al. A lightweight southbound interface for standalone P4-NetFPGA SmartNICs
US20140298303A1 (en) Method of processing program and program
Mironov et al. Stream documents processing invariance in situation-oriented databases
Murphy PARSING THE QUIC PACKET DESCRIPTION LANGUAGE
Ritter et al. Industry Paper: Hardware Accelerated Application Integration Processing
CN113746848A (en) Parallel programmable group packaging device and method
Han Research on cgi in embedded system
CN117135137A (en) Message data self-processing switch chip system and message forwarding method
Scherzinger Scalable query processing on XML streams
Burda et al. A tool framework for generation of application optimized communication protocols
Suddul et al. An Effective Approach to Parse SOAP Messages on Mobile Clients
Rusnák Efektivní mechanismy XML komunikace
US20040103403A1 (en) Embedded programming language to facilitate programming of an information packet processing unit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant