US20140089332A1 - Efficient conversion of xml data into a model using persistent stores and parallelism - Google Patents


Info

Publication number
US20140089332A1
Authority
US
United States
Prior art keywords
xml
parsing
xml document
objects
data processing
Prior art date
Legal status
Granted
Application number
US13/629,212
Other versions
US9235650B2
Inventor
Sujit Maharana
Douglas Scott Jackson
Subodh Chaubal
Current Assignee
Siemens Industry Software Inc
Original Assignee
Siemens Product Lifecycle Management Software Inc
Priority date
Filing date
Publication date
Application filed by Siemens Product Lifecycle Management Software Inc filed Critical Siemens Product Lifecycle Management Software Inc
Priority to US13/629,212
Assigned to SIEMENS PRODUCT LIFECYCLE MANAGEMENT SOFTWARE INC. reassignment SIEMENS PRODUCT LIFECYCLE MANAGEMENT SOFTWARE INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHAUBAL, SUBODH, JACKSON, DOUGLAS SCOTT, MAHARANA, SUJIT
Publication of US20140089332A1
Application granted
Publication of US9235650B2
Assigned to SIEMENS INDUSTRY SOFTWARE INC. reassignment SIEMENS INDUSTRY SOFTWARE INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: SIEMENS PRODUCT LIFECYCLE MANAGEMENT SOFTWARE INC.
Status: Active
Adjusted expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80: Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84: Mapping; Conversion
    • G06F16/86: Mapping to a database

Definitions

  • Disclosed embodiments improve user response times and reduce the amount of memory needed for loading XML documents by exploiting parallelism and moving XML data that is not currently referenced by the application to a persistent store. Breaking the XML input into a set of XML input streams and processing each of the input streams separately in different processes, and preferably by different processors, takes advantage of computer hardware containing multiple processors.
  • Each processor of the computer can process part of the XML data at the same time as other processors are processing other parts, so that multiple XML data portions are processed in parallel.
  • the XML fragments frequently reference other fragments, so this interdependency must be resolved.
  • the system can maintain a lookup table to find the referenced fragments. Further, rather than maintaining the entire XML and lookup table in memory, the system stores this information in a database or similar persistent storage.
  • the system can retrieve the XML fragments from the persistent storage rather than traversing the XML or using an in-memory lookup table.
  • An event-based XML parser can be used so that, rather than building a large representation of the XML document in memory, the parser fires events when parsing of XML elements is started and completed, allowing the application to process fragments of the XML as they are parsed.
  • the fragments can be immediately stored into the persistent storage and need not be stored in memory.
  • the XML fragments can be persisted into a database using, for example, the Java® persistence architecture.
  • The XML fragments are identified by unique identifiers in the XML, and those become the primary keys in the persistent store. When resolving references in the XML, the unique identifiers are used to find the appropriate entry in the persistent store. Objects representing the elements and attributes of the XML are stored in this persistent element store and later retrieved in order to extract the XML information necessary to build the model.
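As a minimal sketch of the event-driven approach above, the following uses StAX to store each fragment under its unique identifier as soon as its start tag is seen, so no document tree is held in memory. The `part` element, its `id`/`name` attributes, and the in-memory map standing in for the persistent store are illustrative assumptions, not the patent's actual schema.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ElementStoreSketch {
    /** Parses XML with StAX events, storing each fragment under its unique id. */
    static Map<String, String> parse(String xml) {
        Map<String, String> elementStore = new HashMap<>();  // stand-in for the persistent store
        try {
            XMLStreamReader r = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                // The parser fires an event per element; no full document tree is built.
                if (r.next() == XMLStreamConstants.START_ELEMENT && "part".equals(r.getLocalName())) {
                    String id = r.getAttributeValue(null, "id");  // unique id becomes the primary key
                    elementStore.put(id, r.getAttributeValue(null, "name"));
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return elementStore;
    }

    public static void main(String[] args) {
        Map<String, String> store =
                parse("<model><part id=\"p1\" name=\"bolt\"/><part id=\"p2\" name=\"nut\"/></model>");
        System.out.println(store.get("p2"));  // resolve a reference by its primary key: prints "nut"
    }
}
```

A real implementation would persist each fragment through the Java persistence layer rather than into a map, so fragments can be evicted from memory immediately after the event fires.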
  • FIG. 3 depicts an example of a flow diagram in accordance with disclosed embodiments.
  • The XML data 302 is broken into a set of chunks by a parallel parser 304, by dividing the XML file into equal parts based on the desired level of parallelism; in other embodiments, the chunk sizes are not necessarily equal.
  • The parallel parser 304 can identify division points in the XML data; the parser searches for the nearest spot in the XML where it can be split and broken into a separate document. This can typically be the end of a tag that is directly under the root element.
  • Any event-based parser, such as the Streaming API for XML (StAX) or the Simple API for XML (SAX), can be used to parse the chunks once they are split apart.
  • The Java Persistence Architecture provides a good mechanism for persisting the XML information.
  • The XML must contain a unique identifier that will allow references to the XML parts to be resolved. Once enough database entries have been created, processing of the entries can commence. This parsing works best for XML that is flatter (the ratio of direct children of the root element to elements that are not is high), because the root element is replicated in each of the parts of the XML that are split apart.
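The splitting idea above can be sketched as follows. This illustrative version assumes well-formed XML whose root children are self-closing tags and uses plain string scanning rather than a production parser; note how the root element is replicated around each part, which is why flat XML splits best.

```java
import java.util.ArrayList;
import java.util.List;

public class XmlSplitter {
    /** Breaks the children of the root element into standalone documents. */
    static List<String> split(String xml, String rootTag, int parts) {
        String open = "<" + rootTag + ">", close = "</" + rootTag + ">";
        String body = xml.substring(xml.indexOf(open) + open.length(), xml.lastIndexOf(close));
        List<String> children = new ArrayList<>();
        int i = 0;
        while ((i = body.indexOf('<', i)) >= 0) {
            int end = body.indexOf("/>", i) + 2;  // nearest spot where the XML can be split
            children.add(body.substring(i, end));
            i = end;
        }
        List<String> docs = new ArrayList<>();
        int per = (children.size() + parts - 1) / parts;  // ceiling division
        for (int p = 0; p < children.size(); p += per) {
            StringBuilder doc = new StringBuilder(open);  // root element replicated in each part
            for (int c = p; c < Math.min(p + per, children.size()); c++) doc.append(children.get(c));
            docs.add(doc.append(close).toString());
        }
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(split("<root><a id=\"1\"/><b id=\"2\"/><c id=\"3\"/><d id=\"4\"/></root>",
                "root", 2));
    }
}
```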
  • The parallel parser 304 uses a chunking strategy 306 to determine the number of chunks, the size of each chunk, and the order in which they will be processed. Since the end of the nth chunk is the start of the (n+1)th chunk and the chunks are done in parallel, care must be taken so that the determination of the chunk boundaries is done serially. So, if the nth chunk is determining its end point, then the (n+1)th chunk needs to wait for it to complete and use its determined value + 1 for its starting point. To avoid threads blocking during the determination of start and end points, it is best to process non-adjacent chunks first. The default chunking strategy does that. The default chunking strategy also takes into account the number of processors on the machine, and a minimum chunk size to prevent the creation of chunks that are too small.
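A sketch of such a default chunking strategy, with illustrative numbers; the cap on chunk count and the evens-before-odds processing order are one plausible reading of the description above, not the patent's exact algorithm.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkingStrategy {
    /** Chunk count bounded by processor count and a minimum chunk size. */
    static int chunkCount(long fileSize, int processors, long minChunkSize) {
        int byMin = (int) Math.max(1, fileSize / minChunkSize);
        return Math.min(processors, byMin);  // never create chunks smaller than the minimum
    }

    /** Processing order that interleaves chunks (evens, then odds) so adjacent
     *  chunks are not started together while their shared boundary is determined. */
    static List<Integer> processingOrder(int chunks) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < chunks; i += 2) order.add(i);
        for (int i = 1; i < chunks; i += 2) order.add(i);
        return order;
    }

    public static void main(String[] args) {
        System.out.println(chunkCount(10_000, 8, 4_000));  // min size caps parallelism: prints 2
        System.out.println(processingOrder(5));            // prints [0, 2, 4, 1, 3]
    }
}
```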
  • The parallel parser 304 also uses a parsing manager 308, which determines when parsing is complete and whether a particular tag is valid for starting the next chunk in the XML.
  • A valid tag is one that appears in the sequence of the complex content of the root tag.
  • The parsing manager 308 can indicate completion when sought content has been found or, otherwise, when all chunks have been parsed.
  • One or more parsing tasks 310 are implemented by the threads that parse the XML, and they interact with the parsing manager 308 for managing completion.
  • A parsing task factory interface can be used for instantiating parsing tasks 310 as needed by the parallel parser 304. If the parsing task 310 is thread-safe, a single task could be used for many threads; the manager can manage the creation and reuse of the parsing tasks 310. As described herein, it is often preferable to maintain each parsing task 310 on a different processor 102 of the system or on different processor cores.
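The task-factory arrangement might be sketched as follows. The interface names and the trivial "parse" body are hypothetical, and a thread pool sized to the machine's processors stands in for the parallel parser's thread management.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelParserSketch {
    interface ParsingTask extends Runnable {}

    interface ParsingTaskFactory {
        ParsingTask create(String chunk, Map<String, String> elementStore);
    }

    /** Runs one parsing task per chunk on a processor-sized pool. */
    static Map<String, String> parseInParallel(List<String> chunks) {
        Map<String, String> elementStore = new ConcurrentHashMap<>();  // shared store stand-in
        ParsingTaskFactory factory = (chunk, store) ->
                () -> store.put(chunk.substring(0, 2), chunk);  // trivial "parse": key by an id prefix
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (String chunk : chunks) pool.execute(factory.create(chunk, elementStore));
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return elementStore;
    }

    public static void main(String[] args) {
        System.out.println(parseInParallel(List.of("p1:<part/>", "p2:<part/>", "p3:<part/>")).size());
    }
}
```

Because the factory is the only place tasks are created, a thread-safe task could be handed out repeatedly instead of constructed per chunk, matching the reuse described above.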
  • Element store 312 is preferably a persistent storage.
  • FIG. 4 depicts an example of a flow diagram in accordance with disclosed embodiments.
  • Element store 412 represents a storage into which the processed XML chunks were stored, such as element store 312.
  • The model object builder 414 uses the element store 412, created when the XML was parsed, to access the XML elements and corresponding attributes, and creates the minimal starting structure in model object store 418 before starting the rest of the model-building background tasks, illustrated here as one or more modeling tasks 416.
  • Modeling tasks 416 can act in parallel, under the control of model object builder 414 , to retrieve data from element store 412 , build the model objects, and store them in the model object store 418 .
  • the model objects stored in model object store 418 together represent one or more models.
  • The model stored in model object store 418 can be built using software such as the Java® Persistence Architecture software, backed by a database on the file system so that the parts of the model not currently in use can be removed from memory as needed.
  • The hierarchical portions of the model track whether their children are populated, and model traversal can be performed exclusively through a visitor that knows whether or not to wait for the children to be populated. In this fashion, access to the incomplete model can be provided while the model is populated in other threads.
  • the top levels of the structure can be displayed in a GUI while the lower levels of the structure are being populated in the background. If the user navigates to a part of the structure that is not yet constructed, the system can block access while waiting for the structure to be constructed.
  • The visitor follows a standard visitor design pattern, except for the traversal logic, which can be centralized in a single implementation. The visitor also avoids the use of recursion so that, except for the current object being visited, the rest of the structure is not referenced on the execution stack.
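The recursion-free visitor can be sketched with an explicit work deque; the `Node` and `Visitor` shapes and the `childrenPopulated` flag are illustrative stand-ins for the model structures described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ModelVisitorSketch {
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        volatile boolean childrenPopulated;  // set by background modeling tasks
        Node(String name, boolean populated) { this.name = name; this.childrenPopulated = populated; }
    }

    interface Visitor {
        void visit(Node n);
        boolean waitForChildren(Node n);  // whether to wait for unpopulated children
    }

    /** Explicit deque replaces recursion: only the current object is on the call stack. */
    static void traverse(Node root, Visitor v) {
        Deque<Node> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Node n = work.pop();
            v.visit(n);
            if (n.childrenPopulated || v.waitForChildren(n))
                for (int i = n.children.size() - 1; i >= 0; i--) work.push(n.children.get(i));
        }
    }

    public static void main(String[] args) {
        Node root = new Node("assembly", true);
        Node sub = new Node("subassembly", false);  // not yet built by background tasks
        sub.children.add(new Node("hidden", true));
        root.children.add(sub);
        List<String> seen = new ArrayList<>();
        traverse(root, new Visitor() {
            public void visit(Node n) { seen.add(n.name); }
            public boolean waitForChildren(Node n) { return false; }  // skip unpopulated subtrees
        });
        System.out.println(seen);  // prints [assembly, subassembly]; "hidden" is not visited
    }
}
```

A GUI visitor would instead return true from `waitForChildren` and block until the background threads set the flag, which is how the top levels can be displayed while lower levels are still being built.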
  • FIG. 5 depicts a flowchart of a process in accordance with disclosed embodiments. The process can be performed, for example, by a PDM data processing system including one or more data processing systems 100.
  • the system can read an XML document, or other data, corresponding to at least one object model (step 505). This can be performed by an XML reader process.
  • the system receives the XML data by a parallel parser process (step 510).
  • the parallel parser process can receive the XML data from the XML reader process. In other embodiments, the parallel parser process can receive the XML data from another device or process, or otherwise.
  • the system divides the XML data into a plurality of chunks (or “streams”) using the parallel parser process (step 515).
  • the chunks can be of equal size, and the division can be performed based on division points in the XML data identified by the parallel parser process.
  • This step can be performed by the parallel parser using a chunking strategy to determine the number of chunks, size of each chunk, and the order in which the chunks are processed, as described in more detail above.
  • the system parses each of the chunks, including parsing a plurality of chunks in parallel using separate, and preferably independent, parsing tasks to produce objects representing the XML elements and corresponding attributes (step 520).
  • each separate parsing task operates in a single parsing thread in a different processor or processor core.
  • a single parsing task can process multiple parsing threads.
  • each parsing task will receive chunks and produce Java® objects that are stored in the step below.
  • This step can be performed using a parsing manager, as described in more detail above, that can manage the completion of parsing tasks, instantiate (and kill and reuse) parsing tasks as needed, and can perform other tasks.
  • the system stores objects representing the XML elements and corresponding attributes in an element store (step 525).
  • The element store is preferably a persistent storage, which avoids the problems involved with holding the elements in dynamic/RAM memory; the elements are stored directly in the persistent storage.
  • the system selectively retrieves the XML elements and corresponding attributes from the element store (step 530).
  • the selection can be all of the elements corresponding to a model object, just those elements that correspond to a user-selected portion or subassembly of a model object, the elements that correspond to a query, or otherwise.
  • the system creates one or more model objects from the retrieved XML elements and corresponding attributes using a plurality of modeling tasks operating in parallel (step 535).
  • This step can include controlling the modeling tasks using a model object build process, and can include creating an initial structure in a model object store before starting the modeling tasks.
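Steps 535 and 540 might be sketched as follows; the string-valued stores and the two-thread pool are illustrative stand-ins for the element store, the model object store, and the modeling tasks.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ModelBuilderSketch {
    /** Seeds a minimal starting structure, then builds model objects in parallel. */
    static Map<String, String> build(Map<String, String> elementStore) {
        Map<String, String> modelObjectStore = new ConcurrentHashMap<>();
        modelObjectStore.put("root", "assembly");  // initial structure created before the tasks start
        ExecutorService modelingTasks = Executors.newFixedThreadPool(2);
        for (Map.Entry<String, String> e : elementStore.entrySet())
            modelingTasks.execute(() ->
                    modelObjectStore.put(e.getKey(), "ModelObject(" + e.getValue() + ")"));
        modelingTasks.shutdown();
        try {
            modelingTasks.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return modelObjectStore;
    }

    public static void main(String[] args) {
        Map<String, String> store = build(Map.of("p1", "bolt", "p2", "nut"));
        System.out.println(store.size());  // root plus one model object per element: prints 3
    }
}
```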
  • the system stores the model object in the model object store (step 540).
  • the model object store can also be a persistent store.
  • machine usable/readable or computer usable/readable mediums include: nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable read only memories (EEPROMs), and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs).

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Methods for product data management and corresponding systems and computer-readable mediums. A method includes receiving an XML document by a parallel parser process, the XML document including a plurality of elements of an XML data structure that corresponds to an object model. The method includes dividing the XML document into a plurality of chunks using the parallel parser process, and parsing the plurality of chunks in parallel using separate parsing tasks to produce objects representing the elements and corresponding attributes. The method includes storing the objects and corresponding attributes in a persistent element store.

Description

    TECHNICAL FIELD
  • The present disclosure is directed, in general, to computer-aided design, visualization, and manufacturing systems, product lifecycle management (“PLM”) systems, and similar systems, that manage data for products and other items (collectively, “Product Data Management” systems or “PDM” systems).
  • BACKGROUND OF THE DISCLOSURE
  • PDM systems manage PLM and other data. Improved systems are desirable.
  • SUMMARY OF THE DISCLOSURE
  • Various disclosed embodiments include methods for product data management, corresponding systems, and computer-readable mediums. A method includes receiving an XML document by a parallel parser process, the XML document including a plurality of elements of an XML data structure that corresponds to an object model. The method includes dividing the XML document into a plurality of chunks using the parallel parser process, and parsing the plurality of chunks in parallel using separate parsing tasks to produce objects representing the elements and corresponding attributes. The method includes storing the objects and corresponding attributes in a persistent element store.
  • The foregoing has outlined rather broadly the features and technical advantages of the present disclosure so that those skilled in the art may better understand the detailed description that follows. Additional features and advantages of the disclosure will be described hereinafter that form the subject of the claims. Those skilled in the art will appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. Those skilled in the art will also realize that such equivalent constructions do not depart from the spirit and scope of the disclosure in its broadest form.
  • Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words or phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation, whether such a device is implemented in hardware, firmware, software or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, and those of ordinary skill in the art will understand that such definitions apply in many, if not most, instances to prior as well as future uses of such defined words and phrases. While some terms may include a wide variety of embodiments, the appended claims may expressly limit these terms to specific embodiments.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For a more complete understanding of the present disclosure, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, wherein like numbers designate like objects, and in which:
  • FIG. 1 depicts a block diagram of a data processing system in which an embodiment can be implemented;
  • FIGS. 3 and 4 depict examples of flow diagrams in accordance with disclosed embodiments; and
  • FIG. 5 depicts a flowchart of a process in accordance with disclosed embodiments.
  • DETAILED DESCRIPTION
  • FIGS. 1 through 5, discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged device. The numerous innovative teachings of the present application will be described with reference to exemplary non-limiting embodiments.
  • Object model data can be stored and processed as Extensible Markup Language (XML) data. PDM systems that use such XML data must transfer, store, and process very large data structures used to describe object models, which can strain processing, storage, and communications resources. Disclosed embodiments provide systems and methods for efficient conversion of XML data into a model using persistent stores and parallelism. Disclosed systems and methods provide for faster mapping from an XML document into an object model for display in a graphical user interface. Disclosed embodiments provide more efficient processing in any systems that exchange XML documents and display the contents of the documents in a graphical interface. An XML “document” refers to any document, file, or other object that comprises XML data.
  • FIG. 1 depicts a block diagram of a data processing system in which an embodiment can be implemented, for example, as a PDM system particularly configured by software or otherwise to perform the processes as described herein, and in particular, as each one of a plurality of interconnected and communicating systems as described herein. The data processing system depicted includes a processor 102 connected to a level two cache/bridge 104, which is connected in turn to a local system bus 106. Local system bus 106 may be, for example, a peripheral component interconnect (PCI) architecture bus. Also connected to local system bus in the depicted example are a main memory 108 and a graphics adapter 110. The graphics adapter 110 may be connected to display 111. In interest of clarity, only one block is used to represent processor 102, but in various embodiments the processor 102 can represent multiple processors, and each processor may have multiple processing cores, each of which can process a thread independently.
  • Other peripherals, such as local area network (LAN)/Wide Area Network/Wireless (e.g. WiFi) adapter 112, may also be connected to local system bus 106. Expansion bus interface 114 connects local system bus 106 to input/output (I/O) bus 116. I/O bus 116 is connected to keyboard/mouse adapter 118, disk controller 120, and I/O adapter 122. Disk controller 120 can be connected to a storage 126, which can be any suitable machine usable or machine readable storage medium, including but not limited to nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable read only memories (EEPROMs), magnetic tape storage, and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs), and other known optical, electrical, or magnetic storage devices. Storage 126, in various embodiments, is a fast persistent storage.
  • Also connected to I/O bus 116 in the example shown is audio adapter 124, to which speakers (not shown) may be connected for playing sounds. Keyboard/mouse adapter 118 provides a connection for a pointing device (not shown), such as a mouse, trackball, trackpointer, etc.
  • Those of ordinary skill in the art will appreciate that the hardware depicted in FIG. 1 may vary for particular implementations. For example, other peripheral devices, such as an optical disk drive and the like, also may be used in addition or in place of the hardware depicted. The depicted example is provided for the purpose of explanation only and is not meant to imply architectural limitations with respect to the present disclosure.
  • A data processing system in accordance with an embodiment of the present disclosure includes an operating system employing a graphical user interface. The operating system permits multiple display windows to be presented in the graphical user interface simultaneously, with each display window providing an interface to a different application or to a different instance of the same application. A cursor in the graphical user interface may be manipulated by a user through the pointing device. The position of the cursor may be changed and/or an event, such as clicking a mouse button, generated to actuate a desired response.
  • One of various commercial operating systems, such as a version of Microsoft Windows™, a product of Microsoft Corporation located in Redmond, Wash., may be employed if suitably modified. The operating system is modified or created in accordance with the present disclosure as described.
  • LAN/WAN/Wireless adapter 112 can be connected to a network 130 (not a part of data processing system 100), which can be any public or private data processing system network or combination of networks, as known to those of skill in the art, including the Internet. Data processing system 100 can communicate over network 130 with server system 140, which is also not part of data processing system 100, but can be implemented, for example, as a separate data processing system 100.
  • In a PDM system, object models, such as 3D models of simple or complex assemblies, can be described using XML data and stored in large XML data structures, generally in a “tree” structure of parent nodes, child nodes, and leaves. When these structures are accessed or transferred between systems, the structure is traversed and each element of the structure (or any of those that correspond to a portion being transferred) is transferred in an XML input to the receiving system or process. The receiving system or process can then construct the object model from the XML input, and can store the constructed object model.
  • FIG. 2 shows an example of a process flow. XML reader 210 reads the XML document and moves each element to an element store 220. Element store 220 may be maintained in memory, consuming a significant amount of resources. Model builder 230 reads from the element store 220 to build the object model, and then stores the object model in model object store 240.
  • Such processes can consume large amounts of memory and processing power while receiving and translating the XML streams and creating the object model. In other systems, the entire transferred structure must be stored in memory and processed as a whole to account for interdependencies in the data. While the particular examples below relate specifically to PDM object model data, those of skill in the art will recognize that these techniques can be applied in other systems that transfer and process large amounts of XML data.
  • Disclosed embodiments improve user response times and reduce the amount of memory needed for loading XML documents by exploiting parallelism and moving XML data that is not currently referenced by the application to a persistent store. Breaking the XML input into a set of XML input streams and processing each of the input streams separately in different processes, and preferably by different processors, takes advantage of computer hardware containing multiple processors.
  • Each processor of the computer can process part of the XML data at the same time as other processors are processing other parts, so that multiple XML data portions are processed in parallel. Within the XML document, the XML fragments frequently reference other fragments, so this interdependency must be resolved. In order to traverse those relationships in an efficient manner, the system can maintain a lookup table to find the referenced fragments. Further, rather than maintaining the entire XML and lookup table in memory, the system stores this information in a database or similar persistent storage.
  • When processing the XML and traversing relationships between XML fragments, the system can retrieve the XML fragments from the persistent storage rather than traversing the XML or using an in-memory lookup table.
  • An event-based XML parser can be used so that, rather than building a large representation of the XML document in memory, the parser fires events when parsing of XML elements is started and completed, allowing the application to process fragments of the XML as they are parsed. The fragments can be stored immediately into the persistent storage and need not be held in memory.
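• As an illustration of such event-driven parsing (a minimal sketch using the JDK's built-in StAX reader, not the actual implementation of the disclosed embodiments), the following records start and end events as they fire; the end event marks the point at which a completed fragment could be written to the persistent store. The class and method names here are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Pull-parses XML with the JDK's StAX reader, reacting to start/end
 *  events instead of building a full in-memory tree. */
public class EventParser {
    public static List<String> events(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    out.add("start:" + r.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    // End event: this fragment is now complete and could be
                    // persisted to the element store right here, then dropped
                    // from memory.
                    out.add("end:" + r.getLocalName());
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(events("<root><item id=\"a\"/><item id=\"b\"/></root>"));
        // [start:root, start:item, end:item, start:item, end:item, end:root]
    }
}
```

Because the reader is a cursor over the input rather than a document tree, memory use stays proportional to the current fragment, not the whole XML document.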
  • The XML fragments can be persisted into a database using, for example, the Java® Persistence Architecture. The XML fragments are identified by unique identifiers in the XML, and those identifiers become the primary keys in the persistent store. When resolving references in the XML, the unique identifiers are used to find the appropriate entry in the persistent store. Objects representing the elements and attributes of the XML are stored in this persistent element store and later retrieved in order to extract the XML information necessary to build the model.
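• The primary-key lookup can be sketched as follows. A `HashMap` stands in here for the JPA-backed persistent store; the `Element` record, its fields, and the identifiers are illustrative assumptions rather than the actual schema of the disclosed embodiments.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch of an element store keyed by the XML's unique
 *  identifiers; the identifier plays the role of the primary key. */
public class ElementStore {
    /** One stored element: illustrative tag name and attribute map. */
    public record Element(String id, String tag, Map<String, String> attributes) {}

    private final Map<String, Element> byId = new HashMap<>();

    // Insert an element under its primary key.
    public void persist(Element e) { byId.put(e.id(), e); }

    // Resolve a cross-fragment reference by unique identifier.
    public Element resolve(String reference) { return byId.get(reference); }

    public static void main(String[] args) {
        ElementStore store = new ElementStore();
        store.persist(new Element("occ-1", "occurrence", Map.of("partRef", "part-7")));
        store.persist(new Element("part-7", "part", Map.of("name", "bolt")));
        // Follow a reference from one fragment to another via its identifier:
        Element part = store.resolve(store.resolve("occ-1").attributes().get("partRef"));
        System.out.println(part.tag() + ":" + part.attributes().get("name")); // part:bolt
    }
}
```

In a JPA-backed variant, `persist` and `resolve` would map to entity persistence and a find-by-primary-key query, so unused entries can live in the database rather than in RAM.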
  • FIG. 3 depicts an example of a flow diagram in accordance with disclosed embodiments.
  • In this example, the XML data 302 is broken into a set of chunks by a parallel parser 304, by dividing the XML file into equal parts based on the desired level of parallelism; in other embodiments, the chunk sizes are not necessarily of equal size. As part of this process, parallel parser 304 can identify division points in the XML data; the parser searches for the nearest spot in the XML where it can be split and broken into a separate document. This can typically be the end of a tag that is directly under the root element.
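• One way such a division point might be located can be sketched as follows (an illustration only, not the actual implementation of parallel parser 304): scan for the first position at or after a target offset where an element directly under the root ends. The sketch assumes well-formed XML with no prolog, comments, CDATA sections, or angle brackets inside attribute values.

```java
/** Sketch of locating a division point: the first position at or after
 *  `target` where an element directly under the root element ends. */
public class SplitFinder {
    public static int findSplit(String xml, int target) {
        int depth = 0;
        for (int i = 0; i < xml.length(); i++) {
            if (xml.charAt(i) == '<') {
                boolean closing = xml.charAt(i + 1) == '/';
                int end = xml.indexOf('>', i);
                boolean selfClosing = xml.charAt(end - 1) == '/';
                if (closing) depth--;
                else if (!selfClosing) depth++;
                boolean elementEnded = closing || selfClosing;
                // Depth 1 just after an element ends means we closed a direct
                // child of the root: a valid place to split the document.
                if (elementEnded && depth == 1 && end + 1 >= target) return end + 1;
                i = end;
            }
        }
        return -1; // no valid split point at or after target
    }

    public static void main(String[] args) {
        String xml = "<root><a>1</a><b>2</b><c/></root>";
        System.out.println(findSplit(xml, 8)); // 14, just after "</a>"
    }
}
```

Each chunk produced this way can then be wrapped with a copy of the root element and parsed as a standalone document.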
  • Any event-based parser, such as the Streaming API for XML (StAX) or the Simple API for XML (SAX), can be used to parse the chunks once they are split apart. As the StAX or SAX events occur, the system creates persistent storage entries to represent the information contained in the XML. The Java® Persistence Architecture provides a good mechanism for persisting the XML information. The XML must contain a unique identifier that allows references to the XML parts to be resolved. Once enough database entries have been created, processing of the entries can commence. This parsing works best for XML that is flatter (where the ratio of direct children of the root element to other elements is high), because the root element is replicated in each of the parts into which the XML is split.
  • The parallel parser 304 uses a chunking strategy 306 to determine the number of chunks, the size of each chunk, and the order in which they will be processed. Since the end of the nth chunk is the start of the (n+1)th chunk and the chunks are processed in parallel, care must be taken that the determination of chunk boundaries is done serially. That is, if the nth chunk is determining its end point, the (n+1)th chunk must wait for that determination to complete and use the determined value plus one as its starting point. To avoid threads blocking during the determination of start and end points, it is best to process non-adjacent chunks first, which the default chunking strategy does. The default chunking strategy also takes into account the number of processors on the machine, and enforces a minimum chunk size to prevent chunks that are too small from being created.
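• The sizing and ordering logic described above can be sketched as follows; the method names and the even-indices-before-odd ordering are illustrative assumptions about one way to realize such a strategy, not the actual chunking strategy 306.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a default chunking strategy: cap the chunk count by both the
 *  processor count and a minimum chunk size, and hand out non-adjacent
 *  chunks first so threads are less likely to block while adjacent
 *  boundaries are still being settled. */
public class ChunkingStrategy {
    public static int chunkCount(long fileSize, int processors, long minChunkSize) {
        long maxChunks = Math.max(1, fileSize / minChunkSize); // avoid tiny chunks
        return (int) Math.min(processors, maxChunks);
    }

    public static List<Integer> processingOrder(int chunks) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < chunks; i += 2) order.add(i); // non-adjacent first
        for (int i = 1; i < chunks; i += 2) order.add(i); // then the rest
        return order;
    }

    public static void main(String[] args) {
        System.out.println(chunkCount(10_000_000L, 8, 1_000_000L)); // 8
        System.out.println(processingOrder(5)); // [0, 2, 4, 1, 3]
    }
}
```

With this ordering, chunk 2 can begin before chunk 1, so the serial boundary handoff between adjacent chunks overlaps with useful work.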
  • The parallel parser 304 also uses a parsing manager 308, which determines when parsing is complete and whether a particular tag is valid for starting the next chunk in the XML. In disclosed embodiments, a valid tag is a tag that appears in the sequence of the complex content of the root tag. For applications where the XML is being searched for a particular tag or content, the parsing manager 308 can indicate completion when the content is found or all chunks are parsed. For applications where the XML is processed in its entirety, the parsing manager 308 can indicate completion when all chunks are parsed.
  • One or more parsing tasks 310 are implemented by the threads that parse the XML, and they interact with the parsing manager 308 for managing completion. A parsing task factory interface can be used for instantiating parsing tasks 310 as needed by the parallel parser 304. If the parsing task 310 is thread safe, a single task can be used for many threads; the manager can manage the creation and reuse of the parsing tasks 310. As described herein, it is often preferable to maintain each parsing task 310 on a different processor 102 of the system or on different processor cores.
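• Running one parsing task per chunk on a thread pool can be sketched as follows; this is an illustration using the JDK's `ExecutorService`, not the actual implementation of parsing tasks 310. The task body merely counts tags in its chunk; a real task would run the event parser and write objects to the element store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: submit one parsing task per chunk and combine the results. */
public class ParallelParse {
    public static int parseAll(List<String> chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Math.max(1, Runtime.getRuntime().availableProcessors()));
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String chunk : chunks) {
                // Illustrative task: count '<' characters in this chunk.
                tasks.add(() -> (int) chunk.chars().filter(c -> c == '<').count());
            }
            int total = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAll(List.of("<a/><b/>", "<c/>"))); // 3
    }
}
```

Sizing the pool to the processor count mirrors the preference stated above for keeping each parsing task on its own processor or core.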
  • As the parsing tasks 310 process each XML chunk, they store the processed XML in element store 312 as XML elements and corresponding attributes. Element store 312 is preferably a persistent storage.
  • FIG. 4 depicts an example of a flow diagram in accordance with disclosed embodiments. In this example, element store 412 represents a storage into which the processed XML chunks were stored, such as element store 312.
  • The model object builder 414 uses the element store 412 created when parsing the XML to access the XML elements and corresponding attributes and creates the minimal starting structure in model object store 418 before starting the rest of the model building background tasks, illustrated here as one or more modeling tasks 416. Modeling tasks 416 can act in parallel, under the control of model object builder 414, to retrieve data from element store 412, build the model objects, and store them in the model object store 418. The model objects stored in model object store 418 together represent one or more models.
  • The model stored in model object store 418 can be built using software such as the Java® Persistence Architecture software, backed by a database on the file system so that the parts of the model not currently in use can be removed from memory as needed. The hierarchical portions of the model track whether their children are populated, and the model traversal can be performed exclusively with the use of a visitor which knows whether or not to wait for the children to be populated. In this fashion, access to the incomplete model can be provided while the model is populated in other threads.
  • For example, the top levels of the structure can be displayed in a GUI while the lower levels of the structure are being populated in the background. If the user navigates to a part of the structure that is not yet constructed, the system can block access while waiting for the structure to be constructed. The visitor follows a standard visitor design pattern, except for the traversal logic, which can be centralized in a single implementation. The visitor also avoids the use of recursion so that, except for the current object being visited, the rest of the structure is not referenced on the execution stack.
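• The recursion-free traversal can be sketched as follows: an explicit stack holds pending nodes, so only the node currently being visited is referenced from the execution stack. The `Node` class and its fields are illustrative stand-ins for the model objects, not the actual visitor of the disclosed embodiments.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of a depth-first, recursion-free model traversal. */
public class ModelTraversal {
    public static class Node {
        public final String name;
        final List<Node> children = new ArrayList<>();
        public Node(String name) { this.name = name; }
        public Node add(Node child) { children.add(child); return this; }
    }

    public static void visit(Node root, Consumer<Node> visitor) {
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            visitor.accept(current); // a real visitor could block here until
                                     // this node's children are populated
            for (int i = current.children.size() - 1; i >= 0; i--)
                pending.push(current.children.get(i)); // reversed: keep child order
        }
    }

    public static void main(String[] args) {
        Node root = new Node("asm")
                .add(new Node("sub").add(new Node("part")))
                .add(new Node("bolt"));
        List<String> seen = new ArrayList<>();
        visit(root, n -> seen.add(n.name));
        System.out.println(seen); // [asm, sub, part, bolt]
    }
}
```

Because the pending structure lives on the heap rather than the call stack, the traversal tolerates arbitrarily deep assemblies and can pause at unpopulated nodes without unwinding recursion.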
  • FIG. 5 depicts a flowchart of a process in accordance with disclosed embodiments. The process can be performed, for example, by a PDM data processing system including one or more data processing systems 100.
  • The system can read an XML document, or other data, corresponding to at least one object model (step 505). This can be performed by an XML reader process.
  • The system receives the XML data by a parallel parser process (step 510). In embodiments where the system is performing step 505, the parallel parser process can receive the XML data from the XML reader process. In other embodiments, the parallel parser process can receive the XML data from another device or process, or otherwise.
  • The system divides the XML data into a plurality of chunks (or “streams”) using the parallel parser process (step 515). The chunks can be of equal size, and the division can be performed based on division points in the XML data identified by the parallel parser process.
  • This step can be performed by the parallel parser using a chunking strategy to determine the number of chunks, size of each chunk, and the order in which the chunks are processed, as described in more detail above.
  • The system parses each of the chunks, including parsing a plurality of chunks in parallel using separate, and preferably independent, parsing tasks to produce objects representing the XML elements and corresponding attributes (step 520). In some embodiments, each separate parsing task operates in a single parsing thread in a different processor or processor core. In other embodiments, a single parsing task can process multiple parsing threads. In some embodiments, where Java® techniques are used, each parsing task receives chunks and produces Java® objects that are stored in the step below.
  • This step can be performed using a parsing manager, as described in more detail above, that can manage the completion of parsing tasks, instantiate (and kill and reuse) parsing tasks as needed, and perform other tasks.
  • The system stores objects representing the XML elements and corresponding attributes in an element store (step 525). The element store is preferably a persistent storage, into which the elements are stored directly, thereby avoiding the problems involved with holding them in dynamic/RAM memory.
  • The system selectively retrieves the XML elements and corresponding attributes from the element store (step 530). The selection can be all of the elements corresponding to a model object, just those elements that correspond to a user-selected portion or subassembly of a model object, the elements that correspond to a query, or otherwise.
  • The system creates one or more model objects from the retrieved XML elements and corresponding attributes using a plurality of modeling tasks operating in parallel (step 535). This step can include controlling the modeling tasks using a model object build process, and can include creating an initial structure in a model object store before starting the modeling tasks.
  • The system stores the model object in the model object store (step 540). The model object store can also be a persistent store.
  • Of course, those of skill in the art will recognize that, unless specifically indicated or required by the sequence of operations, certain steps in the processes described above may be omitted, performed concurrently or sequentially, or performed in a different order. Any of the other features and processes described above can be included in the process of FIG. 5.
  • Those skilled in the art will recognize that, for simplicity and clarity, the full structure and operation of all data processing systems suitable for use with the present disclosure is not being depicted or described herein. Instead, only so much of a data processing system as is unique to the present disclosure or necessary for an understanding of the present disclosure is depicted and described. The remainder of the construction and operation of data processing system 100 may conform to any of the various current implementations and practices known in the art.
  • It is important to note that while the disclosure includes a description in the context of a fully functional system, those skilled in the art will appreciate that at least portions of the mechanism of the present disclosure are capable of being distributed in the form of instructions contained within a machine-usable, computer-usable, or computer-readable medium in any of a variety of forms, and that the present disclosure applies equally regardless of the particular type of instruction or signal bearing medium or storage medium utilized to actually carry out the distribution. Examples of machine usable/readable or computer usable/readable mediums include: nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable read only memories (EEPROMs), and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs).
  • Although an exemplary embodiment of the present disclosure has been described in detail, those skilled in the art will understand that various changes, substitutions, variations, and improvements disclosed herein may be made without departing from the spirit and scope of the disclosure in its broadest form.
  • None of the description in the present application should be read as implying that any particular element, step, or function is an essential element which must be included in the claim scope: the scope of patented subject matter is defined only by the allowed claims. Moreover, none of these claims are intended to invoke paragraph six of 35 USC §112 unless the exact words “means for” are followed by a participle.

Claims (20)

What is claimed is:
1. A method for product data management, the method performed by at least one data processing system and comprising:
receiving an XML document by a parallel parser process, the XML document including a plurality of elements of an XML data structure that corresponds to an object model;
dividing the XML document into a plurality of chunks using the parallel parser process;
parsing the plurality of chunks in parallel using separate parsing tasks to produce objects representing the elements and corresponding attributes; and
storing the objects and corresponding attributes in a persistent element store.
2. The method of claim 1, wherein the data processing system also selectively retrieves the objects and corresponding attributes from the element store;
creates one or more model objects from the retrieved objects and corresponding attributes using a plurality of modeling tasks operating in parallel; and
stores the one or more model objects in a model object store.
3. The method of claim 2, wherein the data processing system also controls the modeling tasks using a model object build process, and creates an initial structure in the model object store before starting the modeling tasks.
4. The method of claim 1, wherein the XML document is received from an XML reader process that reads the XML document and passes it to a parallel parser process to produce the plurality of chunks as XML streams.
5. The method of claim 1, wherein the XML document is divided into the plurality of chunks based on division points in the XML document identified by the parallel parser process.
6. The method of claim 1, wherein each parsing task operates in a separate parsing thread in a different processor core.
7. The method of claim 1, wherein the system uses a parsing manager that manages the completion of parsing tasks and instantiates parsing tasks as needed.
8. A data processing system comprising:
a processor; and
an accessible memory, the data processing system particularly configured to
receive an XML document by a parallel parser process, the XML document including a plurality of elements of an XML data structure that corresponds to an object model;
divide the XML document into a plurality of chunks using the parallel parser process;
parse the plurality of chunks in parallel using separate parsing tasks to produce objects representing the elements and corresponding attributes; and
store the objects and corresponding attributes in a persistent element store.
9. The data processing system of claim 8, wherein the data processing system also selectively retrieves the objects and corresponding attributes from the element store;
creates one or more model objects from the retrieved objects and corresponding attributes using a plurality of modeling tasks operating in parallel; and
stores the one or more model objects in a model object store.
10. The data processing system of claim 9, wherein the data processing system also controls the modeling tasks using a model object build process, and creates an initial structure in the model object store before starting the modeling tasks.
11. The data processing system of claim 8, wherein the XML document is received from an XML reader process that reads the XML document and passes it to a parallel parser process to produce the plurality of chunks as XML streams.
12. The data processing system of claim 8, wherein the XML document is divided into the plurality of chunks based on division points in the XML document identified by the parallel parser process.
13. The data processing system of claim 8, wherein each parsing task operates in a separate parsing thread in a different processor core.
14. The data processing system of claim 8, wherein the system uses a parsing manager that manages the completion of parsing tasks and instantiates parsing tasks as needed.
15. A non-transitory computer-readable medium encoded with executable instructions that, when executed, cause one or more data processing systems to:
receive an XML document by a parallel parser process, the XML document including a plurality of elements of an XML data structure that corresponds to an object model;
divide the XML document into a plurality of chunks using the parallel parser process;
parse the plurality of chunks in parallel using separate parsing tasks to produce objects representing the elements and corresponding attributes; and
store the objects and corresponding attributes in a persistent element store.
16. The computer-readable medium of claim 15, wherein the data processing system also
selectively retrieves the objects and corresponding attributes from the element store;
creates one or more model objects from the retrieved objects and corresponding attributes using a plurality of modeling tasks operating in parallel; and
stores the one or more model objects in a model object store.
17. The computer-readable medium of claim 16, wherein the data processing system also controls the modeling tasks using a model object build process, and creates an initial structure in the model object store before starting the modeling tasks.
18. The computer-readable medium of claim 15, wherein the XML document is received from an XML reader process that reads the XML document and passes it to a parallel parser process to produce the plurality of chunks as XML streams.
19. The computer-readable medium of claim 15, wherein the XML document is divided into the plurality of chunks based on division points in the XML document identified by the parallel parser process.
20. The computer-readable medium of claim 15, wherein each parsing task operates in a separate parsing thread in a different processor core and the system uses a parsing manager that manages the completion of parsing tasks and instantiates parsing tasks as needed.
US13/629,212 2012-09-27 2012-09-27 Efficient conversion of XML data into a model using persistent stores and parallelism Active 2033-01-25 US9235650B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/629,212 US9235650B2 (en) 2012-09-27 2012-09-27 Efficient conversion of XML data into a model using persistent stores and parallelism


Publications (2)

Publication Number Publication Date
US20140089332A1 true US20140089332A1 (en) 2014-03-27
US9235650B2 US9235650B2 (en) 2016-01-12

Family

ID=50339941

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/629,212 Active 2033-01-25 US9235650B2 (en) 2012-09-27 2012-09-27 Efficient conversion of XML data into a model using persistent stores and parallelism

Country Status (1)

Country Link
US (1) US9235650B2 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150363414A1 (en) * 2014-06-11 2015-12-17 International Business Machines Corporation Processing large xml files by splitting and hierarchical ordering
US20160224338A1 (en) * 2014-03-11 2016-08-04 Blue Wolf Group, LLC Analyzing Components Related To A Software Application In A Software Development Environment
WO2017116341A2 (en) 2015-12-31 2017-07-06 Turkcell Teknoloji Arastirma Ve Gelistirme Anonim Sirketi A system for parallel processing and data modelling
WO2017194071A1 (en) * 2016-05-12 2017-11-16 Hewlett-Packard Development Company, L.P. Processing of 3d printing files
US10268672B2 (en) 2015-03-30 2019-04-23 International Business Machines Corporation Parallel parsing of markup language data
US11416526B2 (en) * 2020-05-22 2022-08-16 Sap Se Editing and presenting structured data documents

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
US20160275219A1 (en) * 2015-03-20 2016-09-22 Siemens Product Lifecycle Management Software Inc. Simulating an industrial system

Citations (4)

Publication number Priority date Publication date Assignee Title
US20070299811A1 (en) * 2006-06-21 2007-12-27 Sivansankaran Chandrasekar Parallel population of an XML index
US20090089658A1 (en) * 2007-09-27 2009-04-02 The Research Foundation, State University Of New York Parallel approach to xml parsing
US20110107052A1 (en) * 2009-10-30 2011-05-05 Senthilkumar Narayanasamy Virtual Disk Mapping
US8838626B2 (en) * 2009-12-17 2014-09-16 Intel Corporation Event-level parallel methods and apparatus for XML parsing

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US7860879B2 (en) * 2004-07-09 2010-12-28 Microsoft Corporation SMO scripting optimization
US7403940B2 (en) 2004-08-31 2008-07-22 Yahoo! Inc. Optimal storage and retrieval of XML data
US20080033974A1 (en) 2006-08-07 2008-02-07 International Characters, Inc. Method and Apparatus for XML Parsing Using Parallel Bit streams
US7975001B1 (en) * 2007-02-14 2011-07-05 The Mathworks, Inc. Bi-directional communication in a parallel processing environment
CN101329665A (en) * 2007-06-18 2008-12-24 国际商业机器公司 Method for analyzing marking language document and analyzer
US8321533B2 (en) * 2009-08-03 2012-11-27 Limelight Networks, Inc. Systems and methods thereto for acceleration of web pages access using next page optimization, caching and pre-fetching techniques


Cited By (13)

Publication number Priority date Publication date Assignee Title
US20160224338A1 (en) * 2014-03-11 2016-08-04 Blue Wolf Group, LLC Analyzing Components Related To A Software Application In A Software Development Environment
US10782961B2 (en) * 2014-03-11 2020-09-22 International Business Machines Corporation Analyzing components related to a software application in a software development environment
US10031746B2 (en) * 2014-03-11 2018-07-24 International Business Machines Corporation Analyzing components related to a software application in a software development environment
US20180300125A1 (en) * 2014-03-11 2018-10-18 International Business Machines Corporation Analyzing components related to a software application in a software development environment
US10127329B2 (en) 2014-06-11 2018-11-13 International Business Machines Corporation Processing large XML files by splitting and hierarchical ordering
US9588975B2 (en) * 2014-06-11 2017-03-07 International Business Machines Corporation Processing large XML files by splitting and hierarchical ordering
US20150363414A1 (en) * 2014-06-11 2015-12-17 International Business Machines Corporation Processing large xml files by splitting and hierarchical ordering
US10387563B2 (en) 2015-03-30 2019-08-20 International Business Machines Corporation Parallel parsing of markup language data
US10268672B2 (en) 2015-03-30 2019-04-23 International Business Machines Corporation Parallel parsing of markup language data
WO2017116341A2 (en) 2015-12-31 2017-07-06 Turkcell Teknoloji Arastirma Ve Gelistirme Anonim Sirketi A system for parallel processing and data modelling
WO2017194071A1 (en) * 2016-05-12 2017-11-16 Hewlett-Packard Development Company, L.P. Processing of 3d printing files
US11113467B2 (en) 2016-05-12 2021-09-07 Hewlett-Packard Development Company, L.P. Processing of 3D printing files
US11416526B2 (en) * 2020-05-22 2022-08-16 Sap Se Editing and presenting structured data documents


Similar Documents

Publication Publication Date Title
US9235650B2 (en) Efficient conversion of XML data into a model using persistent stores and parallelism
US7912826B2 (en) Apparatus, computer program product, and method for supporting construction of ontologies
US8694557B2 (en) Extensibility of metaobjects
US8326813B2 (en) System and method for data management
US9552400B2 (en) Defining and mapping application interface semantics
US8332420B2 (en) System and method for performing a database query
JP7358698B2 (en) Training method, apparatus, device and storage medium for word meaning representation model
US9043750B2 (en) Automated generation of two-tier mobile applications
US8732127B1 (en) Method and system for managing versioned structured documents in a database
US20160314394A1 (en) Method and device for constructing event knowledge base
JP2017504874A (en) Design and implementation of clustered in-memory database
EP2874077A2 (en) Stateless database cache
US20140019889A1 (en) Regenerating a user interface area
US20220391589A1 (en) Systems and methods for training and evaluating machine learning models using generalized vocabulary tokens for document processing
US20110179090A1 (en) Product Lifecycle Management Using a Sparsely Populated Table
US10891114B2 (en) Interpreter for interpreting a data model algorithm and creating a data schema
US8768654B2 (en) Interactive configuration-management-based diagramming tool
US20190311041A1 (en) Database migration sequencing using dynamic object-relationship diagram
US8719268B2 (en) Utilizing metadata generated during XML creation to enable parallel XML processing
US9122740B2 (en) Bulk traversal of large data structures
US9280361B2 (en) Methods and systems for a real time transformation of declarative model and layout into interactive, digital, multi device forms
US10169478B2 (en) Optimize data exchange for MVC-based web applications
US9542502B2 (en) System and method for XML subdocument selection
KR20150060174A (en) Method of Automatic generation of source for business process automation
US8898122B1 (en) Method and system for managing versioned structured documents in a database

Legal Events

Date Code Title Description
AS Assignment

Owner name: SIEMENS PRODUCT LIFECYCLE MANAGEMENT SOFTWARE INC.

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MAHARANA, SUJIT;JACKSON, DOUGLAS SCOTT;CHAUBAL, SUBODH;REEL/FRAME:030027/0633

Effective date: 20120925

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

AS Assignment

Owner name: SIEMENS INDUSTRY SOFTWARE INC., TEXAS

Free format text: CHANGE OF NAME;ASSIGNOR:SIEMENS PRODUCT LIFECYCLE MANAGEMENT SOFTWARE INC.;REEL/FRAME:051171/0024

Effective date: 20191021

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8