WO2021140000A1 - System and method for retrieving compatible data packages to originating data - Google Patents

System and method for retrieving compatible data packages to originating data

Info

Publication number
WO2021140000A1
WO2021140000A1 PCT/EP2020/086829 EP2020086829W WO2021140000A1 WO 2021140000 A1 WO2021140000 A1 WO 2021140000A1 EP 2020086829 W EP2020086829 W EP 2020086829W WO 2021140000 A1 WO2021140000 A1 WO 2021140000A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
component
approach
data packages
profiles
Prior art date
Application number
PCT/EP2020/086829
Other languages
French (fr)
Inventor
Aaron TAUDT
Philip CREUTZMANN
Lucas VON REUSS
Axel STELLBRINK
Original Assignee
Quant Ip Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Quant Ip Gmbh
Publication of WO2021140000A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Definitions

  • the present invention relates to a system and a method for retrieving compatible data packages to originating data.
  • Examples can be data, data packages, routines, sub-routines that shall be automatically selected for certain tasks of technical nature, such as finding a fast and reliable routine or most compatible routine for estimating a value for controlling a machine, traffic components, sorting products or documents, etc.
  • actual parts can be retrieved that can be used in order to be taken as spare parts that can be considered to constitute the most compatible ones.
  • the data packages can comprise data for one or more additive manufacturing steps in order to build or manufacture a part or spare part.
  • data for those parts such as dimensions, values of strength, means for attachment etc. should be present in the form of data packages.
  • For an originating part that is known in the form of originating data, the most compatible spare parts that are known in the form of data packages can be retrieved.
  • the technical teaching such as in patent documentation can be retrieved that is most compatible for an originating task defined in form of originating data.
  • US 20170232515 A1 is directed to an additive manufacturing method that can use a two-dimensional energy patterning system.
  • Information related to a part is provided, with the information including CAD files, material type, selected additive manufacturing process type, and tolerances of selected design features.
  • Manufacture of a part is simulated and compared to selected design tolerance. If the simulated manufactured part is outside selected design tolerances, simulation parameters can be adjusted until results indicate the simulated manufactured part is within selected design tolerances.
  • manufacturing the part uses a real-time sensor monitoring system, along with post processing analysis of selected design features to improve simulated manufacture of the part.
  • US 9849019 B2 is directed to a method of manufacturing a plurality of orthopedic implant components.
  • the method includes providing a design for a first implant component having at least one dimension that is based on patient-specific information and a design for a second implant component having at least one dimension that is based on patient specific information.
  • a build plan is created that includes a position and orientation for at least each of the first and second implant components within a build chamber of a solid freeform fabrication machine and with respect to a platform of the build chamber.
  • the implants included in the build plan are produced by executing a build run of the solid freeform fabrication machine based on the build plan, wherein the plurality of implant components are positioned and oriented in an interleaved build configuration according to the build plan.
  • US 7945348 B2 is dedicated to software for controlling processes for a semiconductor manufacturing environment and may include a wafer-centric database, a real-time scheduler using a neural network, and a graphical user interface displaying simulated operation of the system.
  • These features may be employed alone or in combination to offer improved usability and computational efficiency for real time control and monitoring of a semiconductor manufacturing process. More generally, these techniques may be usefully employed in a variety of real time control systems, particularly systems requiring complex scheduling decisions or heterogeneous systems constructed of hardware from numerous independent vendors.
  • the data packages are preferably technically oriented data packages.
  • the present invention is directed to a method and a system for retrieving compatible data packages to originating data.
  • data can comprise any kind of data, such as optical data, numerical data, content data, information etc.
  • Originating data is intended to mean the data that constitutes the starting point for the retrieving.
  • Compatible data is intended to mean data that is or can be relevant to the originating data from the technical standpoint.
  • the system according to the present invention can comprise a database, itself comprising a plurality of data packages. It can even comprise massive amounts of data packages, such as millions or even more than a hundred million data packages.
  • the data packages can comprise metadata and content, the latter comprising any optical information or other information specifying one or more technical devices, apparatuses, systems, assemblies, methods etc. and any combination thereof.
  • the system according to the present invention can further comprise an inputting component configured to input the originating data.
  • the inputting component can be any kind of node or computing device that enables the entry of the originating data either directly or indirectly.
  • the system can also comprise an analyzing component configured to analyze the originating data.
  • the analyzing component can be a computing device that can be separate from the inputting component. While the inputting component can be affiliated with a user the analyzing component can be remotely located and can host the software that is configured to analyze the originating data or any other data that is entered later or during a different session.
  • the analysis can be based on at least one first approach and configured to generate a first approach first profile of the originating data.
  • the first approach can be a model that is capable or able to analyze the originating data, particularly its metadata and/or content data, preferably the content data.
  • the first approach can be a structured multidimensional representation of a technical task, device, system, assembly, method etc.
  • a spare part or device can be defined in a specification that is then analyzed and conveyed in a multidimensional representation, such as a multidimensional vector that can have more than 3, even hundreds, thousands or more dimensions in order to specify the spare part or device in the best manner.
  • the system can further comprise a retrieving component to automatically approach the database and to retrieve data packages from the database according to the first approach first profile and to provide a plurality of first data packages.
  • the first profile is intended to mean the first result of the first analysis of the originating data according to the first approach or model.
  • a similarity search can be initiated in the database.
  • a search can be conducted for a representation identical or as similar as possible to the one described above.
  • the retrieving component can even set a number of most similar representing data packages that are then delivered, e.g. to the inputting component.
  • in the similarity search, the different technical features or components are determined and searched for in order to find the most compatible data package that, e.g., can be used to 3D-print the most compatible spare product.
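The similarity search over vectorized representations described above can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and a small in-memory dictionary as the database; neither is specified in the text, and the names `cosine_similarity`, `retrieve_most_similar` and the package identifiers are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_most_similar(profile, database, k):
    """Return the k (package_id, score) pairs most similar to the profile, best first."""
    scored = [(pid, cosine_similarity(profile, vec)) for pid, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy database: package id -> multidimensional representation.
database = {
    "pkg-10": [0.9, 0.1, 0.0],
    "pkg-11": [0.8, 0.2, 0.1],
    "pkg-12": [0.0, 0.1, 0.9],
}

# Assumed first approach first profile of the originating data.
first_profile = [1.0, 0.0, 0.0]

# The retrieving component "sets a number" of most similar packages; here k = 2.
first_data_packages = retrieve_most_similar(first_profile, database, k=2)
```

With millions of data packages, an exhaustive scan like this would likely be replaced by an approximate nearest-neighbor index; the sketch only shows the ranking principle.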
  • the analyzing component can further be configured to automatically analyze the preferred data packages according to the profiles of the first and second approaches and the classifiers of the supervised machine learning component.
  • the selecting component can repeatedly select data packages, and the analyzing component weighs the profiles of the first and second approaches and the classifiers from the supervised machine learning component and provides an actual retrieving profile combining the weighted profiles and classifiers. This can be based on the analysis of the most recently selected data packages and their weights.
  • the data packages can comprise a technical teaching, e.g., to construct or 3D-print a spare part.
  • the data packages can comprise a technical teaching in an enabling manner for a person skilled in the art, e.g., to reconstruct or 3D-print a device.
  • the first approach can comprise a first model and the second approach can comprise a second model wherein the second model is different to the first model.
  • the first approach profile can comprise a numerical representation of the analysis of the originating data and/or the selected data package(s) according to the first model.
  • the second approach profile can comprise a numerical representation of the analysis of the originating data and/or the selected data package(s) according to the second model.
  • the first approach can comprise a content analytics model and the first approach profile can comprise a first approach multidimensional representation.
  • the second approach can comprise a past distribution statistics model and the second approach profile can comprise a second approach multidimensional representation.
  • the first and second approach multidimensional representations are vectorized.
  • the analyzing component and/or the retrieving component can be configured to retrieve data packages from the database on the basis of either one, two or all of the three models.
  • the analyzing component and/or retrieving component can be configured to weigh or gradually combine the three models on the basis of the last result of the selection component.
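One way to picture the weighing and gradual combination of the three models is a weighted average of per-model scores whose weights drift toward the models that agreed with the last selection. The update rule, the `rate` parameter and the model names below are assumptions made for illustration; the text does not specify how the weights are adapted.

```python
def combine_scores(scores_by_model, weights):
    """Weighted average of the per-model scores for one data package."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one model weight must be non-zero")
    return sum(weights[m] * scores_by_model[m] for m in weights) / total


def update_weights(weights, selected_scores, rate=0.5):
    """Shift weight toward models that scored the last selected package highly.

    `rate` controls how gradual the shift is (assumed parameter); the result
    is normalized so that the weights sum to 1.
    """
    updated = {m: w * (1.0 + rate * selected_scores[m]) for m, w in weights.items()}
    total = sum(updated.values())
    return {m: w / total for m, w in updated.items()}


# Start with equal weights for the three models (hypothetical names).
weights = {"content": 1 / 3, "statistics": 1 / 3, "classifier": 1 / 3}

# Scores that each model assigned to the package the selecting component just chose.
last_selected = {"content": 0.9, "statistics": 0.3, "classifier": 0.6}

weights = update_weights(weights, last_selected)
combined = combine_scores({"content": 0.8, "statistics": 0.4, "classifier": 0.5}, weights)
```

Repeating the update after every selection moves the weights step by step, so a single selection never switches a model on or off outright.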
  • the system can also comprise a selecting component for selecting one or more preferred data package(s) among the first data packages.
  • the selection can be content based or based on metadata or both to get a more holistic view and can constitute a view onto the data packages from a different perspective. This can be done automatically. This can further be triggered by the inputting component.
  • One example can be a frame or photo that is retrieved as compatible with or similar to an originating picture; details of the frame or photo are then analyzed in order to decide whether or not the frame or photo is relevant. The photo could be a histological photo, and the more detailed analysis then compares certain pixels of interest.
  • the data package can also be a technical document disclosing and/or showing a technical or scientific subject.
  • the decision can also be found with the help of a different entity that is better able to analyze the data package.
  • upon a possible out-selection or the provision of a certain value with respect to compatibility, particularly of a number of data packages, a further elaborated analysis is performed and potential first training data is created.
  • the analyzing component can be further configured to automatically analyze the preferred data package(s) according to the first approach and at least a second approach and to generate a first approach second profile and a second approach first profile. Similar to before, the first and second approaches can be different models and the respective profiles are then the results of the analyses of the data packages selected according to the different models.
  • the second approach can be a statistical model that represents past similarities between technical tasks, parts or devices.
  • a reference to those fields can be advantageous. This can be based on large scale past analysis, can be encoded in order to find related fields also with statistical probabilities.
  • the analyzing component can be activated or re-activated by a soft key for refreshing the analysis, just as after the selecting component (5) has been activated.
  • the system for retrieving compatible data packages to originating data in accordance with the present invention can alternatively or further comprise a database comprising a plurality of data packages, an inputting component configured to input the originating data, and an analytics database comprising a plurality of profiles relevant for retrieving compatible data packages to originating data.
  • the analytics database can be different to the other database and can be also located at a different site or on a different server or server storage.
  • the profiles can be stored that are elaborated at the beginning or during the retrieving process.
  • the profiles can be the specific results of the analysis of specific data packages in accordance with the models mentioned.
  • the profiles can comprise a plurality of features.
  • Features are intended to mean certain attributes or criteria or parameters or specifiers of a data package that are often independent from each other.
  • An example could be the function of a technical component, the affiliation or cooperation with other components etc.
  • the data packages each can comprise at least one of datasets, callable units, metadata and content data and any combination thereof.
  • the inputting component can be configured to input or grade data packages automatically or semi-automatically. There can be a number of grades available, such as 4 grades in total, one representing non-relevance, one a neutral grade, another one a grade for relevant and a further one for very relevant.
  • All the features and method steps according to the present invention can be automatic and/or at least semi-automatic.
  • the system can further comprise a machine learning component that can be a learning component that is configured to automatically provide classifiers on the basis of the features of the profiles on the basis of the above-mentioned approaches.
  • Those classifiers can be technical distinctions by different kinds of data packages etc.
  • the present invention particularly refers to a combination of the before described systems.
  • the database comprises more than a million data packages, preferably more than 10 million data packages and most preferably more than 100 million data packages. Thus, the database can be large. This shows also the difficulty or challenge to retrieve compatible data packages to originating data.
  • the machine learning component can be configured to establish classifiers on the basis of the profiles by forming random decision trees on the basis of the features of the profiles and to determine relevant decision trees from less relevant decision trees on the basis of profiles relevant for retrieving compatible data packages to originating data.
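The idea of forming random decision trees over profile features and keeping the relevant ones can be sketched in a deliberately simplified form. The sketch below swaps in depth-one trees (decision stumps) with random features and thresholds and ranks them by accuracy on graded profiles; this is an illustrative stand-in, not the method of the publication, and all names, data and the binary relevant/not-relevant labels are assumptions.

```python
import random


def make_random_stump(n_features, rng):
    """A depth-one decision tree: one random feature and one random threshold in [0, 1)."""
    return {"feature": rng.randrange(n_features), "threshold": rng.random()}


def stump_predict(stump, features):
    """Predict 1 (relevant) if the chosen feature exceeds the threshold, else 0."""
    return 1 if features[stump["feature"]] > stump["threshold"] else 0


def stump_accuracy(stump, profiles, labels):
    """Fraction of graded profiles the stump classifies correctly."""
    hits = sum(stump_predict(stump, p) == y for p, y in zip(profiles, labels))
    return hits / len(labels)


def select_relevant_stumps(profiles, labels, n_stumps=200, keep=5, seed=0):
    """Form random stumps and keep the `keep` most accurate ones."""
    rng = random.Random(seed)
    n_features = len(profiles[0])
    stumps = [make_random_stump(n_features, rng) for _ in range(n_stumps)]
    stumps.sort(key=lambda s: stump_accuracy(s, profiles, labels), reverse=True)
    return stumps[:keep]


def classify(stumps, features):
    """Majority vote of the kept stumps."""
    votes = sum(stump_predict(s, features) for s in stumps)
    return 1 if 2 * votes > len(stumps) else 0


# Hypothetical profiles with two features scaled to [0, 1]; label 1 = graded relevant.
profiles = [[0.9, 0.2], [0.8, 0.7], [0.7, 0.4], [0.2, 0.9], [0.1, 0.3], [0.3, 0.6]]
labels = [1, 1, 1, 0, 0, 0]

relevant_stumps = select_relevant_stumps(profiles, labels)
prediction = classify(relevant_stumps, [0.95, 0.5])
```

A production system would more plausibly use full random forests from an established library; the stumps here only show how randomly formed trees can be separated into relevant and less relevant ones by how well they reproduce the grades.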
  • Any retrieving of compatible data packages can make or initiate the analyzing component to analyze the originating data or any subsequent originating data on the basis of the first approach, the second approach and the classifiers. That is, the present data in the inputting component that can be the originating data or any remaining data packages from a previous run can be re-analyzed by the first and second approach in order to improve the classifiers of the machine learning step.
  • the inputting component of the present invention can be activated upon entry of an individual password.
  • the inputting component can repeatedly initiate the retrieving of compatible data packages any time upon the most recent corresponding profile in the analytics database.
  • the most recent corresponding profile is intended to mean the youngest of the profiles that is available or that has been stored.
  • Such profiles can also be shared between different inputting components in case they are authorized.
  • a plurality of inputting components is configured or can be configured upon entry of a code to share data in the analytics database assigned to the inputting components.
  • the system can be further configured to allow the repeating of retrieving compatible data by another inputting component than the previous one.
  • the system can be also configured to repeat retrieving compatible data based on the profiles in the analytics database.
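Selecting the most recent corresponding profile from the analytics database, including from another authorized inputting component, might look like the following sketch. The record layout, the `session_id` and `stored_at` fields and the function name are hypothetical; the text only states that the youngest stored profile is used.

```python
from datetime import datetime


def most_recent_profile(analytics_db, session_id):
    """Return the youngest stored profile record for a session, or None if there is none."""
    candidates = [r for r in analytics_db if r["session_id"] == session_id]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["stored_at"])


# Hypothetical analytics database: one record per stored profile.
analytics_db = [
    {"session_id": "s1", "stored_at": datetime(2020, 1, 5, 9, 0), "profile": [0.2, 0.8]},
    {"session_id": "s1", "stored_at": datetime(2020, 1, 7, 14, 30), "profile": [0.3, 0.7]},
    {"session_id": "s2", "stored_at": datetime(2020, 1, 8, 10, 0), "profile": [0.9, 0.1]},
]

# Any authorized inputting component can resume retrieval from this record.
latest = most_recent_profile(analytics_db, "s1")
```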
  • a switch or softkey can be provided at or in the inputting component.
  • the originating data comprises metadata and content data.
  • the originating data comprises a code that can be part of the metadata.
  • the database comprises the other metadata and/or content data that corresponds to the code of the originating data.
  • the retrieving component and/or the database can allocate the respective data package and push or pull it to the inputting component.
  • the system can be configured to pull the other metadata and/or the content data from the database.
  • the data packages each can comprise metadata and content data. Accordingly, features of the profiles can then correspond to portions of the metadata of the data packages. Those features can comprise numbers, letters, frames, pictures etc. and any combination thereof. A feature can comprise a name, a date, a location, one or more classification(s) of the technology etc. In the case of patent documents, the metadata is contained on the front page and can even comprise field numbers.
  • the inputting component can comprise a display and/or a keyboard.
  • the inputting component can be a computer, a terminal, a laptop, a smart handheld, a tablet, a smartphone etc.
  • the originating data can be provided by the inputting component in encoded fashion in a code, such as a combination of letters and numbers, and the analyzing component can be adapted to gather the complete originating data from the database based on that code.
  • the code can be an ID number, such as a document number, and the complete originating data comprises a document.
  • the complete originating data can comprise meta data and content data.
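Gathering the complete originating data from a code can be pictured as a keyed lookup. In this sketch the database is a plain dictionary and the identifier `DOC-0001`, the field names and the function name are invented for illustration.

```python
def gather_originating_data(code, database):
    """Resolve an ID code to the complete originating data (metadata and content)."""
    package = database.get(code)
    if package is None:
        raise KeyError(f"no data package stored under code {code!r}")
    return {"metadata": package["metadata"], "content": package["content"]}


# Hypothetical database keyed by document number.
database = {
    "DOC-0001": {
        "metadata": {"title": "Example spare part", "classification": "G06Q10/10"},
        "content": "Specification text of the example document.",
    },
}

# The inputting component supplies only the code; the analyzing component pulls the rest.
originating_data = gather_originating_data("DOC-0001", database)
```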
  • the metadata can comprise more than 5 fields, preferably more than 10 fields, even more preferably more than 50 fields.
  • the metadata can comprise less than 100 fields, preferably less than 90 fields, even more preferably less than 70 fields. In patent documents there are typically almost 60 fields with pre-defined content.
  • the data packages can represent patents, patent applications and/or utility models.
  • the analyzing component can comprise a multidimensional analysis (MDA) tool.
  • the first profile can be a vectorized profile and the search in the general architecture can be based on that vectorized profile. The same can apply to the second profile.
  • the analyzing component and/or the retrieving component can be configured to provide the plurality of the data packages sorted according to their relevance and/or neighborhood and/or their similarity to the originating data or most recent originating data entered or remained in the inputting component. This can happen with a set of data packages wherein the selection component or analyzing component can assist in de-selecting data packages of no or low compatibility and differentiating compatibility of other data packages. The ones with the highest or rather high compatibility will then remain and upon a refreshing or re-analyzing the first and second approach profiles will be refreshed and generally be improved (but not necessarily).
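The refresh step after de-selection can be sketched as recomputing a profile from the data packages that remain. The grade-weighted mean used here, and the convention that grade 0 marks a de-selected package, are assumptions for illustration; the text does not state how the refreshed profile is computed.

```python
def refresh_profile(vectors, grades):
    """Grade-weighted mean of the vectors of the remaining data packages.

    `grades` maps package id -> grade; grade 0 is treated as de-selected
    (assumed convention) and the package is ignored.
    """
    kept = {pid: g for pid, g in grades.items() if g > 0}
    if not kept:
        raise ValueError("all data packages were de-selected")
    total = sum(kept.values())
    dims = len(next(iter(vectors.values())))
    return [
        sum(kept[pid] * vectors[pid][i] for pid in kept) / total
        for i in range(dims)
    ]


# Hypothetical vectorized data packages and the grades given to them.
vectors = {
    "pkg-10": [1.0, 0.0],
    "pkg-11": [0.0, 1.0],
    "pkg-12": [1.0, 1.0],
}
grades = {"pkg-10": 0, "pkg-11": 1, "pkg-12": 3}

refreshed = refresh_profile(vectors, grades)
```

The refreshed vector can then serve as the query for the next retrieval round, so highly graded packages pull the search toward their neighborhood.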
  • the inputting component can be configured to display a representative summary of the plurality of the data packages that can be sorted according to their relevance and/or neighborhood and/or their similarity to the originating data or most recent originating data. Together with this a value or a graphical representation of the similarity etc. can be comprised in the data packages or in the part of the data packages that are displayed in the inputting component.
  • the inputting component can be configured to allow a de-selecting and/or weighing of the data packages upon one or more clicks by the inputting component in pre-defined fields.
  • the inputting component can be configured to display a representative summary of the plurality of the data packages sorted according to their relevance and/or neighborhood and/or their similarity to the originating data or most recent originating data and can provide the option to out-select data packages and/or to provide a quantitative value or graphical value (such as 1 to 3 stars) according to their relevance.
  • the quantitative value can be from 0 to 3 or from 0 to 4. Any other grading can also be provided.
  • the out-selection or check for compatibility can also be performed on an automated basis. Just as one example: a routine can then be tested according to the resulting values, the computing time and other criteria that can be relevant for compatibility.
  • a representative summary of the plurality of data packages can comprise a plurality of metadata of the respective data packages.
  • the representative summary can be displayed by the inputting component.
  • the inputting component can be configured to allow an expansion of any parts of the data packages upon a click by the inputting component.
  • the de-selecting or the weighing of the compatibility of one or more or all of the data packages retrieved can be performed by clicks to certain fields for de-selecting and/or differentiating the weight or neighborhood or similarity of a data package.
  • the present invention can also involve a server comprising the analyzing component and the retrieving component.
  • the server can be remotely located and/or can be part of a cloud system.
  • the server can comprise the analytics database so that the history and/or the present status of the profiles can be stored in the server or close to it.
  • the analytics database can be also located remotely.
  • the server can comprise a bus constituting the communication interface.
  • the bus can further constitute the communication interface with the inputting device and the database.
  • the inputting component can comprise the selecting component, it can be an integral part of it.
  • the present invention is also directed to a method for retrieving compatible data packages to originating data.
  • the same or corresponding specifications and/or definitions given before with respect to the system apply with respect to the method.
  • the method can comprise the step of storing a plurality of data packages in a database. Whenever the method is started or initiated, there is a step of inputting the originating data or of taking existing originating data that can be locally or remotely stored.
  • a profile usually comprises the characterization of the result of the analysis.
  • the method according to the present invention can automatically approach the database, retrieve the data packages from the database according to the first approach first profile and provide a plurality of first data packages.
  • One or more preferred data package(s) among the first data packages (10-12) can be selected, and the preferred data package(s) are then automatically analyzed according to the first approach and at least a second approach, generating a first approach second profile and a second approach first profile.
  • the invention is also directed to a method for retrieving compatible data packages to originating data, comprising the steps of: storing a plurality of data packages in a database; inputting the originating data; storing a plurality of profiles relevant for retrieving compatible data packages to originating data in an analytics database; wherein the profiles comprise a plurality of features; and conducting supervised machine learning and providing classifiers on the basis of the features of the profiles.
  • the method can also be a combination of the before described methods.
  • the data packages can each comprise at least one of datasets, callable units, metadata and content data and any combination thereof.
  • the inputting of data is performed automatically or semi-automatically.
  • the inputting of data can be further supported by a graphical user interface.
  • the inputting of data can be performed by initiating the pulling of data from a data base by entering a code.
  • the code can be an identifier for the respective data package, such as an ID number.
  • the database can comprise more than a million data packages, preferably more than 10 million data packages and most preferably more than 100 million data packages.
  • the machine learning approach can establish classifiers on the basis of the profiles by forming random decision trees on the basis of the features of the profiles and determining relevant decision trees from less relevant decision trees on the basis of profiles relevant for retrieving compatible data packages to originating data.
  • Any retrieving of compatible data packages can make or trigger the analyzing component to analyze the originating data or any subsequent originating data on the basis of the first approach, the second approach and the classifiers.
  • the data packages can each comprise at least one of datasets, callable units, metadata and content data and any combination thereof. There can be the further step of inputting originating data or grading data packages automatically or semi-automatically.
  • An inputting component can initiate the steps described above and below only upon entry of an individual password.
  • a step of billing can also be present so that the user has to pay for retrieving the data packages. This can be done on a subscription basis or for each individual retrieval of a document.
  • the step of repeating the initiating of the retrieving of compatible data packages any time on the most recent corresponding profile(s) in the analytics database can be present as well.
  • a controlling of a plurality of inputting components can happen so that they are configured or can be configured upon entry of a code to share data in the analytics database assigned to the inputting components.
  • a repeating of the retrieving compatible data packages based on the profiles in the analytics database can be provided as well. This step of repeating retrieving compatible data can be initiated by another inputting component than the previous one.
  • the originating data can comprise metadata and content data.
  • the originating data can also or alternatively comprise a code that is part of the metadata wherein the database comprises the other metadata and/or content data that corresponds to the code of the originating data.
  • the step of pulling the other metadata and/or the content data from the database on the basis of the code can be realized.
  • the data packages each can comprise metadata and content data.
  • the features of the profiles can correspond to portions of the metadata of the data packages.
  • the originating data can be provided by the inputting component in encoded fashion in a code and the analyzing component can be adapted to gather the complete originating data from the database based on that code.
  • the code can be an ID number, such as a document number, and the complete originating data can comprise a document.
  • the complete originating data can comprise meta data and content data.
  • the metadata comprises more than 5 fields, preferably more than 10 fields, even more preferably more than 50 fields.
  • the metadata can comprise less than 100 fields, preferably less than 90 fields, even more preferably less than 70 fields.
  • the analyzing component can further comprise a multidimensional analysis (MDA).
  • the first profile and/or the second profile can be vectorized profile(s), and the search in the general architecture is based on the vectorized profile(s).
  • the plurality of the data packages can be sorted according to their relevance and/or weight and/or neighborhood and/or their similarity to the originating data or most recent originating data.
  • the step of displaying a representative summary of the plurality of the data packages sorted according to their relevance and/or neighborhood and/or their similarity to the originating data or most recent originating data can be performed.
  • the quantitative value can be from 0 to 3 or from 0 to 4. There can be also a de-selecting value that is tagging a data package.
  • a representative summary of the plurality of data packages can comprise a plurality of metadata of the respective data packages.
  • the inputting component can be configured to allow an expansion of any parts of the data packages upon a click by the inputting component.
  • a further step of billing for the retrieving of data packages can also be present. That can be done on a case-by-case basis or on a regular subscription basis.
  • a processor can be provided and may be singular or plural, and may be, but is not limited to, a CPU, GPU, DSP, APU, or FPGA.
  • the memory 26 may be singular or plural, and may be, but is not limited to, volatile or non-volatile memory, such as SDRAM, DRAM, SRAM, Flash Memory, MRAM, F-RAM, or P-RAM.
  • a data processing device can comprise means of data processing, such as, processor units, hardware accelerators and/or microcontrollers.
  • the data processing device can comprise memory components, such as main memory (e.g. RAM), cache memory (e.g. SRAM) and/or secondary memory (e.g. HDD, SSD).
  • the data processing device can comprise busses configured to facilitate data exchange between components of the data processing device, such as, the communication between the memory components and the processing components.
  • the data processing device 20 can comprise network interface cards that can be configured to connect the data processing device to a network, such as, to the Internet.
  • the data processing device can comprise user interfaces, such as:
  • output user interfaces, such as:
      o screens or monitors configured to display visual data (e.g. displaying graphical user interfaces of the questionnaire to the user),
      o speakers configured to communicate audio data (e.g. playing audio data to the user),
  • input user interfaces, such as:
      o camera configured to capture visual data (e.g. capturing images and/or videos of the user),
      o microphone configured to capture audio data (e.g. recording audio from the user),
      o keyboard configured to allow the insertion of text and/or other keyboard commands (e.g. allowing the user to enter text data and/or other keyboard commands by having the user type on the keyboard) and/or a trackpad, mouse, touchscreen or joystick configured to facilitate the navigation through different graphical user interfaces of the questionnaire.
  • the data processing device can be a processing unit configured to carry out instructions of a program.
  • the data processing device can be a system-on-chip comprising processing units, memory components and busses.
  • the data processing device can be a personal computer, a laptop, a pocket computer, a smartphone, a tablet computer.
  • the data processing device can be a server.
  • the data processing device can be a processing unit or a system-on-chip that can be interfaced with a personal computer, a laptop, a pocket computer, a smartphone, a tablet computer and/or user interface (such as the above-mentioned user interfaces).
  • the present invention also refers to a use of the system according to any of the preceding system or method embodiments for carrying out the method according to any of the preceding method embodiments.
  • the present invention also covers a computer related product for carrying out the method according to any of the preceding method embodiments.
  • the invention tries to improve the art by combining techniques and further boosting the outcome by a machine-learning approach.
  • System for retrieving compatible data packages to originating data (2) comprising: a. a database (4) comprising a plurality of data packages (10-18); b. an inputting component (1) configured to input the originating data (2); c. an analyzing component (3) configured to analyze the originating data (2) based on at least one first approach and to generate a first approach first profile of the originating data (2); d. a retrieving component (3) configured to automatically approach the database (4) and to retrieve data packages (10-18) from the database (4) according to the first approach first profile and to provide a plurality of first data packages (10-12); e. a selecting component (5) for selecting one or more preferred data package(s) (11,12) among the first data packages (10-12); and f. the analyzing component (3) being further configured to automatically analyze the preferred data package(s) (11,12) according to the first approach and at least a second approach and to generate a first approach second profile and a second approach first profile.
  • System for retrieving compatible data packages to originating data (2) comprising: a. a database (4) comprising a plurality of data packages (10-18); b. an inputting component (1) configured to input the originating data (2); c. an analytics database (6) comprising a plurality of profiles relevant for retrieving compatible data packages to originating data (2); d. wherein the profiles comprise a plurality of features; e. a supervised machine learning component (3) configured to provide classifiers on the basis of the features of the profiles.
  • the selecting component (5) is repeatedly selecting data packages (13-15) and the analyzing component (3) is weighing the profiles of the first and second approaches and the classifiers from the supervised machine learning component (3) on the basis of the profiles and classifiers and is providing an actual retrieving profile combining the weighted profiles and classifiers.
  • the first approach profile comprises a numerical representation of the analysis of the originating data (2) and/or the selected data package(s) (10-12) according to the first model.
  • the database comprises more than a million data packages, preferably more than 10 million data packages and most preferably more than 100 million data packages.
  • machine learning component (3) is configured to establish classifiers on the basis of the profiles by forming random decision trees on the basis of the features of the profiles and to determine relevant decision trees from less relevant decision trees on the basis of profiles relevant for retrieving compatible data packages to originating data (2).
  • System further comprising a plurality of inputting components (1) that are configured or can be configured upon entry of a code to share data in the analytics database (6) assigned to the inputting components.
  • system according to any of the preceding system embodiments wherein the system is configured to repeat retrieving compatible data based on the profiles in the analytics database (6).
  • the originating data (2) comprises a code that is part of the metadata and wherein the database (4) comprises the other metadata and/or content data that corresponds to the code of the originating data (2).
  • System according to the preceding system embodiment wherein the system is configured to pull the other metadata and/or the content data from the database (4).
  • analyzing component (3) comprises a multidimensional analysis (MDA) tool.
  • the inputting component (1) is configured to display a representative summary of the plurality of the data packages (10-18) sorted according to their relevance and/or neighborhood and/or their similarity to the originating data (2) or most recent originating data (2).
  • the inputting component (1) is configured to display a representative summary of the plurality of the data packages (5-7) sorted according to their relevance and/or neighborhood and/or their similarity to the originating data (2) or most recent originating data (2) and provides the option to out-select data packages (10-18) and/or to provide a quantitative value to their relevance.
  • a representative summary of the plurality of data packages comprises a plurality of metadata of the respective data packages (10-18).
  • System further comprising a server (3) comprising the analyzing component (3a) and the retrieving component (3b).
  • System according to any of the preceding system embodiments further comprising a server (3,6) comprising the analytics database (6).
  • 553. System according to the two preceding system embodiments wherein the server (3,6) comprises a bus (3d) constituting the communication interface.
  • bus is further constituting the communication interface with the inputting device and the database (4).
  • system comprises a billing component for billing for the retrieving of data packages.
  • Method for retrieving compatible data packages to originating data (2) comprising the steps of: a. storing a plurality of data packages (10-18) in a database (4); b. inputting the originating data (2); c. analyzing the originating data (2) based on at least one first approach and generating a first approach first profile of the originating data (2); d. automatically approaching the database (4) and retrieving data packages (10-18) from the database (4) according to the first approach first profile and providing a plurality of first data packages (10-12); e. selecting one or more preferred data package(s) (11,12) among the first data packages (10-12); and f. automatically analyzing the preferred data package(s) (10-12) according to the first approach and at least a second approach and generating a first approach second profile and a second approach first profile.
  • Method for retrieving compatible data packages to originating data (2) comprising the steps of: g. storing a plurality of data packages (10-18) in a database (4); h. inputting the originating data (2); i. storing a plurality of profiles relevant for retrieving compatible data packages to originating data (2) in an analytics database (6); j. wherein the profiles comprise a plurality of features; k. conducting supervised machine learning and providing classifiers on the basis of the features of the profiles.
  • the first approach profile comprises a numerical representation of the analysis of the originating data (2) and/or the selected data package(s) (10-12) according to the first model.
  • Method according to any of the preceding method embodiments wherein the inputting of data is performed by initiating the pulling of data from a data base by entering a code.
  • Method according to any of the preceding method embodiments further comprising controlling a plurality of inputting components so that they are configured or can be configured upon entry of a code to share data in the analytics database (6) assigned to the inputting components.
  • M30 Method according to any of the preceding method embodiments with the step of repeating retrieving compatible data packages (10-18) based on the profiles in the analytics database (6).
  • M31 Method according to the two preceding method embodiments with the step of repeating retrieving compatible data by another inputting component (1) than the previous one.
  • M41 Method according to any of the preceding method embodiments S12 to S22 wherein the metadata comprises less than 100 fields, preferably less than 90 fields, even more preferably less than 70 fields.
  • M42 Method according to any of the preceding method embodiments wherein the analyzing comprises a multidimensional analysis (MDA).
  • M50 Method according to any of the preceding method embodiments wherein a representative summary of the plurality of data packages comprises a plurality of metadata of the respective data packages (10-18).
  • M51 Method according to any of the preceding method embodiments wherein the inputting component is configured to allow an expansion of any parts of the data packages (10-18) upon a click by the inputting component (1).
  • Fig. 1 schematically exemplifies an embodiment of a workflow between components according to the present invention
  • FIG. 2 schematically exemplifies a more detailed embodiment of a rather early workflow between components according to the present invention
  • FIG. 3 schematically exemplifies a more detailed embodiment of a later workflow than shown in Fig. 2 between components according to the present invention
  • Fig. 4 schematically exemplifies a workflow of components and their control by different controlling models according to the present invention
  • Fig. 5 schematically exemplifies an alternative workflow of components and their control by controlling models according to the present invention.
Description of preferred embodiments as exemplified in the figures
  • Fig. 1 schematically depicts an embodiment of a method and components of a respective system configured for carrying out the method.
  • an inputting component or a node 1 that can comprise a graphic user interface (GUI).
  • the inputting component may also be controlled by a user or may just display progress to a user.
  • Originating data 2 is either inputted into the inputting component 1 or pulled or pushed into the inputting component 1.
  • the inputting component can be a workstation, a computer and/or any kind of handheld computing devices, such as a laptop, tablet, smart phone etc.
  • the originating data 2 is then analyzed by an analyzing component 3 wherein the analysis is based on a first profile.
  • the analyzing component 3 can be remote to the inputting component 1 and can be accessible in the same country of interest or in a different place or can be provided by what is called the cloud.
  • the first profile can be an analysis of a task to be fulfilled by a software routine or sub-routine and/or a certain document.
  • the analysis is then allocating routines or sub-routines as data packages that are available in a library that are able to fulfill the task.
  • the content of the originating data may be analyzed according to different parts of its content. This could be done by a stepwise and then aggregated analytics process represented by a multi-modal vector.
  • the data packages can be routines or sub-routines that are or appear to be capable to fulfill a certain task.
  • the data packages can also be any other callable unit, such as a procedure, a function, a routine, a method, or a subprogram.
  • the data packages can also be aggregated data or datasets, representing a certain content of a document.
  • the analysis can be done locally or remotely. If it is done locally, the analyzing component can be configured to do so. Alternatively, it could send an order, or parts of an order, to analyze to a remote server or to the cloud.
  • the data packages 10-12 should be compatible or matching to the originating data 2. This comprises a certain pre-defined or actualized similarity or neighborhood profile.
  • the steps can be tracked, logged or stored in an analytics database 6. This should make it possible to follow and reconstruct the sequence of steps as well as intermediate and final results. This can also serve to share the progress or to merge the method with findings of others. While complete tracking takes place, rather old steps and findings can also be overwritten for data efficiency reasons.
  • Those data packages 10-12 are then again analyzed by a selection or choice-by-elimination.
  • the analysis is then based on another profile than the first profile.
  • for example, the run-time and/or precision etc. of a routine. This can be performed in or by a selecting component 5, coarsely or more finely granulated, by a score card approach.
  • the respective result is then fed into the analyzing component 3 that will actualize the analysis according to the selection in or by the selecting component.
  • the small circle to the left next to the selecting component 5 means a dead end, i.e., data package 10 in that case will be sorted out or considered incompatible.
  • the other data packages 11 and 12 can then be classified automatically.
  • the classification can in any case be also on a marking system.
  • the marking system can be made simple and may categorize into 2 or 4 classes only.
  • the analyzing component 3 is further configured to automatically analyze the preferred data package(s) (11,12) according to the first profile and now also a second profile that is different to the first profile. It will be analyzed which criteria or parameters or any combination thereof is fulfilled or close to be fulfilled by the preferred data packages 11, 12.
  • the actualized first and second profiles, any core information therefrom or other metadata will be also stored in an analytics database 6.
  • the previous profile can either be overwritten or the new profiles can be stored additionally for documentation purposes.
  • Fig. 2 exemplifies an embodiment of the analyzing component 3. It can comprise a CPU or any other controller 3a that is in communication with a memory 3c.
  • the analyzing component can also comprise a retrieving component 3b that is configured to approach an external entity, such as the database 4.
  • the inputting device 1, the database 4 as well as the analytics database 6 can be accessible by a communication interface, such as a bus 3d. Both or one of the databases 4 and 6 can be located locally or remotely. In the example shown, the inputting device 1 can be located remotely from the analyzing component 3. The same holds true for the database 4 and the analytics database 6, wherein all or any two of those components can also be located remotely from each other.
  • Fig. 3 shows in addition to Fig. 2 that, either in a more progressed state or in another embodiment, the selecting component 5 is also affiliated to the inputting component 1 or can even be integrated in the same device, such as a terminal, computer, laptop, handheld etc.
  • the analyzing component with all elements 3a-3d and the analytics database can be integrated or located non-remotely. All elements can be connected by one or more busses 3d. They can also be integrated in a server or cloud arrangement.
  • the server 3 can comprise the analyzing component 3a and the retrieving component 3b and the analytics database 6.
  • the server 3,6 comprises a bus 3d constituting the communication interface.
  • the inputting component 1 can also comprise the selecting component.
  • Fig. 4 exemplifies in more detail how and when the different approaches or models are involved.
  • the first profile is approached by the analytics component 3.
  • In the example shown it is one profile, but initially it can also already be two or more.
  • the analyzing and/or retrieving component 3 then delivers the three data packages 10 to 12.
  • the selecting component 5 is supplied with information or the data or the metadata of the data packages 10 to 12.
  • the respective selection which then takes place, namely the selection of data packages 11 and 12 according to the example shown in Fig. 1, is then delivered to the first profile 30 and a second profile 31.
  • the analyzing and/or retrieving component 3 is then coordinating the further analysis of the two remaining data packages 11 and 12 according to the first model 30 and the second model 31.
  • These profiles 30, 31 can work independent from each other but it can also happen that they coordinate their analytics. Anyhow, the analyzing component 3 then delivers the outcome of the either independent or combined analytics according to both models 30 and 31 and delivers data packages 13 to 15.
  • the first model 30 can be a highly complex and multimodal profile that is analyzing the content of data packages 11 and 12 and retrieving a pattern in this content. This pattern can then be represented as a multidimension value. Such values can have many dimensions, even more than hundred thousand or millions of dimensions, each representing an occurrence of a certain value or term.
  • Model 31 can be a statistical profile that for example is already fed with a statistical distribution of certain phenomena in data packages and tries to allocate an already known statistical pattern in documents 11 and 12 or the most similar one.
  • the documents 13 to 15 or their content is fed to selecting component 5 that is again then selecting relevant from nonrelevant data packages.
  • a package 13 was out-selected and data packages 14 and 15 were found to be compatible.
  • the grade of compatibility may also be differentiated with a numerical value, for example from 1 to 3, wherein 1 is the least compatible data package and 3 means the most compatible data package.
  • a numerical value for the non-relevant other package may be given, such as the value zero. In the example shown earlier, the data packages 10 and 13 have received the value zero.
  • the information (that may include the content and/or the metadata) is then classified with a machine learning approach or model.
  • in the machine learning model 32, a classification may take place on the basis of random decision trees.
  • the last stage will then deliver by a third profile and the analyzing component 3 three data packages 16 to 18 that in the embodiment shown should be the most relevant data packages of all.
  • the analyzing component 3 is further trained to train the analytics and searching model. This typically leads to a more sophisticated or differentiated model.
  • the analyzing and/or retrieving component 3 then again approaches the database 4.
  • the analyzing component 3 will then automatically search in the database 4 based on the first and second profiles to provide a plurality of second data packages 13-15.
  • the most compatible data packages will be selected or eliminated or differentially classified. They may be ranked as well according to their similarity or neighborhood value.
  • the retrieving component 3 again approaches the database and stores the profiles in the analytics database 6. This will then provide further data packages 16-18.
  • the analyzing and searching step of component 3 can also be configured to additionally approach another third profile that can be a further classifying tool. This can be the random forest tool.
  • This process can be performed continuously or can be stopped. It can also be stopped automatically in case the data packages 16-18 fulfill a pre-defined or actualized quality criterion. In that case the profiles will be stored and will be available for further methods for the same or similar originating data.
  • Fig. 5 depicts another option. Most parts are the same or similar to Fig. 4. However, in the last stage, the three models as described above will be applied and combined. The grade of combination of each model is also subject to change by the retrieving component 3 and depends on the analysis of the selection that has taken place in selecting component 5.
  • the term "at least one of a first option and a second option" is intended to mean the first option or the second option or the first option and the second option.
  • step (X) preceding step (Z) encompasses the situation that step (X) is performed directly before step (Z), but also the situation that (X) is performed before one or more steps (Y1), followed by step (Z).

Abstract

The present invention is directed to retrieving compatible data packages to originating data, comprising the step of storing a plurality of data packages in a database, inputting the originating data, analyzing of the originating data based on at least one first approach and generating a first approach first profile of the originating data. The invention is also directed to a system and method for retrieving compatible data packages to originating data, comprising the steps of: storing a plurality of data packages in a database; inputting the originating data; storing a plurality of profiles relevant for retrieving compatible data packages to originating data in an analytics database; wherein the profiles comprise a plurality of features; and conducting supervised machine learning and providing classifiers on the basis of the features of the profiles.

Description

System and method for retrieving compatible data packages to originating data
Field
The present invention relates to a system and a method for retrieving compatible data packages to originating data.
Introduction
Due to today's large volumes of data it is difficult to find compatible data packages to originating data. This is particularly true when there are many, or even very many, aspects that should be fulfilled or addressed by the data packages. It also holds true when the data packages being searched for should be able to fulfill a certain complex task or should address a complex task or complex content.
Examples can be data, data packages, routines or sub-routines that shall be automatically selected for certain tasks of a technical nature, such as finding a fast and reliable routine or the most compatible routine for estimating a value for controlling a machine, traffic components, sorting products or documents, etc. Also, actual parts can be retrieved that can be used as spare parts and that can be considered to constitute the most compatible ones. In the latter case the data packages can comprise data for one or more additive manufacturing steps in order to build or manufacture a part or spare part. In that case data for those parts, such as dimensions, values of strength, means for attachment etc., should be present in the form of data packages. For an originating part that is known in the form of originating data, the most compatible spare parts that are known in the form of data packages can be retrieved. The same holds true for technical specifications in the form of datasheets, technical descriptions of products, systems, methods or any other form of specification of technology. Even technical teaching, such as in patent documentation, can be retrieved that is most compatible with an originating task defined in the form of originating data.
US 20170232515 A1 discloses an additive manufacturing method that can use a two-dimensional energy patterning system. Information related to a part is provided, with the information including CAD files, material type, selected additive manufacturing process type, and tolerances of selected design features. Manufacture of a part is simulated and compared to selected design tolerance. If the simulated manufactured part is outside selected design tolerances, simulation parameters can be adjusted until results indicate the simulated manufactured part is within selected design tolerances. In certain embodiments, manufacturing the part uses a real-time sensor monitoring system, along with post processing analysis of selected design features to improve simulated manufacture of the part.
US 9849019 B2 is directed to a method of manufacturing a plurality of orthopedic implant components. The method includes providing a design for a first implant component having at least one dimension that is based on patient-specific information and a design for a second implant component having at least one dimension that is based on patient-specific information. A build plan is created that includes a position and orientation for at least each of the first and second implant components within a build chamber of a solid freeform fabrication machine and with respect to a platform of the build chamber. The implants included in the build plan are produced by executing a build run of the solid freeform fabrication machine based on the build plan, wherein the plurality of implant components are positioned and oriented in an interleaved build configuration according to the build plan.
Another example may be found in US 7945348 B2 that is dedicated to software for controlling processes for a semiconductor manufacturing environment and may include a wafer-centric database, a real-time scheduler using a neural network, and a graphical user interface displaying simulated operation of the system. These features may be employed alone or in combination to offer improved usability and computational efficiency for real time control and monitoring of a semiconductor manufacturing process. More generally, these techniques may be usefully employed in a variety of real time control systems, particularly systems requiring complex scheduling decisions or heterogeneous systems constructed of hardware from numerous independent vendors.
The processes described above are often cumbersome and time-consuming and result in an unsatisfying outcome.
Summary
In light of the above, it is an object of the present invention to overcome or at least alleviate the shortcomings of the prior art. More particularly, it is an object of the present invention to provide a method and a corresponding system to retrieve compatible data packages to originating data. The data packages are preferably technically oriented data packages.
These objects are fulfilled by the method and system of the present invention. The present invention is directed to a method and a system for retrieving compatible data packages to originating data. The term data can comprise any kind of data, such as optical data, numerical data, content data, information etc. Originating data is intended to mean the data that constitutes the starting point for the retrieving. Compatible data is intended to mean data that is or can be relevant to the originating data from the technical standpoint.
The system according to the present invention can comprise a database, itself comprising a plurality of data packages. It can even comprise massive amounts of data packages, such as millions or even more than a hundred million data packages. The data packages can comprise metadata and content, the latter comprising any optical information or other information specifying one or more technical devices, apparatuses, systems, assemblies, methods etc. and any combination thereof.
The system according to the present invention can further comprise an inputting component configured to input the originating data. The inputting component can be any kind of node or computing device that enables the entry of the originating data either directly or indirectly.
The system can also comprise an analyzing component configured to analyze the originating data. The analyzing component can be a computing device that can be separate from the inputting component. While the inputting component can be affiliated with a user the analyzing component can be remotely located and can host the software that is configured to analyze the originating data or any other data that is entered later or during a different session. The analysis can be based on at least one first approach and configured to generate a first approach first profile of the originating data. The first approach can be a model that is capable or able to analyze the originating data, particularly its metadata and/or content data, preferably the content data.
The first approach can be a structured multidimensional representation of a technical task, device, system, assembly, method etc. Just as an example, a spare part or device can be defined in a specification that is then analyzed and conveyed in a multidimensional representation, such as a multidimensional vector that can have more than 3, even hundreds, thousands or more dimensions in order to specify the spare part or device in the best manner.
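As an illustration only, such a multidimensional representation can be sketched as a term-count vector. The sketch assumes that the specification is available as plain text and that each dimension counts the occurrences of one term; the function names `tokenize`, `build_vocabulary` and `vectorize` and the sample texts are hypothetical and are not taken from the invention.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(texts):
    """Assign one dimension (index) to every distinct term, in sorted order."""
    terms = sorted({term for text in texts for term in tokenize(text)})
    return {term: index for index, term in enumerate(terms)}

def vectorize(text, vocabulary):
    """Return a term-count vector; terms outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for term, count in counts.items():
        if term in vocabulary:
            vector[vocabulary[term]] = count
    return vector

# Hypothetical specifications of two spare parts (assumed sample data).
specs = [
    "steel bracket with two bolt holes",
    "aluminium bracket with one bolt hole",
]
vocab = build_vocabulary(specs)
profile = vectorize("steel bracket, steel bolt", vocab)
```

With a realistic vocabulary the vector would have thousands of dimensions or more, most of them zero, so a sparse representation would probably be preferable in practice.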
In this first step, more than one approach or model can also be applied. In case more than one approach is provided, the second approach that is addressed above and below can be applied next to the first one. The result of the analysis would then be represented by a second approach first profile. The system can further comprise a retrieving component to automatically approach the database and to retrieve data packages from the database according to the first approach first profile and to provide a plurality of first data packages. The first profile is intended to mean the first result of the first analysis of the originating data according to the first approach or model.
Then, a similarity search can be initiated in the database. In the example provided before, a search can be conducted for a representation identical or as similar as possible to the one described above. The retrieving component can even set a number of most similar representing data packages that are then delivered, e.g. to the inputting component. In the similarity search the different technical features or components are determined and being searched for in order to find the most compatible data package that, e.g., can 3D- print the most compatible spare product.
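A similarity search of this kind can be sketched as follows. The sketch assumes that the profiles are plain numeric vectors and uses cosine similarity as the measure; the text does not prescribe a particular measure, and the names `cosine_similarity` and `retrieve_most_similar` as well as the sample vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equally long vectors (0.0 if one is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_most_similar(query, packages, k):
    """Return the ids of the k packages whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vector), package_id)
              for package_id, vector in packages.items()]
    # Highest similarity first; ties are broken by the smaller package id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [package_id for _, package_id in scored[:k]]

# Assumed toy database: package id -> profile vector.
database = {10: [1, 0, 0], 11: [1, 1, 0], 12: [2, 2, 1], 13: [0, 0, 1]}
first_packages = retrieve_most_similar([1, 1, 0], database, k=3)
```

The parameter `k` corresponds to the number of most similar data packages that the retrieving component sets. A database of millions of data packages would likely need an approximate nearest-neighbour index rather than this exhaustive scan.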
The analyzing component can further be configured to automatically analyze the preferred data packages according to the profiles of the first and second approaches and the classifiers of the supervised machine learning component.
The selecting component can repeatedly select data packages, and the analyzing component can weigh the profiles of the first and second approaches and the classifiers from the supervised machine learning component and provide an actual retrieving profile combining the weighted profiles and classifiers. This can be based on the analysis of the most recently selected data packages and their weights.
The data packages can comprise a technical teaching, e.g., to construct or 3D-print a spare part. The data packages can comprise a technical teaching in an enabling manner for a person skilled in the art, e.g., to reconstruct or 3D-print a device.
The first approach can comprise a first model and the second approach can comprise a second model wherein the second model is different to the first model. The first approach profile can comprise a numerical representation of the analysis of the originating data and/or the selected data package(s) according to the first model.
The second approach profile can comprise a numerical representation of the analysis of the originating data and/or the selected data package(s) according to the second model.
The first approach can comprise a content analytics model and the first approach profile can comprise a first approach multidimensional representation.
The second approach can comprise a past distribution statistics model and the second approach profile can comprise a second approach multidimensional representation.
The first and second approach multidimensional representations are vectorized. The analyzing component and/or the retrieving component can be configured to retrieve data packages from the database on the basis of either one, two or all of the three models.
The analyzing component and/or retrieving component can be configured to weigh or gradually combine the three models on the basis of the last result of the selection component.
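The gradual combination of the three models on the basis of the last selection can be sketched as a simple weight update. The update rule below (blending the old weights with the mean score each model gave to the selected data packages, then normalizing) is an assumption chosen for illustration and is not prescribed by the specification; `update_weights` and `learning_rate` are hypothetical names.

```python
def update_weights(weights, model_scores, selected_ids, learning_rate=0.5):
    """Shift weight toward models that scored the selected packages highly."""
    if not selected_ids:
        return dict(weights)  # nothing selected, keep the weights unchanged
    agreement = {
        model: sum(scores[pid] for pid in selected_ids) / len(selected_ids)
        for model, scores in model_scores.items()
    }
    blended = {
        m: (1 - learning_rate) * weights[m] + learning_rate * agreement[m]
        for m in weights
    }
    total = sum(blended.values())
    return {m: v / total for m, v in blended.items()}

# Hypothetical starting weights and per-model scores (assumptions)
weights = {"content": 1 / 3, "statistics": 1 / 3, "classifier": 1 / 3}
model_scores = {
    "content": {"pkg-11": 0.9, "pkg-12": 0.8},
    "statistics": {"pkg-11": 0.3, "pkg-12": 0.1},
    "classifier": {"pkg-11": 0.6, "pkg-12": 0.6},
}
new_weights = update_weights(weights, model_scores, ["pkg-11", "pkg-12"])
print(new_weights)
```

A smaller `learning_rate` makes the combination change more gradually from one selection to the next.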
Moreover, the system can also comprise a selecting component for selecting one or more preferred data package(s) among the first data packages. The selection can be content based or based on metadata or both to get a more holistic view and can constitute a view onto the data packages from a different perspective. This can be done automatically. This can further be triggered by the inputting component. One example can be a frame or photo that is retrieved as being compatible with or similar to an originating picture, whereupon details of the frame or photo are analyzed in order to decide whether or not the frame or photo is relevant. The photo could be a histological photo, and the more detailed analysis then compares certain pixels of interest. The data package can also be a technical document disclosing and/or showing a technical or scientific subject. The decision can also be found with the help of a different entity that is better able to analyze the data package. When data packages are out-selected, or when a certain value with respect to compatibility is provided for a number of data packages, a further elaborated analysis is performed and potential first training data is created.
The analyzing component can be further configured to automatically analyze the preferred data package(s) according to the first approach and at least a second approach and to generate a first approach second profile and a second approach first profile. Similar to before, the first and second approaches can be different models and the respective profiles are then the results of the analyses of the data packages selected according to the different models.
The second approach can be a statistical model that represents past similarities between technical tasks, parts or devices. As an example, in case a spare part for wings of a wind turbine has also sometimes been found in aeronautics and/or water gliders, a reference to those fields can be advantageous. This can be based on large scale past analysis and can be encoded in order to find related fields, also with statistical probabilities.
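A past distribution statistics model of this kind can be sketched as a co-occurrence estimate. The example assumes that past cases are available as sets of technical fields in which a part was found; this data layout, the function name `cooccurrence_probabilities` and the sample cases are hypothetical.

```python
from collections import Counter

def cooccurrence_probabilities(past_cases, field):
    """Estimate how often other fields appeared together with the given field."""
    with_field = [set(case) for case in past_cases if field in case]
    if not with_field:
        return {}
    counts = Counter(
        other for case in with_field for other in case if other != field
    )
    return {other: n / len(with_field) for other, n in counts.items()}

# Hypothetical past cases: fields in which a given part was found (assumption)
past_cases = [
    {"wind turbine", "aeronautics"},
    {"wind turbine", "aeronautics", "water gliders"},
    {"wind turbine"},
    {"aeronautics", "automotive"},
]
related = cooccurrence_probabilities(past_cases, "wind turbine")
print(related)
```

In this toy data set, aeronautics co-occurs with wind turbines in two of three cases, so it would be suggested as a related field with a higher probability than water gliders.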
The analyzing component can be activated or re-activated by a soft key for refreshing the analysis, for example after the selecting component (5) has been activated.
The system for retrieving compatible data packages to originating data in accordance with the present invention can alternatively or further comprise a database comprising a plurality of data packages, an inputting component configured to input the originating data, and an analytics database comprising a plurality of profiles relevant for retrieving compatible data packages to originating data. The analytics database can be different to the other database and can be also located at a different site or on a different server or server storage. In this analytics database the profiles can be stored that are elaborated at the beginning or during the retrieving process. The profiles can be the specific results of the analysis of specific data packages in accordance with the models mentioned.
The profiles can comprise a plurality of features. Features are intended to mean certain attributes or criteria or parameters or specifiers of a data package that are often independent from each other. An example could be the function of a technical component, the affiliation or cooperation with other components etc.
The data packages each can comprise at least one of datasets, callable units, metadata and content data and any combination thereof.
The inputting component can be configured to input or grade data packages automatically or semi-automatically. There can be a number of grades available, such as 4 grades in total, one representing non-relevance, one a neutral grade, another one a grade for relevant and a further one for very relevant.
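The four grades can be represented as an enumeration, and graded data packages can be turned into labeled training samples. The numeric values assigned to the grades and the rule of skipping neutral grades are assumptions made for this sketch; `Grade` and `to_training_samples` are hypothetical names.

```python
from enum import IntEnum

class Grade(IntEnum):
    # The numeric values are an assumption; the text only names four grades.
    NOT_RELEVANT = 0
    NEUTRAL = 1
    RELEVANT = 2
    VERY_RELEVANT = 3

def to_training_samples(profiles, grades):
    """Pair each graded package's feature profile with a binary label.

    Neutral grades carry no signal in this sketch and are skipped.
    """
    return [
        (profiles[pid], grade >= Grade.RELEVANT)
        for pid, grade in grades.items()
        if grade != Grade.NEUTRAL
    ]

# Hypothetical feature profiles and grades (assumptions)
profiles = {"pkg-10": [0.9, 0.1], "pkg-11": [0.2, 0.8], "pkg-12": [0.5, 0.5]}
grades = {
    "pkg-10": Grade.VERY_RELEVANT,
    "pkg-11": Grade.NOT_RELEVANT,
    "pkg-12": Grade.NEUTRAL,
}
samples = to_training_samples(profiles, grades)
print(samples)
```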
All the features and method steps according to the present invention can be automatic and/or at least semi-automatic.
By this setup a sufficient number of training data or training samples is created at the same time. The system can further comprise a machine learning component that can be a learning component that is configured to automatically provide classifiers on the basis of the features of the profiles on the basis of the above-mentioned approaches. Those classifiers can be technical distinctions by different kinds of data packages etc.
The present invention particularly refers to a combination of the before described systems.
The database comprises more than a million data packages, preferably more than 10 million data packages and most preferably more than 100 million data packages. Thus, the database can be large. This shows also the difficulty or challenge to retrieve compatible data packages to originating data.
According to the present invention, the machine learning component can be configured to establish classifiers on the basis of the profiles by forming random decision trees on the basis of the features of the profiles and to determine relevant decision trees from less relevant decision trees on the basis of profiles relevant for retrieving compatible data packages to originating data.
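The formation of random decision trees and the separation of relevant from less relevant trees can be sketched in a strongly simplified form. The example below uses decision trees of depth one (so-called stumps) with a randomly chosen feature and threshold, and keeps the trees that agree best with the graded profiles. The depth, the accuracy criterion, the majority vote and all function names are assumptions for illustration, not the method of the specification.

```python
import random

def make_random_stump(n_features, rng):
    """A depth-one decision tree with a random feature and threshold."""
    return rng.randrange(n_features), rng.random()

def stump_predict(stump, features):
    feature, threshold = stump
    return features[feature] > threshold

def stump_accuracy(stump, samples):
    """Fraction of labeled samples the stump classifies correctly."""
    hits = sum(stump_predict(stump, x) == label for x, label in samples)
    return hits / len(samples)

def select_relevant_trees(samples, n_trees=50, keep=5, seed=0):
    """Form random trees and keep those that best match the labeled profiles."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    trees = [make_random_stump(n_features, rng) for _ in range(n_trees)]
    trees.sort(key=lambda t: stump_accuracy(t, samples), reverse=True)
    return trees[:keep]

def classify(trees, features):
    """Majority vote of the kept trees."""
    votes = sum(stump_predict(t, features) for t in trees)
    return votes * 2 > len(trees)

# Hypothetical labeled profiles: (feature vector in [0, 1], relevant?)
samples = [
    ([0.9, 0.2], True), ([0.8, 0.7], True), ([0.7, 0.1], True),
    ([0.1, 0.9], False), ([0.2, 0.3], False), ([0.3, 0.8], False),
]
relevant_trees = select_relevant_trees(samples)
print(relevant_trees[0], stump_accuracy(relevant_trees[0], samples))
```

In practice, deeper trees and an established random forest implementation would likely be used; the sketch only shows the principle of generating trees at random and ranking them by their agreement with known relevant profiles.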
Any retrieving of compatible data packages can make or initiate the analyzing component to analyze the originating data or any subsequent originating data on the basis of the first approach, the second approach and the classifiers. That is, the present data in the inputting component that can be the originating data or any remaining data packages from a previous run can be re-analyzed by the first and second approach in order to improve the classifiers of the machine learning step.
The results are by far better than those obtained by other approaches. Tests have revealed this.
The inputting component of the present invention can be activated upon entry of an individual password. This includes any ID, PIN, passcode or any combination thereof.
The inputting component can repeatedly initiate the retrieving of compatible data packages at any time on the basis of the most recent corresponding profile in the analytics database. The most recent corresponding profile is intended to mean the youngest of the profiles that is available or that has been stored.
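Looking up the most recent corresponding profile can be sketched as follows. The record layout `StoredProfile`, the use of a creation timestamp and the `owner` field identifying the inputting component are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class StoredProfile:
    owner: str          # hypothetical id of the inputting component
    created: datetime   # time at which the profile was stored
    vector: list        # vectorized profile

def most_recent_profile(analytics_db, owner):
    """Return the youngest stored profile of the owner, or None if none exists."""
    candidates = [p for p in analytics_db if p.owner == owner]
    return max(candidates, key=lambda p: p.created, default=None)

# Hypothetical content of the analytics database (assumption)
analytics_db = [
    StoredProfile("terminal-1", datetime(2020, 1, 5, 9, 0), [0.5, 0.5]),
    StoredProfile("terminal-1", datetime(2020, 1, 6, 14, 30), [0.7, 0.3]),
    StoredProfile("terminal-2", datetime(2020, 1, 7, 8, 15), [0.1, 0.9]),
]
latest = most_recent_profile(analytics_db, "terminal-1")
print(latest.vector)
```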
Such profiles can also be shared between different inputting components in case they are authorized. In this case a plurality of inputting components is configured or can be configured upon entry of a code to share data in the analytics database assigned to the inputting components. The system can be further configured to allow the repeating of retrieving compatible data by another inputting component than the previous one.
The system can be also configured to repeat retrieving compatible data based on the profiles in the analytics database. For that purpose, a switch or softkey can be provided at or in the inputting component.
The originating data comprises metadata and content data. The originating data comprises a code that can be part of the metadata. The database comprises the other metadata and/or content data that corresponds to the code of the originating data. By means of inputting the code the inputting component, the retrieving component and/or the database can allocate the respective data package and push or pull it to the inputting component. Thus, the system can be configured to pull the other metadata and/or the content data from the database.
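Pulling the complete originating data by means of a code can be sketched as a keyed lookup. The dictionary-based database, the code `DOC-001` and the function name `resolve_originating_data` are hypothetical.

```python
def resolve_originating_data(code, database):
    """Return the complete data package for a code, or raise if it is unknown."""
    try:
        package = database[code]
    except KeyError:
        raise LookupError(f"no data package stored for code {code!r}") from None
    return {
        "code": code,
        "metadata": package["metadata"],
        "content": package["content"],
    }

# Hypothetical database keyed by code (assumption)
database = {
    "DOC-001": {
        "metadata": {"title": "Rotor blade spare part", "date": "2019-05-01"},
        "content": "Description of the spare part ...",
    },
}
originating = resolve_originating_data("DOC-001", database)
print(originating["metadata"]["title"])
```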
Also, the data packages each can comprise metadata and content data. Accordingly, features of the profiles can then correspond to portions of the metadata of the data packages. Those features can comprise numbers, letters, frames, pictures etc. and any combination thereof. A feature can comprise a name, a date, a location, one or more classification(s) of the technology etc. In the case of patent documents, the metadata is contained on the front page and can even comprise field numbers.
The inputting component can comprise a display and/or a keyboard. The inputting component can be a computer, a terminal, a laptop, a smart handheld, a tablet, a smartphone etc. The originating data can be provided by the inputting component in encoded fashion in a code, such as a combination of letters and numbers, and the analyzing component can be adapted to gather the complete originating data from the database based on that code. The code can be an ID number, such as a document number, and the complete originating data comprises a document.
The complete originating data can comprise meta data and content data. In this case, the metadata can comprise more than 5 fields, preferably more than 10 fields, even more preferably more than 50 fields. The metadata can comprise less than 100 fields, preferably less than 90 fields, even more preferably less than 70 fields. In patent documents there are typically almost 60 fields with pre-defined content. The data packages can represent patents, patent applications and/or utility models.
The analyzing component can comprise a multidimensional analysis (MDA) tool. The first profile can be a vectorized profile and the search in the general architecture can be based on that vectorized profile. The same can apply to the second profile.
The analyzing component and/or the retrieving component can be configured to provide the plurality of the data packages sorted according to their relevance and/or neighborhood and/or their similarity to the originating data or the most recent originating data entered or remaining in the inputting component. This can happen with a set of data packages wherein the selection component or analyzing component can assist in de-selecting data packages of no or low compatibility and in differentiating the compatibility of other data packages. The ones with the highest or rather high compatibility will then remain, and upon a refreshing or re-analyzing the first and second approach profiles will be refreshed and generally, but not necessarily, be improved.
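Sorting by similarity and de-selecting data packages of low compatibility can be sketched as follows. The similarity values, the threshold `min_similarity` and the function name `rank_and_filter` are hypothetical.

```python
def rank_and_filter(similarities, deselected=(), min_similarity=0.0):
    """Drop de-selected or low-similarity packages and sort the rest, best first."""
    remaining = {
        pid: s
        for pid, s in similarities.items()
        if pid not in deselected and s >= min_similarity
    }
    return sorted(remaining, key=remaining.get, reverse=True)

# Hypothetical similarity of each package to the originating data (assumption)
similarities = {"pkg-10": 0.4, "pkg-11": 0.9, "pkg-12": 0.7, "pkg-13": 0.1}
ranked = rank_and_filter(similarities, deselected={"pkg-10"}, min_similarity=0.2)
print(ranked)
```

The remaining packages could then serve as input for refreshing the first and second approach profiles.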
The inputting component can be configured to display a representative summary of the plurality of the data packages that can be sorted according to their relevance and/or neighborhood and/or their similarity to the originating data or most recent originating data. Together with this a value or a graphical representation of the similarity etc. can be comprised in the data packages or in the part of the data packages that are displayed in the inputting component.
The inputting component can be configured to allow a de-selecting and/or weighing of the data packages upon one or more clicks by the inputting component in pre-defined fields.
The inputting component can be configured to display a representative summary of the plurality of the data packages sorted according to their relevance and/or neighborhood and/or their similarity to the originating data or most recent originating data and can provide the option to out-select data packages and/or to provide a quantitative value or graphical value (such as 1 to 3 stars) according to their relevance. The quantitative value can be from 0 to 3 or from 0 to 4. Any other grading can also be provided.
The out-selection or check for compatibility can also be performed on an automated basis. Just as one example: a routine can then be tested according to the resulting values, the computing time and other criteria that can be relevant for compatibility.
A representative summary of the plurality of data packages can comprise a plurality of metadata of the respective data packages. The representative summary can be displayed by the inputting component.
The inputting component can be configured to allow an expansion of any parts of the data packages upon a click by the inputting component. Similarly, the de-selecting or the weighing of the compatibility of one or more or all of the data packages retrieved can be performed by clicks to certain fields for de-selecting and/or differentiating the weight or neighborhood or similarity of a data package.
The present invention can also involve a server comprising the analyzing component and the retrieving component. The server can be remotely located and/or can be part of a cloud system.
The server can comprise the analytics database so that the history and/or the present status of the profiles can be stored in the server or close to it. Alternatively, the analytics database can be also located remotely. The server can comprise a bus constituting the communication interface. The bus can further constitute the communication interface with the inputting device and the database.
The inputting component can comprise the selecting component, it can be an integral part of it.
The present invention is also directed to a method for retrieving compatible data packages to originating data. The specifications and/or definitions given above with respect to the system apply correspondingly to the method.
The method can comprise the step of storing a plurality of data packages in a database. Whenever the method is started or initiated, there is a step of inputting the originating data or of taking already existing originating data that can be locally or remotely stored.
Then follows an analyzing of the originating data based on at least one first approach and a generating of a first approach first profile of the originating data. As before and below, a profile usually comprises the characterization of the result of the analysis. The method according to the present invention can automatically approach the database, retrieve the data packages from the database according to the first approach first profile and provide a plurality of first data packages.
One or more preferred data package(s) among the first data packages (10-12) can be selected, followed by an automatic analyzing of the preferred data package(s) according to the first approach and at least a second approach and a generating of a first approach second profile and a second approach first profile.
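The sequence of steps described above can be sketched end to end. The sketch assumes that both approaches are available as precomputed vectors per data package, that a dot product serves as the retrieval score and that the profile of several preferred packages is the component-wise mean of their vectors. All of these choices, as well as the names `run_retrieval`, `first_model`, `second_model` and `select`, are assumptions for illustration.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one preferred data package is required")
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def run_retrieval(origin_vector, first_model, second_model, select, k=3):
    # Step 1: the first approach first profile characterizes the originating data.
    first_approach_first_profile = origin_vector
    # Step 2: retrieve the k packages scoring highest against that profile.
    ranked = sorted(
        first_model,
        key=lambda pid: dot(first_model[pid], first_approach_first_profile),
        reverse=True,
    )
    first_packages = ranked[:k]
    # Step 3: a selecting component picks the preferred packages.
    preferred = select(first_packages)
    # Step 4: re-analyze the preferred packages under both approaches.
    first_approach_second_profile = mean_vector([first_model[p] for p in preferred])
    second_approach_first_profile = mean_vector([second_model[p] for p in preferred])
    return first_packages, first_approach_second_profile, second_approach_first_profile

# Hypothetical per-package vectors under the two approaches (assumptions)
first_model = {
    "pkg-10": [1.0, 0.0], "pkg-11": [0.8, 0.2],
    "pkg-12": [0.6, 0.4], "pkg-13": [0.0, 1.0],
}
second_model = {
    "pkg-10": [0.0, 1.0], "pkg-11": [0.2, 0.4],
    "pkg-12": [0.4, 0.6], "pkg-13": [1.0, 0.0],
}
# The selection here simply drops the first hit; a real selection would be
# content based, metadata based, or made by a user.
packages, profile_1_2, profile_2_1 = run_retrieval(
    [1.0, 0.0], first_model, second_model, select=lambda pkgs: pkgs[1:]
)
print(packages, profile_1_2, profile_2_1)
```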
The invention is also directed to a method for retrieving compatible data packages to originating data, comprising the steps of: storing a plurality of data packages in a database; inputting the originating data; storing a plurality of profiles relevant for retrieving compatible data packages to originating data in an analytics database; wherein the profiles comprise a plurality of features; and conducting supervised machine learning and providing classifiers on the basis of the features of the profiles.
The method can also be a combination of the before described methods.
The data packages can each comprise at least one of datasets, callable units, metadata and content data and any combination thereof.
The inputting of data is performed automatically or semi-automatically. The inputting of data can be further supported by a graphical user interface. The inputting of data can be performed by initiating the pulling of data from a data base by entering a code. The code can be an identifier for the respective data package, such as an ID number.
The database can comprise more than a million data packages, preferably more than 10 million data packages and most preferably more than 100 million data packages.
The machine learning approach can establish classifiers on the basis of the profiles by forming random decision trees on the basis of the features of the profiles and determining relevant decision trees from less relevant decision trees on the basis of profiles relevant for retrieving compatible data packages to originating data.
Any retrieving of compatible data packages can make or trigger the analyzing component to analyze the originating data or any subsequent originating data on the basis of the first approach, the second approach and the classifiers.
The data packages can each comprise at least one of datasets, callable units, metadata and content data and any combination thereof. There can be the further step of inputting originating data or grading data packages automatically or semi-automatically.
An inputting component can initiate the steps described above and below only upon entry of an individual password. A step of billing can also be present so that the user has to pay for retrieving the data packages. This can be done on a subscription basis or for each individual retrieval of a document.
Additionally, the step of repeating the initiating of the retrieving of compatible data packages any time on the most recent corresponding profile(s) in the analytics database can be present as well.
Further, a controlling of a plurality of inputting components can happen so that they are configured or can be configured upon entry of a code to share data in the analytics database assigned to the inputting components.
A repeating of the retrieving compatible data packages based on the profiles in the analytics database can be provided as well. This step of repeating retrieving compatible data can be initiated by another inputting component than the previous one.
The originating data can comprise metadata and content data.
The originating data can also or alternatively comprise a code that is part of the metadata wherein the database comprises the other metadata and/or content data that corresponds to the code of the originating data.
Additionally or alternatively, the step of pulling the other metadata and/or the content data from the database on the basis of the code can be realized.
The data packages each can comprise metadata and content data. The features of the profiles can correspond to portions of the metadata of the data packages.
The originating data can be provided by the inputting component in encoded fashion in a code and the analyzing component can be adapted to gather the complete originating data from the database based on that code.
The code can be an ID number, such as a document number, and the complete originating data can comprise a document.
The complete originating data can comprise meta data and content data.
The metadata comprises more than 5 fields, preferably more than 10 fields, even more preferably more than 50 fields. The metadata can comprise less than 100 fields, preferably less than 90 fields, even more preferably less than 70 fields.
The analyzing component can further comprise a multidimensional analysis (MDA).
The first profile and/or the second profile can be (a) vectorized profile(s) and the search in the general architecture is based on that vectorized profile(s).
The plurality of the data packages can be sorted according to their relevance and/or weight and/or neighborhood and/or their similarity to the originating data or most recent originating data.
Additionally or alternatively, the step of displaying a representative summary of the plurality of the data packages sorted according to their relevance and/or neighborhood and/or their similarity to the originating data or most recent originating data can be performed.
There can be the further step of de-selecting and/or weighing the data packages upon one or more clicks by the inputting component in pre-defined fields.
Additionally or alternatively, there can be the further step of displaying a representative summary of the plurality of the data packages sorted according to their relevance and/or neighborhood and/or their similarity to the originating data or most recent originating data and providing the option to out select data packages and/or to provide a quantitative value to their relevance.
The quantitative value can be from 0 to 3 or from 0 to 4. There can be also a de-selecting value that is tagging a data package.
A representative summary of the plurality of data packages can comprise a plurality of metadata of the respective data packages.
The inputting component can be configured to allow an expansion of any parts of the data packages upon a click by the inputting component.
A further step of billing for the retrieving of data packages can also be present. That can be done on a case-by-case basis or on a regular subscription basis.
A processor can be provided and may be singular or plural, and may be, but is not limited to, a CPU, GPU, DSP, APU, or FPGA. The memory 26 may be singular or plural, and may be, but is not limited to, volatile or non-volatile memory, such as SDRAM, DRAM, SRAM, Flash Memory, MRAM, F-RAM, or P-RAM. A data processing device can comprise means of data processing, such as processor units, hardware accelerators and/or microcontrollers. The data processing device can comprise memory components, such as main memory (e.g. RAM), cache memory (e.g. SRAM) and/or secondary memory (e.g. HDD, SSD). The data processing device can comprise busses configured to facilitate data exchange between components of the data processing device, such as the communication between the memory components and the processing components. The data processing device 20 can comprise network interface cards that can be configured to connect the data processing device to a network, such as the Internet. The data processing device can comprise user interfaces, such as:
• output user interface, such as:
  o screens or monitors configured to display visual data (e.g. displaying graphical user interfaces of the questionnaire to the user),
  o speakers configured to communicate audio data (e.g. playing audio data to the user),
• input user interface, such as:
  o camera configured to capture visual data (e.g. capturing images and/or videos of the user),
  o microphone configured to capture audio data (e.g. recording audio from the user),
  o keyboard configured to allow the insertion of text and/or other keyboard commands (e.g. allowing the user to enter text data and/or other keyboard commands by having the user type on the keyboard) and/or a trackpad, mouse, touchscreen, joystick configured to facilitate the navigation through different graphical user interfaces of the questionnaire.
The data processing device can be a processing unit configured to carry out instructions of a program. The data processing device can be a system-on-chip comprising processing units, memory components and busses. The data processing device can be a personal computer, a laptop, a pocket computer, a smartphone, a tablet computer. The data processing device can be a server. The data processing device can be a processing unit or a system-on-chip that can be interfaced with a personal computer, a laptop, a pocket computer, a smartphone, a tablet computer and/or user interface (such as the above-mentioned user interfaces).
The present invention also refers to a use of the system according to any of the preceding system or method embodiments for carrying out the method according to any of the preceding method embodiments. The present invention also covers a computer related product for carrying out the method according to any of the preceding method embodiments.
In general terms, the invention tries to improve the art by combining techniques and further boosting the outcome by a machine-learning approach.
The invention is further described with the following numbered embodiments.
Below, system embodiments will be discussed. These embodiments are abbreviated by the letter "S" followed by a number. Whenever reference is herein made to "system embodiments", these embodiments are meant.
S1. System for retrieving compatible data packages to originating data (2), comprising: a. a database (4) comprising a plurality of data packages (10-18); b. an inputting component (1) configured to input the originating data (2); c. an analyzing component (3) configured to analyze the originating data (2) based on at least one first approach and to generate a first approach first profile of the originating data (2); d. a retrieving component (3) configured to automatically approach the database (4) and to retrieve data packages (10-18) from the database (4) according to the first approach first profile and to provide a plurality of first data packages (10-12); e. a selecting component (5) for selecting one or more preferred data package(s) (11,12) among the first data packages (10-12); and f. the analyzing component (3) being further configured to automatically analyze the preferred data package(s) (11,12) according to the first approach and at least a second approach and to generate a first approach second profile and a second approach first profile.
S2. System for retrieving compatible data packages to originating data (2), comprising: a. a database (4) comprising a plurality of data packages (10-18); b. an inputting component (1) configured to input the originating data (2); c. an analytics database (6) comprising a plurality of profiles relevant for retrieving compatible data packages to originating data (2); d. wherein the profiles comprise a plurality of features; e. a supervised machine learning component (3) configured to provide classifiers on the basis of the features of the profiles.
S3. System according to all preceding system embodiments.
S4. System according to the preceding system embodiment wherein the analyzing component (3) is further configured to automatically analyze the preferred data packages (10-18) according to the profiles of the first and second approaches and the classifiers of the supervised machine learning component (3).
S5. System according to the preceding system embodiment wherein the selecting component (5) is repeatedly selecting data packages (13-15) and the analyzing component (3) is weighing the profiles of the first and second approaches and the classifiers from the supervised machine learning component (3) on the basis of the profiles and classifiers and is providing an actual retrieving profile combining the weighted profiles and classifiers.
S6. System according to any one of the preceding system embodiments wherein the data packages (10-18) comprise a technical teaching.
S7. System according to any of the preceding system embodiments wherein the data packages (10-18) comprise a technical teaching in an enabling manner for a person skilled in the art.
S8. System according to any of the preceding system embodiments wherein the first approach is a first model and the second approach is a second model and the second model is different to the first model.
S9. System according to the preceding system embodiment wherein the first approach profile comprises a numerical representation of the analysis of the originating data (2) and/or the selected data package(s) (10-12) according to the first model.
S10. System according to the preceding system embodiment wherein the second approach profile comprises a numerical representation of the analysis of the originating data (2) and/or the selected data package(s) (10-12) according to the second model.
S11. System according to any of the preceding system embodiments wherein the first approach comprises a content analytics model and the first approach profile comprises a first approach multidimensional representation.
S12. System according to any of the preceding system embodiments wherein the second approach comprises a past distribution statistics model and the second approach profile comprises a second approach multidimensional representation.
S13. System according to the two preceding system embodiments wherein the first and second approach multidimensional representations are vectorized.
S14. System according to any of the preceding system embodiments wherein the analyzing component (3) and/or the retrieving component (3) is/are configured to retrieve data packages (10-18) from the database (4) on the basis of either one, two or all of the three models.
S15. System according to the preceding system embodiment wherein the analyzing component and/or retrieving component (3) are configured to weigh or gradually combine the three models on the basis of the last result of the selection component (5).
S16. System according to any of the preceding embodiments wherein the analyzing component can be activated by a soft key for refreshing the analysis.
S17. System according to any of the preceding embodiments wherein the analyzing component can be activated by a soft key for refreshing the analysis after the selecting component (5) has been activated.
S18. System according to any of the preceding system embodiments wherein the database comprises more than a million data packages, preferably more than 10 million data packages and most preferably more than 100 million data packages.
S19. System according to any of the preceding system embodiments wherein the machine learning component (3) is configured to establish classifiers on the basis of the profiles by forming random decision trees on the basis of the features of the profiles and to determine relevant decision trees from less relevant decision trees on the basis of profiles relevant for retrieving compatible data packages to originating data (2).
S20. System according to any of the preceding system embodiments wherein any retrieving of compatible data packages (10-18) makes the analyzing component (3) analyze the originating data or any subsequent originating data on the basis of the first approach, the second approach and the classifiers.
S21. System according to any of the preceding system embodiments wherein the data packages (10-18) each comprise at least one of datasets, callable units, metadata and content data and any combination thereof.
S22. System according to any of the preceding system embodiments wherein the inputting component (1) is configured to input originating data (2) or grade data packages (10-18) automatically or semi-automatically.
S23. System according to the preceding system embodiment wherein an inputting component (1) can activate the system upon entry of an individual password.
S24. System according to any of the preceding system embodiments wherein the inputting component can repeatedly initiate the retrieving of compatible data packages (10-18) any time on the most recent corresponding profile(s) in the analytics database (6).
S25. System according to any of the preceding system embodiments further comprising a plurality of inputting components (1) that are configured or can be configured upon entry of a code to share data in the analytics database (6) assigned to the inputting components.
S26. System according to any of the preceding system embodiments wherein the system is configured to repeat retrieving compatible data based on the profiles in the analytics database (6).
S27. System according to the two preceding system embodiments wherein the system is configured to allow the repeating of retrieving compatible data by another inputting component (1) than the previous one.
S28. System according to any of the preceding system embodiments wherein the originating data comprises metadata and content data.
S29. System according to the preceding system embodiment wherein the originating data (2) comprises a code that is part of the metadata and wherein the database (4) comprises the other metadata and/or content data that corresponds to the code of the originating data (2).
S30. System according to the preceding system embodiment wherein the system is configured to pull the other metadata and/or the content data from the database (4).
S31. System according to any of the preceding system embodiments wherein the data packages each comprise metadata and content data.
S32. System according to the preceding system embodiment wherein the features of the profiles correspond to portions of the metadata of the data packages (10-18).
S33. System according to any of the preceding system embodiments wherein the inputting component (1) comprises a display.
S34. System according to any of the preceding system embodiments wherein the inputting component (1) comprises a keyboard.
S35. System according to any of the preceding system embodiments wherein the originating data (2) is provided by the inputting component in encoded fashion in a code and the analyzing component (3) is adapted to gather the complete originating data (2) from the database (4) based on that code.
S36. System according to any of the preceding system embodiments wherein the code is an ID number, such as a document number, and the complete originating data (2) comprises a document.
S37. System according to the preceding system embodiment wherein the complete originating data (2) comprises metadata and content data.
S38. System according to any of the preceding system embodiments S12 to S21 wherein the metadata comprises more than 5 fields, preferably more than 10 fields, even more preferably more than 50 fields.
S39. System according to any of the preceding system embodiments S12 to S22 wherein the metadata comprises less than 100 fields, preferably less than 90 fields, even more preferably less than 70 fields.
S40. System according to any of the preceding system embodiments wherein the data packages (10-18) are patents, patent applications and/or utility models.
S41. System according to any of the preceding system embodiments wherein the analyzing component (3) comprises a multidimensional analysis (MDA) tool.
S42. System according to any of the preceding system embodiments wherein the first profile and/or the second profile is/are (a) vectorized profile(s) and the search in the general architecture is based on that vectorized profile(s).
S43. System according to any of the preceding system embodiments wherein the analyzing component (3) and/or the retrieving component (3) is configured to provide the plurality of the data packages (10-18) sorted according to their relevance and/or weight and/or neighborhood and/or their similarity to the originating data (2) or most recent originating data (2).
S44. System according to any of the preceding system embodiments wherein the inputting component (1) is configured to display a representative summary of the plurality of the data packages (10-18) sorted according to their relevance and/or neighborhood and/or their similarity to the originating data (2) or most recent originating data (2).
S45. System according to any of the preceding system embodiments wherein the inputting component is configured to allow a de-selecting and/or weighing of the data packages (10-18) upon one or more clicks by the inputting component in pre-defined fields.
S46. System according to any of the preceding system embodiments wherein the inputting component (1) is configured to display a representative summary of the plurality of the data packages (5-7) sorted according to their relevance and/or neighborhood and/or their similarity to the originating data (2) or most recent originating data (2) and provides the option to out-select data packages (10-18) and/or to provide a quantitative value to their relevance.
S47. System according to the preceding system embodiment wherein the quantitative value is from 0 to 3 or from 0 to 4.
S48. System according to any of the two preceding system embodiments with a further value representing a de-selecting of a data package.
S49. System according to any of the preceding system embodiments wherein a representative summary of the plurality of data packages comprises a plurality of metadata of the respective data packages (10-18).
S50. System according to any of the preceding system embodiments wherein the inputting component is configured to allow an expansion of any parts of the data packages (10-18) upon a click by the inputting component (1).
S51. System according to any of the preceding system embodiments further comprising a server (3) comprising the analyzing component (3a) and the retrieving component (3b).
S52. System according to any of the preceding system embodiments further comprising a server (3,6) comprising the analytics database (6).
S53. System according to the two preceding system embodiments wherein the server (3,6) comprises a bus (3d) constituting the communication interface.
S54. System according to the preceding system embodiment wherein the bus is further constituting the communication interface with the inputting device and the database (4).
S55. System according to any of the preceding system embodiments wherein the inputting component (1) comprises the selecting component (5).
S56. System according to any of the preceding system embodiments wherein the system comprises a billing component for billing for the retrieving of data packages.
Below, method embodiments will be discussed. These embodiments are abbreviated by the letter "M" followed by a number. Whenever reference is herein made to "method embodiments", these embodiments are meant.
M1. Method for retrieving compatible data packages to originating data (2), comprising the steps of: a. storing a plurality of data packages (10-18) in a database (4); b. inputting the originating data (2); c. analyzing the originating data (2) based on at least one first approach and generating a first approach first profile of the originating data (2); d. automatically approaching the database (4) and retrieving data packages (10-18) from the database (4) according to the first approach first profile and providing a plurality of first data packages (10-12); e. selecting one or more preferred data package(s) (11,12) among the first data packages (10-12); and f. automatically analyzing the preferred data package(s) (10-12) according to the first approach and at least a second approach and generating a first approach second profile and a second approach first profile.
M2. Method for retrieving compatible data packages to originating data (2), comprising the steps of: g. storing a plurality of data packages (10-18) in a database (4); h. inputting the originating data (2); i. storing a plurality of profiles relevant for retrieving compatible data packages to originating data (2) in an analytics database (6); j. wherein the profiles comprise a plurality of features; k. conducting supervised machine learning and providing classifiers on the basis of the features of the profiles.
M3. Method comprising all steps of the preceding method embodiments.
M4. Method according to any of the preceding method embodiments with the further step of automatically analyzing the preferred data packages (10-18) according to the profiles of the first and second approaches and the classifiers of the supervised machine learning component (3).
M5. Method according to any of the preceding method embodiments with the further step of repeatedly selecting data packages (13-15) and weighing the profiles of the first and second approaches and the classifiers from the supervised machine learning component (3) on the basis of the profiles and classifiers and providing an actual retrieving profile combining the weighted profiles and classifiers.
M6. Method according to the preceding method embodiment with the further step of automatically analyzing the preferred data packages (10-18) according to the profiles of the first and second approaches and the classifiers of the supervised machine learning component (3).
M7. Method according to any of the preceding method embodiments with the further step of repeatedly selecting data packages weighing the profiles of the first and second approaches and the classifiers on the basis of the profiles and classifiers and providing an actual retrieving profile combining the weighted profiles and classifiers.
M8. Method according to any of the preceding method embodiments wherein the first approach is a first model and the second approach is a second model and the second model is different to the first model.
M9. Method according to any of the preceding method embodiments wherein the first approach profile comprises a numerical representation of the analysis of the originating data (2) and/or the selected data package(s) (10-12) according to the first model.
M10. Method according to the preceding method embodiment wherein the second approach profile comprises a numerical representation of the analysis of the originating data (2) and/or the selected data package(s) (10-12) according to the second model.
M11. Method according to any of the preceding method embodiments wherein the first approach comprises a content analytics model and the first approach profile comprises a first approach multidimensional representation.
M12. Method according to any of the preceding method embodiments wherein the second approach comprises a past distribution statistics model and the second approach profile comprises a second approach multidimensional representation.
M13. Method according to the two preceding method embodiments wherein the first and second approach multidimensional representations are vectorized.
M14. Method according to any of the preceding method embodiments with the further step of retrieving data packages (10-18) from the database (4) on the basis of either one, two or all of the three models.
M15. Method according to the preceding method embodiment with the further step of weighing or gradually combining the three models on the basis of the last result of the selection component (5).
M16. Method according to any of the preceding method embodiments wherein the analyzing component can be activated by a soft key for refreshing the analysis.
M17. Method according to any of the preceding method embodiments with the further step of activating the analyzing by a soft key for refreshing the analysis after the selecting component (5) has been activated.
M18. Method according to the preceding method embodiment wherein the data packages each comprise at least one of datasets, callable units, metadata and content data and any combination thereof.
M19. Method according to any of the preceding method embodiments wherein the inputting of data is performed automatically or semi-automatically.
M20. Method according to any of the preceding method embodiments wherein the inputting of data is supported by a graphical user interface (GUI).
M21. Method according to any of the preceding method embodiments wherein the inputting of data is performed by initiating the pulling of data from a database by entering a code.
M22. Method according to any of the preceding method embodiments wherein the database comprises more than a million data packages, preferably more than 10 million data packages and most preferably more than 100 million data packages.
M23. Method according to any of the preceding method embodiments wherein the machine learning establishes classifiers on the basis of the profiles by forming random decision trees on the basis of the features of the profiles and determining relevant decision trees from less relevant decision trees on the basis of profiles relevant for retrieving compatible data packages to originating data (2).
M24. Method according to any of the preceding method embodiments wherein any retrieving of compatible data packages (10-18) causes the analyzing component (3) to analyze the originating data or any subsequent originating data on the basis of the first approach, the second approach and the classifiers.
M25. Method according to any of the preceding method embodiments wherein the data packages (10-18) each comprise at least one of datasets, callable units, metadata and content data and any combination thereof.
M26. Method according to any of the preceding method embodiments with the step of inputting originating data (2) or grading data packages (10-18) automatically or semi-automatically.
M27. Method according to the preceding method embodiment wherein an inputting component (1) can initiate the method upon entry of an individual password.
M28. Method according to any of the preceding method embodiments with the step of repeating the initiating of the retrieving of compatible data packages (10-18) any time on the most recent corresponding profile(s) in the analytics database (6).
M29. Method according to any of the preceding method embodiments further comprising controlling a plurality of inputting components so that they are configured or can be configured upon entry of a code to share data in the analytics database (6) assigned to the inputting components.
M30. Method according to any of the preceding method embodiments with the step of repeating retrieving compatible data packages (10-18) based on the profiles in the analytics database (6).
M31. Method according to the two preceding method embodiments with the step of repeating retrieving compatible data by another inputting component (1) than the previous one.
M32. Method according to any of the preceding method embodiments wherein the originating data comprises metadata and content data.
M33. Method according to the preceding method embodiment wherein the originating data (2) comprises a code that is part of the metadata and wherein the database (4) comprises the other metadata and/or content data that corresponds to the code of the originating data (2).
M34. Method according to the preceding method embodiment with the step of pulling the other metadata and/or the content data from the database (4) on the basis of the code.
M35. Method according to any of the preceding method embodiments wherein the data packages each comprise metadata and content data.
M36. Method according to the preceding method embodiment wherein the features of the profiles correspond to portions of the metadata of the data packages (10-18).
M37. Method according to any of the preceding method embodiments wherein the originating data (2) is provided by the inputting component in encoded fashion in a code and the analyzing component (3) is adapted to gather the complete originating data (2) from the database (4) based on that code.
M38. Method according to any of the preceding method embodiments wherein the code is an ID number, such as a document number, and the complete originating data (2) comprises a document.
M39. Method according to the preceding method embodiment wherein the complete originating data (2) comprises metadata and content data.
M40. Method according to any of the preceding method embodiments wherein the metadata comprises more than 5 fields, preferably more than 10 fields, even more preferably more than 50 fields.
M41. Method according to any of the preceding method embodiments S12 to S22 wherein the metadata comprises less than 100 fields, preferably less than 90 fields, even more preferably less than 70 fields.
M42. Method according to any of the preceding method embodiments wherein the analyzing comprises a multidimensional analysis (MDA).
M43. Method according to any of the preceding method embodiments wherein the first profile and/or the second profile is/are (a) vectorized profile(s) and the search in the general architecture is based on that vectorized profile(s).
M44. Method according to any of the preceding method embodiments wherein the plurality of the data packages (10-18) are sorted according to their relevance and/or weight and/or neighborhood and/or their similarity to the originating data (2) or most recent originating data (2).
M45. Method according to any of the preceding method embodiments with the step of displaying a representative summary of the plurality of the data packages (10-18) sorted according to their relevance and/or neighborhood and/or their similarity to the originating data (2) or most recent originating data (2).
M46. Method according to any of the preceding method embodiments with the further step of de-selecting and/or weighing of the data packages (10-18) upon one or more clicks by the inputting component in pre-defined fields.
M47. Method according to any of the preceding method embodiments with the further step of displaying a representative summary of the plurality of the data packages (5-7) sorted according to their relevance and/or neighborhood and/or their similarity to the originating data (2) or most recent originating data (2) and providing the option to out-select data packages (10-18) and/or to provide a quantitative value to their relevance.
M48. Method according to the preceding method embodiment wherein the quantitative value is from 0 to 3 or from 0 to 4.
M49. Method according to any of the two preceding method embodiments with a further value representing a de-selecting of a data package.
M50. Method according to any of the preceding method embodiments wherein a representative summary of the plurality of data packages comprises a plurality of metadata of the respective data packages (10-18).
M51. Method according to any of the preceding method embodiments wherein the inputting component is configured to allow an expansion of any parts of the data packages (10-18) upon a click by the inputting component (1).
M52. Method according to any of the preceding method embodiments with the further step of billing for the retrieving of data packages.
Below, use embodiments will be discussed. These embodiments are abbreviated by the letter "U" followed by a number. Whenever reference is herein made to "use embodiments", these embodiments are meant.
U1. Use of the system according to any of the preceding system embodiments for carrying out the method according to any of the preceding method embodiments.
Below, computer related product embodiments will be discussed. These embodiments are abbreviated by the letter "C" followed by a number. Whenever reference is herein made to "computer related product embodiments", these embodiments are meant.
C1. A computer related product for carrying out the method according to any of the preceding method embodiments.
The present invention will now be described with reference to the accompanying drawings, which illustrate embodiments of the invention. These embodiments should only exemplify, but not limit, the present invention.
Figure Description
Fig. 1 schematically exemplifies an embodiment of a workflow between components according to the present invention;
Fig. 2 schematically exemplifies a more detailed embodiment of a rather early workflow between components according to the present invention;
Fig. 3 schematically exemplifies a more detailed embodiment of a later workflow than shown in Fig. 2 between components according to the present invention;
Fig. 4 schematically exemplifies a workflow of components and their control by different controlling models according to the present invention; and
Fig. 5 schematically exemplifies an alternative workflow of components and their control by controlling models according to the present invention.
Description of preferred embodiments as exemplified in the figures
It is noted that not all the drawings carry all the reference signs. Instead, in some of the drawings, some of the reference signs have been omitted for sake of brevity and simplicity of illustration. Embodiments of the present invention will now be described with reference to the accompanying drawings.
Fig. 1 schematically depicts an embodiment of a method and components of a respective system configured for carrying out the method. In particular there is an inputting component or a node 1 that can comprise a graphical user interface (GUI). The inputting component may also be controlled by a user or may merely display the progress to a user. Originating data 2 is either inputted into the inputting component 1 or pulled or pushed into the inputting component 1. The inputting component can be a workstation, a computer and/or any kind of handheld computing device, such as a laptop, tablet, smart phone etc.
In a next component or step the originating data 2 is then analyzed by an analyzing component 3 wherein the analysis is based on a first profile. The analyzing component 3 can be remote to the inputting component 1 and can be accessible in the same country of interest or in a different place or can be provided by what is called the cloud. The first profile can be an analysis of a task to be fulfilled by a software routine or sub-routine and/or a certain document. The analysis then allocates routines or sub-routines, as data packages available in a library, that are able to fulfill the task. Alternatively, the content of the originating data may be analyzed according to different parts of its content. This could be done by a stepwise and then aggregated analytics process represented by a multi-modal vector.
After the first analysis a database or library 4 will be approached and first related or neighboring data packages 10-12 will be automatically determined on the basis of the first profile. The data packages can be routines or sub-routines that are or appear to be capable to fulfill a certain task. The data packages can also be any other callable unit, such as a procedure, a function, a routine, a method, or a subprogram. The data packages can also be aggregated data or datasets, representing a certain content of a document.
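The automatic determination of neighboring data packages can be illustrated by a minimal sketch. It assumes, purely for illustration, that the first profile and each library entry are plain numeric vectors and that neighborhood is measured by cosine similarity; the names `cosine`, `retrieve_neighbors` and `library`, and all vector values, are hypothetical and not taken from the embodiments.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_neighbors(first_profile, library, k=3):
    """Return the ids of the k library entries closest to the profile."""
    ranked = sorted(
        library,
        key=lambda package_id: cosine(first_profile, library[package_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical library: data package id -> profile vector (assumed values).
library = {
    10: [1.0, 0.0, 0.2],
    11: [0.9, 0.1, 0.0],
    12: [0.8, 0.3, 0.1],
    99: [0.0, 1.0, 0.0],
}
first_data_packages = retrieve_neighbors([1.0, 0.1, 0.1], library, k=3)
```

With these assumed vectors, the entry with id 99 points in a different direction than the profile and is therefore not among the first data packages.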
The analysis can be done locally or remotely. If it is done locally, the analyzing component can be configured to do so. Alternatively, it could send an analysis order, or parts of one, to a remote server or to the cloud. The data packages 10-12 should be compatible with or matching the originating data 2. This comprises a certain pre-defined or actualized similarity or neighborhood profile.
In order to log the progress of the method, the steps can be tracked, logged or stored in an analytics database 6. This should make it possible to follow and reconstruct the sequence of steps as well as intermediate and final results. This can also serve to share the progress or to merge the method with findings of others. While complete tracking takes place, rather old steps and findings can also be overwritten for reasons of data efficiency.
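The tracking of steps with overwriting of rather old entries can be sketched as a bounded log. This is only one possible reading: the class name `AnalyticsLog`, the parameter `max_steps` and the recorded step names are assumptions made for illustration, and a real analytics database 6 would be persistent rather than in memory.

```python
from collections import deque

class AnalyticsLog:
    """In-memory stand-in for the step tracking in the analytics database."""

    def __init__(self, max_steps=1000):
        # A deque with maxlen silently drops the oldest entry when full.
        self._steps = deque(maxlen=max_steps)

    def record(self, step, result):
        """Store one workflow step together with its result."""
        self._steps.append({"step": step, "result": result})

    def history(self):
        """Return the retained steps in the order they were recorded."""
        return list(self._steps)

log = AnalyticsLog(max_steps=2)
log.record("analyze", {"profile": "first"})
log.record("retrieve", [10, 11, 12])
log.record("select", [11, 12])  # the oldest step ("analyze") is overwritten
```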
Those data packages 10-12 are then again analyzed by a selection or choice-by-elimination. The analysis is then based on another profile than the first profile. This could be a holistic profile taking into consideration all aspects of the originating data or even more. This can also just apply a further profile. In the example above, this could be the run-time and/or precision etc. of a routine. This can be performed in or by a selecting component 5 coarsely or more finely granulated by a score card approach.
The respective result is then fed into the analyzing component 3 that will actualize the analysis according to the selection in or by the selecting component. In Fig. 1 the small circle to the left next to the selecting component 5 means a dead end, i.e., data package 10 in that case will be sorted out or considered incompatible. The other data packages 11 and 12 can then be classified automatically. The classification can in any case also be based on a marking system. The marking system can be made simple and may categorize into 2 or 4 classes only.
The analyzing component 3 is further configured to automatically analyze the preferred data package(s) (11,12) according to the first profile and now also a second profile that is different to the first profile. It will be analyzed which criteria or parameters or any combination thereof are fulfilled or close to being fulfilled by the preferred data packages 11, 12.
The actualized first and second profiles, any core information therefrom or other metadata will also be stored in an analytics database 6. The previous profile can be either overwritten or the new profiles can be stored additionally for documentation purposes.
Fig. 2 exemplifies an embodiment of the analyzing component 3. It can comprise a CPU or any other controller 3a that is in communication with a memory 3c. The analyzing component can also comprise a retrieving component 3b that is configured to approach an external entity, such as the database 4. Moreover, the inputting device 1, the database 4 as well as the analytics database 6 can be accessible by a communication interface, such as a bus 3d. Both or one of the databases 4 and 6 can be located locally or remotely. In the example shown, the inputting device 1 can be located remotely from the analyzing component 3. The same holds true for the database 4 and the analytics database 6, wherein all or any two of those components can be located also remotely from each other.
Fig. 3 shows in addition to Fig. 2 that either in a more progressed state or in another embodiment the selecting component 5 is also affiliated to the inputting component 1 or can even be integrated in the same device as in a terminal, computer, laptop, handheld etc.
Even further the analyzing component with all elements 3a-3d and the analytics database can be integrated or located non-remotely. All elements can be connected by one or more bus 3d. They can also be integrated in a server or cloud arrangement. The server 3 can comprise the analyzing component 3a and the retrieving component 3b and the analytics database 6. In the example shown, the server 3,6 comprises a bus 3d constituting the communication interface.
The inputting component 1 can also comprise the selecting component.
Fig. 4 exemplifies in more detail how and when the different approaches or models are involved. After a start of the system or respective method or workflow between the components the first profile is approached by the analyzing component 3. In the example shown it is one profile but it can already initially also be two or more. In any case the analyzing and/or retrieving component 3 then delivers the three data packages 10 to 12. In the next step of the workflow the selecting component 5 is supplied with information or the data or the metadata of the data packages 10 to 12. In the example shown, the respective selection which then takes place, namely the selection of data packages 11 and 12 according to the example shown in Fig. 1, is then delivered to the first profile 30 and a second profile 31.
The analyzing and/or retrieving component 3 is then coordinating the further analysis of the two remaining data packages 11 and 12 according to the first model 30 and the second model 31. These profiles 30, 31 can work independently from each other but it can also happen that they coordinate their analytics. In either case, the analyzing component 3 then delivers the outcome of the either independent or combined analytics according to both models 30 and 31 and delivers data packages 13 to 15. The first model 30 can be a highly complex and multimodal profile that is analyzing the content of data packages 11 and 12 and retrieving a pattern in this content. This pattern can then be represented as a multidimensional value. Such values can have many dimensions, even more than a hundred thousand or millions of dimensions, each representing an occurrence of a certain value or term. Model 31 can be a statistical profile that for example is already fed with a statistical distribution of certain phenomena in data packages and tries to allocate an already known statistical pattern in documents 11 and 12 or the most similar one.
In a third stage shown in figure 4, the documents 13 to 15 or their content are fed to the selecting component 5 that is then again separating relevant from non-relevant data packages. Similar to the example shown in figure 1, where data package 10 was out-selected, here data packages 14 and 15 were found to be compatible. The grade of compatibility may also be differentiated with a numerical value, let's say from 1 to 3, wherein 1 means the least compatible data package and 3 means the most compatible data package. Also a numerical value for the non-relevant other package may be given, such as the value zero. In the example shown earlier, the data packages 10 and 13 have received the value zero.
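The grading described above can be sketched as follows. The sketch assumes grades from 0 to 3 with 0 meaning out-selected; the function name `split_by_grade` and the concrete grades given to data packages 14 and 15 are hypothetical, since the description only states that they were found compatible.

```python
def split_by_grade(grades):
    """Separate graded data packages into ranked and out-selected ones.

    grades maps a data package id to a value from 0 to 3, where 0 marks a
    non-relevant package and 3 the most compatible one.
    """
    for package_id, grade in grades.items():
        if grade not in (0, 1, 2, 3):
            raise ValueError(f"grade for package {package_id} must be 0..3")
    kept = {pid: grade for pid, grade in grades.items() if grade > 0}
    ranked = sorted(kept, key=kept.get, reverse=True)
    rejected = sorted(pid for pid, grade in grades.items() if grade == 0)
    return ranked, rejected

# Assumed grades: 13 is out-selected, 14 and 15 are compatible.
ranked, rejected = split_by_grade({13: 0, 14: 3, 15: 1})
```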
Now carrying on with the description of the third stage of figure 4: after the selection in the selecting component 5, the information (that may include the content and/or the metadata) is then classified with a machine learning approach or model. In that machine learning model 32 a classification may take place on the basis of random decision trees. In case there is a larger amount of metadata and respective features, the last stage will then deliver, by means of a third profile and the analyzing component 3, three data packages 16 to 18 that in the embodiment shown should be the most relevant data packages of all.
With this training information the analyzing component 3 further trains the analytics and searching model. This typically leads to a more sophisticated or differentiated model. The analyzing and/or retrieving component 3 then again approaches the database 4. The analyzing component 3 will then automatically search in the database 4 based on the first and second profiles to provide a plurality of second data packages 13-15.
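A classification on the basis of random decision trees can be illustrated with a deliberately small, pure-Python sketch. It is a simplified stand-in and not the model 32 itself: each tree has depth one (a "stump"), features and thresholds are drawn at random, labels are binary (1 for a package graded relevant, 0 otherwise), and the more relevant trees are taken to be those that reach the best accuracy on the graded packages. All names and feature values are assumptions.

```python
import random

def fit_stump(rows, labels, rng):
    """Fit a depth-one tree on one randomly drawn feature and threshold."""
    feature = rng.randrange(len(rows[0]))
    threshold = rng.choice(rows)[feature]

    def majority(on_left):
        votes = [lab for row, lab in zip(rows, labels)
                 if (row[feature] <= threshold) == on_left]
        return int(2 * sum(votes) > len(votes)) if votes else 0

    return feature, threshold, majority(True), majority(False)

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def accuracy(stump, rows, labels):
    hits = sum(predict_stump(stump, r) == l for r, l in zip(rows, labels))
    return hits / len(rows)

def fit_forest(rows, labels, n_trees=200, seed=0):
    """Draw random stumps and keep only the most accurate ones."""
    rng = random.Random(seed)
    stumps = [fit_stump(rows, labels, rng) for _ in range(n_trees)]
    best = max(accuracy(s, rows, labels) for s in stumps)
    return [s for s in stumps if accuracy(s, rows, labels) == best]

def predict(forest, row):
    """Majority vote of the retained trees."""
    votes = [predict_stump(s, row) for s in forest]
    return int(2 * sum(votes) > len(votes))

# Hypothetical metadata features per graded package; the second is noise.
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]
labels = [0, 0, 1, 1]
forest = fit_forest(rows, labels)
```

In practice an established random forest implementation would normally be preferred; the sketch only shows how random trees are generated and how relevant trees are separated from less relevant ones.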
In the same or another selecting component 5 the most compatible data packages will be selected or eliminated or differentially classified. They may be ranked as well according to their similarity or neighborhood value. With this data the retrieving component 3 again approaches the database and stores the profiles in the analytics database 6. This will then provide further data packages 16-18. The analyzing and searching step of component 3 can also be configured to additionally approach another third profile that can be a further classifying tool. This can be the random forest tool.
This process can be performed continuously or can be stopped. It can also be stopped automatically in case the data packages 16-18 fulfill a pre-defined or actualized quality criterion. In that case the profiles will be stored and will be available for further methods for the same or similar originating data.
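The repeated retrieving and selecting with an automatic stop can be sketched as a loop. The function `run_until_quality`, its callbacks and the toy quality measure are hypothetical; the description does not specify how the quality criterion is computed.

```python
def run_until_quality(retrieve, select, quality, threshold, max_rounds=10):
    """Repeat retrieve/select rounds until the quality criterion is met."""
    profile = None
    stored_profiles = []
    packages = []
    for round_no in range(1, max_rounds + 1):
        packages = retrieve(profile)
        if quality(packages) >= threshold:
            return packages, stored_profiles, round_no
        profile = select(packages)
        stored_profiles.append(profile)  # kept for later, similar queries
    return packages, stored_profiles, max_rounds

# Toy stand-ins for the components (assumed behavior).
batches = {None: [10, 11, 12], "p1": [13, 14, 15], "p2": [16, 17, 18]}

def retrieve(profile):
    return batches[profile]

def select(packages):
    return "p1" if packages == [10, 11, 12] else "p2"

def quality(packages):
    return 1.0 if packages == [16, 17, 18] else 0.0

result = run_until_quality(retrieve, select, quality, threshold=0.5)
```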
Fig. 5 depicts another option. Most parts are the same or similar to Fig. 4. However, in the last stage, the three models as described above will be applied and combined. The degree to which each model contributes to the combination can also be changed by the retrieving component 3 and depends on the analysis of the selection that has taken place in the selecting component 5.
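The gradual combination of the three models can be sketched as a weighted average of per-model scores. The model names, scores and weights below are assumed for illustration; how the weights are derived from the last selection is left open here, as it is in the description.

```python
def combine_scores(scores_by_model, weights):
    """Rank data packages by the weighted average of the model scores."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one model needs a positive weight")
    package_ids = set()
    for scores in scores_by_model.values():
        package_ids.update(scores)
    combined = {
        pid: sum(weights[m] * scores_by_model[m].get(pid, 0.0)
                 for m in weights) / total
        for pid in package_ids
    }
    return sorted(combined, key=combined.get, reverse=True)

# Assumed scores of the three models for data packages 16 to 18.
scores = {
    "content": {16: 0.9, 17: 0.4, 18: 0.2},
    "statistics": {16: 0.3, 17: 0.8, 18: 0.1},
    "classifier": {16: 0.6, 17: 0.5, 18: 0.7},
}
ranking = combine_scores(
    scores, {"content": 2.0, "statistics": 1.0, "classifier": 1.0}
)
```

Setting a weight to zero removes the corresponding model from the combination, which covers the case of retrieving on the basis of only one or two of the three models.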
Reference numbers and letters appearing between parentheses in the claims, identifying features described in the embodiments and illustrated in the accompanying drawings, are provided as an aid to the reader as an exemplification of the matter claimed. The inclusion of such reference numbers and letters is not to be interpreted as placing any limitations on the scope of the claims.
The term "at least one of a first option and a second option" is intended to mean the first option or the second option or the first option and the second option.
Whenever a relative term, such as "about", "substantially" or "approximately" is used in this specification, such a term should be construed to also include the exact term. That is, e.g., "substantially straight" should be construed to also include "(exactly) straight".
Whenever steps were recited in the above or also in the appended claims, it should be noted that the order in which the steps are recited in this text may be accidental. That is, unless otherwise specified or unless clear to the skilled person, the order in which steps are recited may be accidental. That is, when the present document states, e.g., that a method comprises steps (A) and (B), this does not necessarily mean that step (A) precedes step (B), but it is also possible that step (A) is performed (at least partly) simultaneously with step (B) or that step (B) precedes step (A). Furthermore, when a step (X) is said to precede another step (Z), this does not imply that there is no step between steps (X) and (Z). That is, step (X) preceding step (Z) encompasses the situation that step (X) is performed directly before step (Z), but also the situation that (X) is performed before one or more steps (Y1), followed by step (Z). Corresponding considerations apply when terms like "after" or "before" are used.
List of reference numerals used
1 - inputting component
2 - originating data
3 - analyzing component
3a - CPU
3b - retrieving component
3c - memory
3d - bus
3c - communication interface
4 - database
5 - selecting component
6 - analytics database
10-18 - data packages
30 - first profile
31 - second profile
32 - third profile
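As a purely illustrative sketch of how the components listed above could interact, the following code profiles originating data (2), retrieves data packages (10-18) from a database (4) according to that profile, and re-profiles the selected packages under two approaches. The word-count and word-pair profiles, the cosine similarity measure and all function names are assumptions introduced for this example; the specification leaves the concrete approaches open.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List

Approach = Callable[[str], Counter]


def word_profile(text: str) -> Counter:
    """Hypothetical first approach: profile as word counts."""
    return Counter(text.lower().split())


def bigram_profile(text: str) -> Counter:
    """Hypothetical second approach: profile as counts of adjacent word pairs."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count profiles (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(profile: Counter, database: Dict[int, str], approach: Approach, top_k: int = 3) -> List[int]:
    """Return the ids of the top_k data packages most similar to the profile."""
    ranked = sorted(
        database,
        key=lambda pid: cosine(profile, approach(database[pid])),
        reverse=True,
    )
    return ranked[:top_k]


def profile_of_selection(selected: Iterable[int], database: Dict[int, str], approach: Approach) -> Counter:
    """Merge the profiles of the selected (preferred) data packages under one approach."""
    merged: Counter = Counter()
    for pid in selected:
        merged.update(approach(database[pid]))
    return merged
```

For instance, the first approach first profile would be `word_profile(originating_data)`; after a selection, `profile_of_selection(selected, database, word_profile)` and `profile_of_selection(selected, database, bigram_profile)` would play the roles of the first approach second profile and the second approach first profile, respectively.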

Claims

1. System for retrieving compatible data packages to originating data (2), comprising:
a. a database (4) comprising a plurality of data packages (10-18);
b. an inputting component (1) configured to input the originating data (2);
c. an analyzing component (3) configured to analyze the originating data (2) based on at least one first approach and to generate a first approach first profile of the originating data (2);
d. a retrieving component (3) configured to automatically approach the database (4) and to retrieve data packages (10-18) from the database (4) according to the first approach first profile and to provide a plurality of first data packages (10-12);
e. a selecting component (5) for selecting one or more preferred data package(s) (11,12) among the first data packages (10-12); and
f. the analyzing component (3) being further configured to automatically analyze the preferred data package(s) (11,12) according to the first approach and at least a second approach and to generate a first approach second profile and a second approach first profile.
2. System for retrieving compatible data packages to originating data (2), comprising:
a. a database (4) comprising a plurality of data packages (10-18);
b. an inputting component (1) configured to input the originating data (2);
c. an analytics database (6) comprising a plurality of profiles relevant for retrieving compatible data packages to originating data (2);
d. wherein the profiles comprise a plurality of features;
e. a supervised machine learning component (3) configured to provide classifiers on the basis of the features of the profiles.
3. System according to all preceding system claims.
4. System according to the preceding system claim wherein the analyzing component (3) is further configured to automatically analyze the preferred data packages (10-18) according to the profiles of the first and second approaches and the classifiers of the supervised machine learning component (3).
5. System according to the preceding system claim wherein the selecting component (5) is repeatedly selecting data packages (13-15) and the analyzing component (3) is weighing the profiles of the first and second approaches and the classifiers from the supervised machine learning component (3) on the basis of the profiles and classifiers and is providing an actual retrieving profile combining the weighted profiles and classifiers.
6. System according to any of the preceding system claims wherein the first approach is a first model and the second approach is a second model and the second model is different to the first model.
7. System according to any of the preceding system claims wherein the analyzing component (3) and/or the retrieving component (3) is/are configured to retrieve data packages (10-18) from the database (4) on the basis of either one, two or all of the three models.
8. System according to the preceding system claim wherein the analyzing component and/or retrieving component (3) are configured to weigh or gradually combine the three models on the basis of the last result of the selection component (5).
9. System according to any of the preceding system claims wherein the machine learning component (3) is configured to establish classifiers on the basis of the profiles by forming random decision trees on the basis of the features of the profiles and to determine relevant decision trees from less relevant decision trees on the basis of profiles relevant for retrieving compatible data packages to originating data (2).
10. Method for retrieving compatible data packages to originating data (2), comprising the steps of:
a. storing a plurality of data packages (10-18) in a database (4);
b. inputting the originating data (2);
c. analyzing the originating data (2) based on at least one first approach and generating a first approach first profile of the originating data (2);
d. automatically approaching the database (4) and retrieving data packages (10-18) from the database (4) according to the first approach first profile and providing a plurality of first data packages (10-12);
e. selecting one or more preferred data package(s) (11,12) among the first data packages (10-12); and
f. automatically analyzing the preferred data package(s) (10-12) according to the first approach and at least a second approach and generating a first approach second profile and a second approach first profile.
11. Method for retrieving compatible data packages to originating data (2), comprising the steps of:
a. storing a plurality of data packages (10-18) in a database (4);
b. inputting the originating data (2);
c. storing a plurality of profiles relevant for retrieving compatible data packages to originating data (2) in an analytics database (6);
d. wherein the profiles comprise a plurality of features;
e. conducting supervised machine learning and providing classifiers on the basis of the features of the profiles.
12. Method comprising all steps of the preceding method claims.
13. Method according to any of the preceding method claims with the further step of automatically analyzing the preferred data packages (10-18) according to the profiles of the first and second approaches and the classifiers of the supervised machine learning component (3).
14. Method according to any of the preceding method claims with the further step of repeatedly selecting data packages (13-15) and weighing the profiles of the first and second approaches and the classifiers from the supervised machine learning component (3) on the basis of the profiles and classifiers and providing an actual retrieving profile combining the weighted profiles and classifiers.
15. Method according to the preceding method claim with the further step of automatically analyzing the preferred data packages (10-18) according to the profiles of the first and second approaches and the classifiers of the supervised machine learning component (3).
16. Method according to any of the preceding method claims with the further step of repeatedly selecting data packages weighing the profiles of the first and second approaches and the classifiers on the basis of the profiles and classifiers and providing an actual retrieving profile combining the weighted profiles and classifiers.
17. Method according to any of the preceding method claims wherein the first approach is a first model and the second approach is a second model and the second model is different to the first model.
18. Use of the system according to any of the preceding system claims for carrying out the method according to any of the preceding method claims.
19. A computer related product for carrying out the method according to any of the preceding method embodiments.
PCT/EP2020/086829 2020-01-08 2020-12-17 System and method for retrieving compatible data packages to originating data WO2021140000A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP20150753 2020-01-08
EP20150753.0 2020-01-08

Publications (1)

Publication Number Publication Date
WO2021140000A1 (en) 2021-07-15

Family

ID=69156202

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/086829 WO2021140000A1 (en) 2020-01-08 2020-12-17 System and method for retrieving compatible data packages to originating data

Country Status (1)

Country Link
WO (1) WO2021140000A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7945348B2 (en) 2003-11-10 2011-05-17 Brooks Automation, Inc. Methods and systems for controlling a semiconductor fabrication process
US20170232515A1 (en) 2016-02-01 2017-08-17 Seurat Technologies, Inc. Additive Manufacturing Simulation System And Method
US9849019B2 (en) 2012-09-21 2017-12-26 Conformis, Inc. Methods and systems for optimizing design and manufacture of implant components using solid freeform fabrication

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HISAO MASE ET AL: "Proposal of two-stage patent retrieval method considering the claim structure", ACM TRANSACTIONS ON ASIAN LANGUAGE INFORMATION PROCESSING, ASSOCIATION FOR COMPUTING MACHINERY, NEW YORK, NY, US, vol. 4, no. 2, 1 June 2005 (2005-06-01), pages 190 - 206, XP058179180, ISSN: 1530-0226, DOI: 10.1145/1105696.1105702 *

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20824281; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20824281; Country of ref document: EP; Kind code of ref document: A1)