CN111104476B - Archive data generation method, archive data generation device, and readable storage medium - Google Patents

Archive data generation method, archive data generation device, and readable storage medium

Info

Publication number
CN111104476B
Authority
CN
China
Prior art keywords
data
exogenous
archive
sentence
characteristic value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911314535.5A
Other languages
Chinese (zh)
Other versions
CN111104476A (en
Inventor
张跃鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yonyou Network Technology Co Ltd
Original Assignee
Yonyou Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yonyou Network Technology Co Ltd filed Critical Yonyou Network Technology Co Ltd
Priority to CN201911314535.5A priority Critical patent/CN111104476B/en
Publication of CN111104476A publication Critical patent/CN111104476A/en
Application granted granted Critical
Publication of CN111104476B publication Critical patent/CN111104476B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an archive data generation method, an archive data generation device, and a readable storage medium. The method comprises the following steps: acquiring at least one exogenous data packet, classifying the at least one exogenous data packet according to a preset data classification model, and generating at least one exogenous data combination; filtering out at least one exogenous data combination meeting a first matching condition according to a first characteristic value in an archive query instruction, and generating at least one result data set; and selecting, according to at least one custom characteristic value in an archive generation instruction, the exogenous data packet meeting a second matching condition from the at least one result data set to generate archive data. A set of basic archive data matching the industry characteristics is thereby generated quickly, the implementation period of software is greatly shortened, the labor cost is greatly reduced, and the project implementation progress is accelerated.

Description

Archive data generation method, archive data generation device, and readable storage medium
Technical Field
The present invention relates to the field of archive data development, and in particular, to an archive data generation method, an archive data generation device, and a computer readable storage medium.
Background
Because of the complexity of ERP software and the diversity of data across industries, system initialization at the initial stage of installing and deploying the software often consumes a great deal of labor and time. This is particularly true of the archive data, which form the most basic layer of the system: all business data are built on the basic archive data, and because the archive data are of many types and large in quantity, data entry is tedious and slow, and the implementation period becomes overly long.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art.
To this end, a first aspect of the present invention provides an archive data generation method.
A second aspect of the present invention provides an archive data generation device.
A third aspect of the present invention provides a computer-readable storage medium.
In view of this, according to a first aspect of the present invention, there is provided an archive data generation method including: acquiring at least one exogenous data packet, classifying the at least one exogenous data packet according to a preset data classification model, and generating at least one exogenous data combination; filtering out at least one exogenous data combination meeting a first matching condition according to a first characteristic value in an archive query instruction, and generating at least one result data set; and selecting an exogenous data packet meeting a second matching condition from the at least one result data set according to at least one custom characteristic value in an archive generation instruction, and generating archive data.
According to the archive data generation method provided by the embodiment of the invention, a plurality of exogenous data packets are obtained, and the data in the exogenous data packets are sorted and classified according to the preset data classification model to generate a plurality of exogenous data combinations. This avoids the problem that a search engine retrieves documents too slowly because the exogenous data packets store many types of document archives, and thereby increases the speed of generating a set of archive data meeting the industry standard. According to the first characteristic value in the archive query instruction, a plurality of exogenous data combinations meeting the first matching condition are filtered out and a plurality of result data sets are generated, so that the system automatically matches and screens the result data sets meeting the query condition, which narrows the screening range and speeds up the user's subsequent secondary screening. Because the custom characteristic values in the archive generation instruction reflect the characteristic labels of the result data sets, the exogenous data meeting the second matching condition are selected from the plurality of result data sets according to those values, integrated into an exogenous data packet, and imported in a single pass to generate the archive data. This avoids the heavy entry work that arises because all business data are built on the basic archive data, reduces the workload of archive entry, further shortens the implementation period of software, and increases the speed of generating a set of archive data meeting industry standards.
According to the method, after the exogenous data are sorted, characteristic value descriptions are generated for the exogenous data, and the exogenous data carrying these descriptions are classified and recommended to the user for secondary screening, so that a set of basic archive data conforming to the industry characteristics is rapidly generated, the implementation period of software is greatly shortened, the labor cost is greatly reduced, and the project implementation progress is accelerated.
In addition, the archive data generation method provided by the technical scheme of the invention also has the following additional technical characteristics:
in the above technical solution, further, filtering at least one of the exogenous data combinations that meets a first matching condition according to a first feature value in the archive query instruction, and generating at least one result data set, including: setting an industry matching rule of the first matching condition; extracting fixed features in each exogenous data combination, and acquiring industries to which the fixed features belong according to the fixed features; judging that the industries and the first characteristic values conform to the matching rules of the industries, and filtering out exogenous data combinations corresponding to the first characteristic values according to the judging result; and creating a recommendation list, sequentially storing at least one exogenous data combination in the recommendation list, and generating at least one result data set.
In this technical solution, filtering out at least one exogenous data combination that meets the first matching condition according to the first characteristic value in the archive query instruction, and generating at least one result data set, specifically includes the following. The industry matching rule of the first matching condition is set in advance, and the fixed features in each exogenous data combination are extracted. Because each fixed feature contains the industry to which it belongs, that industry is obtained from the fixed feature. The industry and the first characteristic value are then checked against the industry matching rule, and the exogenous data combinations corresponding to the first characteristic value are filtered out according to the result, so that basic archive data conforming to the first matching condition are found quickly on the basis of industry, the screening range is further narrowed, and the user can conveniently perform the subsequent secondary screening. A recommendation list is newly created and the plurality of exogenous data combinations are stored in it in sequence, so that the recommendation list is divided into a plurality of areas and a plurality of result data sets are generated. The user can then independently select the required business data from the recommendation list, which avoids the tedious entry work, slow progress, and prolonged implementation period that arise because all business data are built on the basic archive data and the archive data are of many types and large in quantity.
In the above technical solution, further, selecting an exogenous data packet meeting the second matching condition from the at least one result data set according to at least one custom characteristic value in the archive generation instruction to generate archive data includes: setting a characteristic value multiple-selection rule of the second matching condition; selecting at least one result data set associated with the custom characteristic value according to the characteristic value multiple-selection rule; and integrating the exogenous data packet according to the result data set to generate the archive data.
In this technical solution, selecting an exogenous data packet meeting the second matching condition from a plurality of result data sets according to at least one custom characteristic value in the archive generation instruction to generate archive data specifically includes the following. The characteristic value multiple-selection rule of the second matching condition is set in advance; a plurality of result data sets associated with the custom characteristic value are selected according to that rule; the selected result data sets are integrated and combined into an exogenous data packet; and the exogenous data packet is imported in a single pass to generate the archive data. A set of archive data meeting the industry standard is thereby generated quickly, the implementation period of software is greatly shortened, the labor cost is greatly reduced, and the project implementation progress is accelerated.
In the above technical solution, further, acquiring at least one exogenous data packet, classifying the at least one exogenous data packet according to the data classification model, and generating at least one exogenous data combination specifically includes: acquiring an exogenous document in any exogenous data packet, and segmenting the exogenous sentences in the exogenous document according to word segmentation recognition rules in the data classification model; performing characteristic value fitting on the exogenous sentences according to characteristic value description rules in the data classification model to generate fixed features and dynamic features corresponding to the exogenous sentences; classifying, according to index arrangement rules in the data classification model, the fixed features by the industry each contains so that fixed features of the same industry are grouped together, and establishing, for the at least one classified fixed feature, a unique main index corresponding to the industry; establishing at least one sub-index in each fixed feature, and adding the unique identifier of the sub-index to the dynamic feature, so that each fixed feature is mapped to at least one dynamic feature and an index tree is generated; and traversing the sub-indexes in the fixed features, confirming that they have dynamic features, and generating the exogenous data combinations corresponding to the index columns in the index tree.
In this technical solution, acquiring a plurality of exogenous data packets, classifying them according to the data classification model, and generating a plurality of exogenous data combinations specifically includes the following. An exogenous document in an exogenous data packet is obtained, and the exogenous sentences in the document undergo word segmentation recognition and segmentation. Characteristic value fitting is then performed on the exogenous sentences according to the characteristic value description rules in the data classification model to generate the fixed features and dynamic features corresponding to the exogenous sentences. According to the index arrangement rules in the data classification model, fixed features containing the same industry are selected and grouped, and a unique main index in one-to-one correspondence with the industry is established on the classified fixed features, so that a search can quickly locate the fixed features of an industry through the unique main index. A plurality of sub-indexes are established in each fixed feature. Because each fixed feature has a one-to-many relationship with its dynamic features, the unique identifier of a sub-index can be added to the dynamic feature, so that the mapping between the main index and the sub-indexes forms an index tree relating the fixed features to the dynamic features, which increases the speed of traversal queries. Finally, the sub-indexes in the fixed features are traversed to confirm whether they have dynamic features; if so, an exogenous data combination comprising the index columns of the fixed feature and the dynamic features can be generated. This increases the query speed, further increases the speed of generating a set of archive data meeting the industry standard, and greatly shortens the implementation period of software.
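The index arrangement described above can be sketched as follows. This is only a minimal illustration under assumed data structures: the names FixedFeature, build_index_tree, and exogenous_combinations, the tuple layout of the input, and the use of a running counter for the unique sub-index identifiers are all hypothetical and are not taken from the invention.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class FixedFeature:
    name: str
    industry: str
    # sub-index identifier -> descriptions of the dynamic features carrying it
    sub_indexes: Dict[int, List[str]] = field(default_factory=dict)


def build_index_tree(features):
    """Group fixed features under one main index entry per industry.

    `features` is a list of (name, industry, dynamic_descriptions) tuples
    (an assumed input layout).
    """
    next_id = count(1)              # source of unique sub-index identifiers
    main_index = defaultdict(list)  # industry -> fixed features of that industry
    for name, industry, dynamic_descriptions in features:
        fixed = FixedFeature(name, industry)
        # One sub-index per dynamic feature; a fixed feature without any
        # dynamic feature still receives one empty sub-index.
        for description in dynamic_descriptions or [None]:
            sub_id = next(next_id)
            fixed.sub_indexes[sub_id] = [] if description is None else [description]
        main_index[industry].append(fixed)
    return dict(main_index)


def exogenous_combinations(main_index):
    """Traverse the tree and keep only sub-indexes that have dynamic features."""
    combinations = []
    for industry, fixed_features in main_index.items():
        for fixed in fixed_features:
            for sub_id, dynamic in fixed.sub_indexes.items():
                if dynamic:
                    combinations.append((industry, fixed.name, sub_id, dynamic[0]))
    return combinations


tree = build_index_tree([
    ("material code", "manufacturing", ["used in bill of materials", "used in purchasing"]),
    ("unit of measure", "manufacturing", []),
    ("account subject", "finance", ["used in vouchers"]),
])
combos = exogenous_combinations(tree)
```

In this sketch the fixed feature "unit of measure" has a sub-index but no dynamic feature, so the traversal produces no combination for it.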
In the above technical solution, further, performing characteristic value fitting on the exogenous sentence according to the characteristic value description rule in the data classification model to generate the fixed feature and dynamic feature corresponding to the exogenous sentence specifically includes: setting a fixed characteristic value fitting standard according to the user's usage heat; traversing the exogenous sentences and, upon confirming that the external index count of an exogenous sentence is greater than the heat standard value in the fixed characteristic value fitting standard, setting the priority of the exogenous sentence to high; upon confirming that the external index count of the exogenous sentence is equal to the heat standard value, setting the priority of the exogenous sentence to medium; upon confirming that the external index count of the exogenous sentence is smaller than the heat standard value, setting the priority of the exogenous sentence to low; and identifying keywords in the exogenous sentences whose priorities have been set, determining the data source and industry corresponding to each exogenous sentence according to the keywords, marking the result in the exogenous sentence, and generating the fixed feature corresponding to the exogenous sentence.
In this technical solution, performing characteristic value fitting on an exogenous sentence according to the characteristic value description rule in the data classification model to generate the fixed feature and dynamic feature corresponding to the exogenous sentence specifically includes the following. A fixed characteristic value fitting standard related to the user's usage heat is set in advance. The plurality of exogenous sentences are traversed, the external index count of each exogenous sentence is determined and compared with the heat standard value in the fitting standard, and the priority of each exogenous sentence is set accordingly. The keywords in each exogenous sentence are then identified, the data source and industry of the exogenous sentence are determined from the keywords, and the result is marked in the exogenous sentence, so that each exogenous sentence generates a corresponding fixed feature and the exogenous data are classified more finely.
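The priority comparison and keyword marking described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the keyword tables, the whitespace tokenization, the "unknown" fallback, and the names priority_for, lookup, and fixed_feature_for are all assumptions introduced for the example.

```python
from dataclasses import dataclass

# Hypothetical keyword tables; the source does not list concrete keywords.
SOURCE_KEYWORDS = {"iso": "international standard", "gb": "national standard"}
INDUSTRY_KEYWORDS = {"steel": "manufacturing", "voucher": "finance"}


@dataclass
class FixedFeature:
    priority: str
    data_source: str
    industry: str


def priority_for(external_index_count, heat_standard):
    """Compare the external index count with the heat standard value."""
    if external_index_count > heat_standard:
        return "high"
    if external_index_count == heat_standard:
        return "medium"
    return "low"


def lookup(words, table):
    """Return the label of the first keyword found, or 'unknown' (assumed fallback)."""
    for word in words:
        if word in table:
            return table[word]
    return "unknown"


def fixed_feature_for(sentence, external_index_count, heat_standard):
    # Whitespace splitting stands in for the keyword identification step.
    words = [w.strip(".,;:").lower() for w in sentence.split()]
    return FixedFeature(
        priority=priority_for(external_index_count, heat_standard),
        data_source=lookup(words, SOURCE_KEYWORDS),
        industry=lookup(words, INDUSTRY_KEYWORDS),
    )


feature = fixed_feature_for("GB steel grade designation", 12, 10)
```

Here the sentence has an external index count of 12 against a heat standard value of 10, so it is marked high priority, with the data source and industry taken from the assumed keyword tables.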
In the above technical solution, further, performing characteristic value fitting on the exogenous sentence according to the characteristic value description rule in the data classification model to generate the fixed feature and dynamic feature corresponding to the exogenous sentence further includes: setting specific scenes and specific uses in the characteristic value description rule according to the application scenario; segmenting the key words in the exogenous sentence, simulating the usage scenario of each segmented field, and generating at least one personalized description corresponding to the exogenous sentence; and matching a specific scene and a specific use according to the personalized description, and adding the specific scene, the specific use, and the personalized description to a dynamic buffer to generate the dynamic feature corresponding to the dynamic buffer.
In this technical solution, performing characteristic value fitting on an exogenous sentence according to the characteristic value description rule in the data classification model to generate the fixed feature and dynamic feature corresponding to the exogenous sentence further includes the following. A plurality of specific scenes and specific uses are set in the characteristic value description rule, and they can be set according to the application scenario. The key words in the exogenous sentence are segmented, the usage scenario of each segmented field is simulated, and a personalized description corresponding to the exogenous sentence is generated. A specific scene and a specific use that fit the personalized description are then matched in the preset characteristic value description rule, and the specific scene, the specific use, and the personalized description are added to a dynamic buffer. Because the dynamic buffer corresponds to the dynamic feature, the corresponding dynamic feature is generated from the dynamic buffer, so that the description of the various exogenous data is finer and the application range is wider.
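The generation of dynamic features through a dynamic buffer can be sketched as follows. This is a simplified illustration under stated assumptions: the SCENE_RULES table and its entries are hypothetical, whitespace splitting stands in for word segmentation, and the description and matching steps are collapsed into a single keyword lookup rather than a real simulation of the usage scenario.

```python
# Hypothetical rule table: keyword -> (specific scene, specific use).
SCENE_RULES = {
    "warehouse": ("inventory management", "stock counting"),
    "invoice": ("accounts payable", "tax reporting"),
}


def dynamic_features_for(sentence):
    """Fill a dynamic buffer with one entry per matched keyword."""
    dynamic_buffer = []
    for token in sentence.split():
        keyword = token.strip(".,;:").lower()
        if keyword not in SCENE_RULES:
            continue
        scene, use = SCENE_RULES[keyword]
        # Stand-in for the personalized description of the segmented field.
        description = f"{keyword} data for {use}"
        dynamic_buffer.append(
            {"scene": scene, "use": use, "description": description}
        )
    return dynamic_buffer


features = dynamic_features_for("Each warehouse issues an invoice.")
```

Each entry in the returned buffer plays the role of one dynamic feature attached to the exogenous sentence.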
In the above technical solution, further, before obtaining at least one exogenous data packet, the method specifically includes: setting at least one exogenous category according to the category of the archive data; acquiring archive data according to the data collection interface; and classifying and storing the archive data according to the exogenous category so that the archive data collection is stored in the corresponding exogenous category.
In this technical solution, before the plurality of exogenous data packets are obtained, the method specifically includes: setting the categories of the archive data to generate a plurality of exogenous categories describing the data sources of the archives; then acquiring the archive data through the data collection interface, and classifying and storing the archive data according to the preset exogenous categories, so that the archive data are integrated and stored by exogenous category. Other required data can be collected in the same way, which makes the method extensible.
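The collection and category-based storage step can be sketched as follows. This is a minimal illustration under assumptions: the data collection interface is modeled as any callable that returns records, each record is assumed to carry a "category" field, and the separate list for records outside the configured categories is an addition for the example. The three category names follow the examples of exogenous data given elsewhere in this description.

```python
EXOGENOUS_CATEGORIES = (
    "international standard",
    "national standard",
    "industry standard",
)


def collect(fetch_records):
    """Stand-in for the data collection interface: any callable returning records."""
    return list(fetch_records())


def store_by_category(records, categories=EXOGENOUS_CATEGORIES):
    """Store each record under its exogenous category."""
    store = {category: [] for category in categories}
    unclassified = []  # assumed handling for records outside the configured categories
    for record in records:
        bucket = store.get(record.get("category"))
        if bucket is None:
            unclassified.append(record)
        else:
            bucket.append(record)
    return store, unclassified


def sample_source():
    # Hypothetical records used only to exercise the sketch.
    return [
        {"name": "unit of measure codes", "category": "international standard"},
        {"name": "administrative division codes", "category": "national standard"},
        {"name": "internal notes", "category": "memo"},
    ]


store, unclassified = store_by_category(collect(sample_source))
```

Adding a new exogenous category only requires extending the category tuple, which mirrors the extensibility noted above.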
According to a second aspect of the present invention, there is provided an archive data generation device comprising a memory configured to store a computer program and a processor; the processor is configured to execute a computer program to implement the steps of the archive data generation method according to any one of the above embodiments, so that all the advantageous technical effects of the archive data generation method are provided, and are not described in detail herein.
According to a third aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the archive data generation method according to any one of the above aspects, thereby having all the advantageous technical effects of the archive data generation method, which will not be described in detail herein.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 shows a schematic flow chart of an archive data generation method of one embodiment of the invention;
FIG. 2 is a schematic flow chart diagram of an archive data generation method of another embodiment of the present invention;
FIG. 3 shows a schematic flow chart of an archive data generation method of yet another embodiment of the present invention;
FIG. 4 is a schematic flow chart diagram of an archive data generation method of yet another embodiment of the present invention;
FIG. 5 is a schematic flow chart diagram illustrating an archive data generation method in accordance with another embodiment of the present invention;
FIG. 6 shows a schematic flow chart diagram of an archive data generation method of yet another embodiment of the present invention;
FIG. 7 is a schematic flow chart diagram illustrating an archive data generation method in accordance with yet another embodiment of the present invention;
FIG. 8 is a schematic flow chart diagram illustrating an archive data generation device of yet another embodiment of the present invention;
FIG. 9 is a schematic flow chart diagram illustrating an archive data generation device of yet another embodiment of the present invention;
FIG. 10 is a schematic flow chart of an archive data generation device of a further embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
Archive data generation methods, archive data generation apparatuses, and computer-readable storage media according to some embodiments of the present invention are described below with reference to fig. 1 to 10.
Embodiment one:
An embodiment of the first aspect of the present invention provides an archive data generation method, which is described in detail below.
Hereinafter, embodiments of the present application will be specifically described using ERP management software as an application scenario.
FIG. 1 shows a schematic flow chart of an archive data generation method of one embodiment of the present application.
As shown in fig. 1, the archive data generation method includes:
step S102, acquiring at least one exogenous data packet, classifying the at least one exogenous data packet according to a preset data classification model, and generating at least one exogenous data combination;
step S104, filtering out at least one exogenous data combination meeting a first matching condition according to a first characteristic value in the archive query instruction, and generating at least one result data set;
step S106, selecting an exogenous data packet meeting the second matching condition from the at least one result data set according to at least one custom characteristic value in the archive generation instruction, and generating archive data.
The archive data generation method provided by the embodiment of the invention first collects various exogenous data, such as international standards, national standards, and industry standards; it can be understood that the collection of the exogenous data is implemented through the configured data collection interface. Second, specific rule algorithms are set in the data classification model to sort, process, and classify the exogenous data. The classified data are then recommended to the user for secondary screening, and the screened exogenous data are integrated and imported into the system in a single pass, so that a set of basic archive data matching the industry characteristics is generated rapidly. This avoids the tedious entry work, slow progress, and overly long ERP implementation period that arise because all business data are built on the basic archive data and the archive data are of many types and large in quantity.
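The data flow through steps S102 to S106 can be sketched as follows. This is a minimal illustration of how the three stages chain together, not the implementation of the invention: the function generate_archive_data, the pluggable classify, first_match, and second_match callables, and the dictionary layout of packets and combinations are all assumptions made for the example.

```python
def generate_archive_data(packets, classify, first_value, first_match,
                          custom_values, second_match):
    """Chain the three stages; each rule is passed in as a callable."""
    # S102: classify the exogenous data packets into exogenous data combinations.
    combinations = classify(packets)
    # S104: keep the combinations meeting the first matching condition.
    result_sets = [c for c in combinations if first_match(c, first_value)]
    # S106: select packets meeting the second matching condition.
    selected = [
        packet
        for result_set in result_sets
        for packet in result_set["packets"]
        if any(second_match(packet, value) for value in custom_values)
    ]
    return {"records": selected}


def classify_by_industry(packets):
    """Assumed stand-in for the data classification model: group by industry."""
    groups = {}
    for packet in packets:
        groups.setdefault(packet["industry"], []).append(packet)
    return [{"industry": k, "packets": v} for k, v in groups.items()]


packets = [
    {"name": "material categories", "industry": "manufacturing", "tags": {"materials"}},
    {"name": "work centers", "industry": "manufacturing", "tags": {"production"}},
    {"name": "account subjects", "industry": "finance", "tags": {"accounting"}},
]
archive = generate_archive_data(
    packets,
    classify=classify_by_industry,
    first_value="manufacturing",
    first_match=lambda combo, value: combo["industry"] == value,
    custom_values=["materials"],
    second_match=lambda packet, value: value in packet["tags"],
)
```

Passing the rules in as callables keeps the sketch independent of any particular classification model or matching rule.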
Fig. 2 shows another schematic flow chart of an archive data generation method of an embodiment of the present application.
As shown in fig. 2, filtering at least one exogenous data combination meeting a first matching condition according to a first characteristic value in the archive query instruction, and generating at least one result data set, which specifically includes:
step S202, setting an industry matching rule of a first matching condition;
step S204, extracting fixed features in each exogenous data combination, and acquiring industries of the fixed features according to the fixed features;
step S206, judging that the belonging industry and the first characteristic value accord with the belonging industry matching rule, and filtering out exogenous data combination corresponding to the first characteristic value according to the judging result;
step S208, a recommendation list is newly established, at least one exogenous data combination is sequentially stored in the recommendation list, and at least one result data set is generated.
In this embodiment, a user can enter the industry to which the user belongs in the query box of the archive generation interface of the ERP system, and the system automatically matches and filters out a plurality of exogenous data combinations meeting the first matching condition. Under the first matching condition, the query information entered by the user is passed to the data receiving layer; after the data are received, the exogenous data combinations related to the entered instruction information are searched for and matched among the fixed features according to the query information, and the specific data content and characteristic descriptions contained in each type of data are displayed to the user.
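Steps S202 to S208 can be sketched as follows. This is a minimal illustration under assumptions: the industry matching rule is taken to be a case-insensitive comparison, and the names industry_matches and build_recommendation_list as well as the dictionary layout of an exogenous data combination are hypothetical.

```python
def industry_matches(industry, first_value):
    """Hypothetical industry matching rule: case-insensitive equality after trimming."""
    return industry.strip().lower() == first_value.strip().lower()


def build_recommendation_list(combinations, first_value):
    """Store the matching exogenous data combinations in order."""
    recommendation_list = []
    for combination in combinations:
        industry = combination["fixed_feature"]["industry"]  # S204: industry from the fixed feature
        if industry_matches(industry, first_value):           # S206: check against the rule
            recommendation_list.append(combination)           # S208: store in the new list
    return recommendation_list


combinations = [
    {"name": "material archive", "fixed_feature": {"industry": "Manufacturing"}},
    {"name": "account archive", "fixed_feature": {"industry": "Finance"}},
    {"name": "supplier archive", "fixed_feature": {"industry": "manufacturing"}},
]
result_sets = build_recommendation_list(combinations, " manufacturing ")
```

The list preserves the order in which the combinations were examined, which corresponds to storing them in the recommendation list in sequence.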
Fig. 3 shows yet another schematic flow chart of an archive data generation method of an embodiment of the present application.
As shown in fig. 3, the archive data generation method further includes:
step S302, setting a characteristic value multiple selection rule of a second matching condition;
step S304, selecting at least one result data set associated with the self-defined characteristic value according to the characteristic value multiple selection rules;
step S306, integrating the exogenous data packet according to the result data set to generate archive data.
In this embodiment, when a user uses the function of quickly generating archive data in the ERP software, the user first enters the industry, and the fixed features and dynamic features are queried according to the user's input; that is, the system automatically matches the characteristic values of all the data to form a data set. The characteristic values are displayed in separate areas, and each type of data includes specific data content and a characteristic description. The user can then perform secondary screening, that is, select the material data that are needed, and the data selected over multiple selections are integrated and imported into the system in a single pass.
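Steps S302 to S306 can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the multiple-selection rule is assumed to mean that a result data set is selected when it shares at least one characteristic value with the user's selection, and the removal of duplicate records during integration is likewise an assumption. The names select_result_sets and integrate are hypothetical.

```python
def select_result_sets(result_sets, custom_values):
    """Assumed multiple-selection rule: keep sets sharing any selected value."""
    wanted = set(custom_values)
    return [rs for rs in result_sets if wanted & set(rs["feature_values"])]


def integrate(selected_sets):
    """Merge the selected sets into one packet, dropping duplicates (assumed)."""
    seen = set()
    records = []
    for result_set in selected_sets:
        for record in result_set["records"]:
            if record not in seen:
                seen.add(record)
                records.append(record)
    return records


result_sets = [
    {"feature_values": ["raw material"], "records": ["steel plate", "copper wire"]},
    {"feature_values": ["finished goods"], "records": ["gearbox"]},
    {"feature_values": ["raw material", "imported"], "records": ["copper wire", "resin"]},
]
archive_records = integrate(
    select_result_sets(result_sets, ["raw material", "imported"])
)
```

The merged list is what would be imported into the system in a single pass.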
Fig. 4 shows another schematic flow chart of an archive data generation method of an embodiment of the present application.
As shown in fig. 4, the archive data generation method further includes:
step S402, acquiring at least one exogenous data packet, classifying the at least one exogenous data packet according to a preset data classification model, and generating at least one exogenous data combination;
step S404, setting an industry matching rule of the first matching condition;
step S406, extracting fixed features in each exogenous data combination, and acquiring industries of the fixed features according to the fixed features;
step S408, judging whether the industry and the first characteristic value conform to the industry matching rule, and filtering out the exogenous data combination corresponding to the first characteristic value according to the judgment result;
step S410, newly creating a recommendation list, sequentially storing at least one exogenous data combination in the recommendation list, and generating at least one result data set;
step S412, setting a characteristic value multiple selection rule of the second matching condition;
step S414, selecting at least one result data set associated with the self-defined characteristic value according to the characteristic value multiple selection rules;
step S416, integrating the exogenous data package according to the result data set to generate archive data.
In this embodiment, the industry matching rule of the first matching condition is set in advance, and the fixed features in each exogenous data combination are then extracted so that the industry can be obtained from the fixed features. The industry and the first characteristic value are checked against the industry matching rule, and the exogenous data combinations corresponding to the first characteristic value are filtered out according to the result. Basic archive data that satisfies the first matching condition is thereby found quickly on the basis of the industry, which narrows the screening range and makes the user's subsequent secondary screening easier. A recommendation list is then created and the exogenous data combinations are stored in it in sequence, so that the list is divided into several regions and several result data sets are generated. Users can then select the business data they need from the recommendation list on their own. This addresses the problem that all business data is built on basic archive data, which is varied and voluminous, so that entering it is tedious and slow and lengthens the implementation period. The characteristic value multiple selection rule of the second matching condition is also set in advance; the result data sets associated with the custom characteristic values are selected according to that rule, the selected result data sets are integrated into an exogenous data packet, and the packet is imported in a single pass to generate the archive data. A set of archive data that meets the industry standard is thus generated quickly, which greatly shortens the software implementation period, greatly reduces labor cost, and speeds up project implementation.
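The first matching stage of this flow (steps S404 to S410) can be sketched as below. The dictionary layout of an exogenous data combination, the `category` key used to split the recommendation list into regions, and the case-insensitive equality used as the industry matching rule are assumptions for illustration; the source does not specify them.

```python
from typing import Dict, List


def industry_matches(industry: str, first_value: str) -> bool:
    """Assumed industry matching rule: case-insensitive equality."""
    return industry.strip().lower() == first_value.strip().lower()


def build_recommendation_list(combinations: List[dict],
                              first_value: str) -> List[dict]:
    """Keep, in their original order, the combinations whose fixed-feature
    industry matches the first characteristic value from the query."""
    return [c for c in combinations
            if industry_matches(c["fixed"]["industry"], first_value)]


def to_result_sets(recommendation: List[dict]) -> Dict[str, List[dict]]:
    """Split the recommendation list into regions, one per data category."""
    regions: Dict[str, List[dict]] = {}
    for combo in recommendation:
        regions.setdefault(combo["category"], []).append(combo)
    return regions


# Illustrative exogenous data combinations (made-up values).
combinations = [
    {"category": "materials",
     "fixed": {"industry": "Retail", "source": "national standard"}},
    {"category": "units",
     "fixed": {"industry": "retail", "source": "international standard"}},
    {"category": "materials",
     "fixed": {"industry": "pharma", "source": "industry standard"}},
]
recommendation = build_recommendation_list(combinations, "retail")
result_sets = to_result_sets(recommendation)
```

Filtering by industry first keeps the later multiple selection small: the user only ever sees regions that already belong to the queried industry.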
Fig. 5 shows another schematic flow chart of an archive data generation method of an embodiment of the present application.
As shown in fig. 5, acquiring at least one exogenous data packet, classifying the at least one exogenous data packet according to the data classification model, and generating at least one exogenous data combination specifically includes:
step S502, obtaining an exogenous document in any exogenous data packet, and segmenting exogenous sentences in the exogenous document according to word segmentation recognition rules in a data classification model;
step S504, fitting the characteristic values of the exogenous sentences according to the characteristic value description rules in the data classification model to generate fixed characteristics and dynamic characteristics corresponding to the exogenous sentences;
step S506, classifying the fixed features containing the same industries according to the industries included in the fixed features and the index arrangement rules in the data classification model, and establishing a unique main index corresponding to the industries for at least one classified fixed feature;
step S508, establishing at least one sub-index in each fixed feature, and adding the unique identifier in the sub-index to the dynamic feature, so that each fixed feature is mapped to at least one dynamic feature to generate an index tree;
step S510, traversing and confirming that dynamic characteristics exist in sub-indexes in the fixed characteristics, and generating exogenous data combination corresponding to index columns in the index tree.
In this embodiment, a specific rule algorithm is set in the data classification model to sort, process and classify the exogenous data. The algorithm is designed as follows. Secondary development is carried out on the basis of word segmentation and sorting, and each exogenous document is segmented into a plurality of exogenous sentences. Each piece of segmented data is described with characteristic values: characteristic value fitting gives a fine-grained description of the fixed features and dynamic features corresponding to the exogenous sentence. The fixed features may include the data source, the industry, and the priority. The dynamic features are personalized descriptions based on the specific scene, specific use and characteristics of each type of data, so that the description of each type of data is finer and its range of application wider.
The characteristic values are arranged by means of indexes. Because the industry attribute values of the fixed features differ, fixed features with the same industry attribute value are grouped together, and a unique main index is established for each industry in one-to-one correspondence, so that a search engine can retrieve all fixed features of a given industry through that main index. A plurality of sub-indexes is set for the fixed features so that each sub-index is mapped to a dynamic feature; because each sub-index contains a unique identifier, adding that identifier to the dynamic feature forms a plurality of index trees. By traversing the index tree and checking whether each index column has both fixed features and dynamic features, the generated exogenous data combination can also carry the characteristic values, so that when the user enters an industry, the exogenous documents that match the input and correspond to the fixed features can be found quickly from the combination of word segmentation characteristic values and the index tree. The priority set in the fixed features identifies how heavily the exogenous sentence is used (its usage heat) and displays that heat to the user. The data source and industry of the exogenous sentence can be determined from keywords recognized by word segmentation, so that a set of archive data meeting the industry standard is generated quickly and the implementation period of the ERP software is greatly shortened.
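The index arrangement described above can be illustrated with a small two-level structure: a main index keyed by industry, and under it the sub-index identifiers that link fixed features to dynamic features. This is only a sketch under assumptions; the field names (`industry`, `sub_indexes`, `sub_index_id`) and the use of nested dictionaries as the "index tree" are hypothetical choices, not taken from the source.

```python
from collections import defaultdict
from typing import Dict, List


def build_index_tree(fixed: List[dict],
                     dynamic: List[dict]) -> Dict[str, Dict[str, List[dict]]]:
    """Return {industry: {sub_index_id: [dynamic features]}}.

    The industry plays the role of the unique main index; each dynamic
    feature carries the unique identifier of the sub-index it maps to.
    """
    by_sub_index: Dict[str, List[dict]] = defaultdict(list)
    for feature in dynamic:
        by_sub_index[feature["sub_index_id"]].append(feature)

    tree: Dict[str, Dict[str, List[dict]]] = {}
    for feature in fixed:
        branch = tree.setdefault(feature["industry"], {})
        for sub_id in feature["sub_indexes"]:
            branch[sub_id] = by_sub_index.get(sub_id, [])
    return tree


def exogenous_combinations(tree: Dict[str, Dict[str, List[dict]]],
                           industry: str) -> List[dict]:
    """Traverse one industry branch and keep only the sub-indexes that
    actually have dynamic features attached."""
    return [{"sub_index_id": sub_id, "dynamic": features}
            for sub_id, features in tree.get(industry, {}).items()
            if features]


# Illustrative features (made-up identifiers and values).
fixed_features = [
    {"industry": "retail", "sub_indexes": ["r-1", "r-2"]},
    {"industry": "pharma", "sub_indexes": ["p-1"]},
]
dynamic_features = [{"sub_index_id": "r-1", "scene": "point of sale"}]

tree = build_index_tree(fixed_features, dynamic_features)
retail = exogenous_combinations(tree, "retail")
```

Looking up an industry is then a single dictionary access on the main index, and sub-indexes without any dynamic feature are skipped during traversal.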
Fig. 6 shows another schematic flow chart of an archive data generation method of an embodiment of the present application.
As shown in fig. 6, according to a feature value description rule in the data classification model, performing feature value fitting on an exogenous sentence to generate a fixed feature and a dynamic feature corresponding to the exogenous sentence, which specifically includes:
step S602, setting a fixed characteristic value fitting standard according to the use heat of a user;
step S604, traversing and confirming that the external index times in the exogenous sentence are larger than the heat standard value in the fixed characteristic value fitting standard, and setting the priority of the exogenous sentence as high; confirming that the external index times in the exogenous sentence are equal to the heat standard value, and setting the priority of the exogenous sentence as a middle level; confirming that the external index times in the exogenous sentence is smaller than the heat standard value, and setting the priority of the exogenous sentence as low;
step S606, the keywords in the exogenous sentences with the set priorities are identified, the data sources and industries corresponding to the exogenous sentences are judged according to the keywords, and the judging results are marked in the exogenous sentences to generate fixed features corresponding to the exogenous sentences.
In this embodiment, performing characteristic value fitting on an exogenous sentence according to the feature value description rule in the data classification model, to generate the fixed feature and dynamic feature corresponding to the exogenous sentence, proceeds as follows. A fixed characteristic value fitting standard related to the user's usage heat is set in advance. The exogenous sentences are traversed, and the external index times of each sentence are compared with the fitting standard to determine the sentence's priority. The keywords in each exogenous sentence are then identified, the data source and industry of the sentence are determined from them, and the result is marked in the sentence, so that each exogenous sentence generates its corresponding fixed feature and the classification of exogenous data becomes finer.
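The three-way priority comparison in step S604 and the keyword lookup in step S606 can be sketched as follows. The keyword table, the function names, and the interpretation of "external index times" as an integer count are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def assign_priority(external_index_count: int, heat_standard: int) -> str:
    """Compare the external index count with the heat standard value."""
    if external_index_count > heat_standard:
        return "high"
    if external_index_count == heat_standard:
        return "medium"
    return "low"


def fixed_feature(sentence: str,
                  external_index_count: int,
                  heat_standard: int,
                  keyword_table: Dict[str, Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Build the fixed feature of one exogenous sentence.

    keyword_table is a hypothetical mapping from a keyword to the
    (data source, industry) pair it indicates.
    """
    feature: Dict[str, Optional[str]] = {
        "priority": assign_priority(external_index_count, heat_standard),
        "source": None,
        "industry": None,
    }
    for keyword, (source, industry) in keyword_table.items():
        if keyword in sentence:
            feature["source"] = source
            feature["industry"] = industry
            break
    return feature


# Illustrative keyword table and sentence (made-up values).
keywords = {"drug": ("industry standard", "pharma")}
feature = fixed_feature("drug batch number", 12, 10, keywords)
```

A sentence whose keywords are not in the table still receives a priority, while its source and industry stay unset.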
Fig. 7 shows another schematic flow chart of an archive data generation method of an embodiment of the present application.
As shown in fig. 7, according to a feature value description rule in the data classification model, performing feature value fitting on an exogenous sentence to generate a fixed feature and a dynamic feature corresponding to the exogenous sentence, and further including:
step S702, setting specific scenes and specific uses in the characteristic value description rule according to the application scenes;
step S704, cutting key word segments in the exogenous sentence, simulating a use scene of word segment fields, and generating at least one personalized description corresponding to the exogenous sentence;
step S706, according to the personalized description, matching the specific scene and the specific application which are in accordance with the setting, and adding the specific scene, the specific application and the personalized description in the dynamic buffer zone to generate the dynamic characteristics corresponding to the dynamic buffer zone.
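Steps S702 to S706 can be pictured with the loose sketch below. It rests on several assumptions: whitespace splitting stands in for the word segmentation, the rule table mapping a key word segment to a (specific scene, specific use) pair is hypothetical, and the dynamic buffer is modeled as a plain list.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table: key word segment -> (specific scene, specific use).
SCENE_RULES: Dict[str, Tuple[str, str]] = {
    "invoice": ("accounts payable", "tax reporting"),
    "warehouse": ("inventory", "stock counting"),
}


def dynamic_feature(sentence: str,
                    rules: Dict[str, Tuple[str, str]]) -> Optional[Dict[str, str]]:
    """Return the dynamic feature for the first word segment that matches a
    configured scene and use, or None when nothing matches."""
    for segment in sentence.lower().split():
        if segment in rules:
            scene, use = rules[segment]
            return {
                "scene": scene,
                "use": use,
                "description": f"'{segment}' is used for {use} in {scene}",
            }
    return None


# The dynamic buffer collects the features of all matching sentences.
dynamic_buffer: List[Dict[str, str]] = []
for sentence in ["Warehouse location code", "Colour of packaging"]:
    feature = dynamic_feature(sentence, SCENE_RULES)
    if feature is not None:
        dynamic_buffer.append(feature)
```

Only the first sentence matches a configured scene, so the buffer ends up with a single dynamic feature.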
In this embodiment, before acquiring at least one exogenous data packet, the method specifically includes: setting at least one exogenous category according to the category of the archive data; acquiring archive data according to the data collection interface; and classifying and storing the archive data according to the exogenous category, so that the collected archive data is stored in the corresponding exogenous category. Data is acquired through various data acquisition interfaces and then stored. In addition to this standardized data, users can collect other data they need by following the data interface standard provided by the sub-component, which makes the system extensible.
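One way to picture the collection interface and the classified storage is an abstract collector that user-defined sources can implement. The class names, the `category` attribute and the in-memory dictionary used as storage are hypothetical; the source only states that a data interface standard exists.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class ArchiveCollector(ABC):
    """Hypothetical data collection interface: each collector declares the
    exogenous category its records belong to."""
    category: str

    @abstractmethod
    def collect(self) -> Iterable[dict]:
        """Return the archive records gathered from this source."""


class StandardUnitsCollector(ArchiveCollector):
    category = "measurement-units"

    def collect(self) -> Iterable[dict]:
        return [{"code": "KG", "name": "kilogram"},
                {"code": "M", "name": "metre"}]


class CustomSupplierCollector(ArchiveCollector):
    """A user-defined source that follows the same interface."""
    category = "suppliers"

    def collect(self) -> Iterable[dict]:
        return [{"code": "S001", "name": "Example Supplier"}]


def collect_and_store(collectors: Iterable[ArchiveCollector],
                      categories: Iterable[str]) -> Dict[str, List[dict]]:
    """Store each collector's records under its preset exogenous category."""
    store: Dict[str, List[dict]] = {name: [] for name in categories}
    for collector in collectors:
        if collector.category not in store:
            raise ValueError(f"unknown exogenous category: {collector.category}")
        store[collector.category].extend(collector.collect())
    return store


store = collect_and_store(
    [StandardUnitsCollector(), CustomSupplierCollector()],
    ["measurement-units", "suppliers"],
)
```

Adding a new data source then only requires a new collector class and, if needed, a new category name.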
Embodiment two:
according to an embodiment of the second aspect of the present invention, there is provided an archive data generation device.
In this embodiment, the device includes a memory and a processor. The memory stores a computer program, and the processor implements the steps of the archive data generation method of any of the foregoing embodiments when executing the computer program. The system mainly consists of an archive data collection sub-component, an archive data classification sub-component, and an archive data recommendation sub-component. Each component is data-centric and processes the data, and the desired result is obtained from the user's input.
Fig. 8 shows another schematic flow chart diagram of an archive data generation device of an embodiment of the present application.
As shown in fig. 8, the overall structure of the present invention is as follows:
The archive data collection sub-component mainly collects and cleans the various kinds of archive data, which are derived from international standards, national standards, industry standards and the like; the archive data classification sub-component classifies and sorts the collected archive data according to a feature classification method; and the archive data recommendation sub-component generates a set of archive data that matches the characteristics of the industry entered by the user.
Fig. 9 shows another schematic flow chart diagram of an archive data generation device of an embodiment of the present application.
As shown in fig. 9, the data collection structure of the present invention is as follows:
the data of various international standards, national standards and industry standards are imported into the database through the data importing adapter.
Fig. 10 shows another schematic flow chart diagram of an archive data generation device of an embodiment of the present application.
As shown in fig. 10, the data characteristics of the present invention are described as follows:
The fixed features include the data source, the industry, and the priority. The dynamic features are personalized descriptions based on the specific scene, specific use and characteristics of each type of data. The description of each type of data is therefore finer, and its range of application wider.
Embodiment three:
according to an embodiment of the third aspect of the present invention, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor, implements the archive data generation method provided in any of the above embodiments, and therefore, the computer readable storage medium includes all the advantages of the archive data generation method provided in any of the above embodiments, which are not described herein again.
In this embodiment, the steps of the archive data generation method of any one of the above-described aspects are implemented when the computer program is executed by the processor, and therefore the computer-readable storage medium includes all the advantageous effects of the archive data generation method of any one of the above-described aspects.
In particular, a computer-readable storage medium may include any medium that can store or transfer information. Examples of a computer readable storage medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an Erasable ROM (EROM), a floppy disk, a CD-ROM, an optical disk, a hard disk, a fiber optic medium, a Radio Frequency (RF) link, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.
In the present application, the term "plurality" means two or more, unless explicitly defined otherwise. The terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; "coupled" may be directly coupled or indirectly coupled through intermediaries. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.
In the description of the present specification, the terms "one embodiment," "some embodiments," "particular embodiments," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing is merely a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and variations may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (7)

1. A method for generating archive data, comprising:
acquiring at least one exogenous data packet, classifying the at least one exogenous data packet according to a preset data classification model, and generating at least one exogenous data combination;
filtering at least one exogenous data combination meeting a first matching condition according to a first characteristic value in the archive query instruction, and generating at least one result data set;
selecting the exogenous data packet meeting a second matching condition from at least one result data set according to at least one self-defined characteristic value in the archive generation instruction to generate archive data;
filtering at least one of the exogenous data combinations meeting a first matching condition according to a first characteristic value in the archive query instruction, and generating at least one result data set, wherein the method specifically comprises the following steps of:
setting an industry matching rule of the first matching condition;
extracting fixed features in each exogenous data combination, and acquiring industries of the fixed features according to the fixed features;
judging that the industries and the first characteristic values conform to the matching rules of the industries, and filtering out the exogenous data combination corresponding to the first characteristic values according to the judging result;
creating a recommendation list, sequentially storing the at least one exogenous data combination in the recommendation list, and generating at least one result data set;
selecting the exogenous data packet meeting a second matching condition from at least one result data set according to at least one custom characteristic value in the archive generation instruction to generate archive data, wherein the archive data specifically comprises:
setting a characteristic value multiple selection rule of the second matching condition;
selecting at least one result data set associated with the self-defined characteristic value according to the characteristic value multiple selection rules;
and integrating the exogenous data packet according to the result data set to generate the archive data.
2. The method for generating archive data according to claim 1, wherein the acquiring at least one exogenous data packet, classifying the at least one exogenous data packet according to a preset data classification model, and generating at least one exogenous data combination specifically comprises:
acquiring an exogenous document in any exogenous data packet, and segmenting exogenous sentences in the exogenous document according to word segmentation recognition rules in the data classification model;
performing characteristic value fitting on the exogenous sentence according to a characteristic value description rule in the data classification model to generate the fixed characteristic and the dynamic characteristic corresponding to the exogenous sentence;
classifying the fixed features containing the same industry according to the industry contained in the fixed features according to index arrangement rules in the data classification model, and establishing a unique main index corresponding to the industry for the classified at least one fixed feature;
establishing at least one sub-index in each fixed feature, and adding a unique identifier in the sub-index to the dynamic feature, so that each fixed feature and at least one dynamic feature map generate an index tree;
traversing and confirming that the sub-indexes in the fixed features exist in the dynamic features, and generating the exogenous data combination corresponding to the index columns in the index tree.
3. The method for generating archive data according to claim 2, wherein the performing characteristic value fitting on the exogenous sentence according to a characteristic value description rule in the data classification model to generate the fixed characteristic and the dynamic characteristic corresponding to the exogenous sentence specifically comprises:
setting a fixed characteristic value fitting standard according to the using heat of a user;
traversing and confirming that the external index times in the exogenous sentence are larger than the heat standard value in the fixed characteristic value fitting standard, and setting the priority of the exogenous sentence as high; confirming that the external index times in the exogenous sentence are equal to the heat standard value, and setting the priority of the exogenous sentence as a middle level; if the external index times in the exogenous sentence are confirmed to be smaller than the heat standard value, setting the priority of the exogenous sentence as low;
and identifying keywords in the exogenous sentences with the set priorities, judging the data sources and the industries corresponding to the exogenous sentences according to the keywords, marking the judging results in the exogenous sentences, and generating the fixed features corresponding to the exogenous sentences.
4. The method for generating archive data according to claim 2, wherein the performing characteristic value fitting on the exogenous sentence according to a characteristic value description rule in the data classification model to generate the fixed characteristic and the dynamic characteristic corresponding to the exogenous sentence further comprises:
setting a specific scene and a specific use in the characteristic value description rule according to the application scene;
cutting key word segmentation in the exogenous sentence, simulating a use scene of the word segmentation field, and generating at least one personalized description corresponding to the exogenous sentence;
and matching the specific scene and the specific application according to the personalized description, and adding the specific scene, the specific application and the personalized description in a dynamic buffer zone to generate the dynamic characteristics corresponding to the dynamic buffer zone.
5. The method for generating archive data according to claim 1, wherein before the acquiring at least one exogenous data packet, the method specifically comprises:
setting at least one exogenous category according to the category of the archive data;
acquiring the archive data according to a data collection interface;
and classifying and storing the archive data according to the exogenous category, so that the collected archive data is stored in the corresponding exogenous category.
6. An archive data generation device comprising a memory in which a computer program is stored and a processor for implementing the steps of the archive data generation method of any one of claims 1 to 5 when the computer program is executed.
7. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the archive data generation method of any one of claims 1 to 5.
CN201911314535.5A 2019-12-19 2019-12-19 Archive data generation method, archive data generation device, and readable storage medium Active CN111104476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911314535.5A CN111104476B (en) 2019-12-19 2019-12-19 Archive data generation method, archive data generation device, and readable storage medium

Publications (2)

Publication Number Publication Date
CN111104476A CN111104476A (en) 2020-05-05
CN111104476B true CN111104476B (en) 2023-06-20

Family

ID=70422297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911314535.5A Active CN111104476B (en) 2019-12-19 2019-12-19 Archive data generation method, archive data generation device, and readable storage medium

Country Status (1)

Country Link
CN (1) CN111104476B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant