CN116910163A

CN116910163A - Data classification method, data query method and system thereof

Info

Publication number: CN116910163A
Application number: CN202310869661.7A
Authority: CN
Inventors: 赵培龙
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2023-07-14
Filing date: 2023-07-14
Publication date: 2023-10-20

Abstract

The embodiments of this specification disclose a data classification method, a data query method, and systems thereof. To implement data classification, the system may obtain base data and obtain one or more classification rules respectively associated with one or more predefined types. The base data includes a plurality of entity instances and relationship instances between at least some of the entity instances. The classification rule associated with any predefined type is invoked when a data query task for that predefined type is executed. It determines the portion of the base data that belongs to that predefined type, and a query result for the data query task is obtained based at least on that portion.

Description

Data classification method, data query method and system thereof

Technical Field

The present disclosure relates to the field of information technologies, and in particular, to a data classification method, a data query method, and a system thereof.

Background

Today's society is in an age of information explosion, and data has become a highly valuable asset. A data service platform provides data services to users, such as data storage, data processing, data querying, and data privacy protection. To organize data efficiently, a data service platform cannot avoid the problem of data classification. Traditionally, data is given a custom classification for a particular business requirement before it enters the platform. Such a classification is more abstract than the natural base type of the data, or carries a more pronounced business intent. For example, natural persons, companies, and payment terminals are base types and are closer to an objective description of the data. Risk users, illegal transactions, elderly people, and electronic-product enthusiasts are custom classifications, which are more abstract or more strongly driven by business intent than the base types.

The drawback of the traditional approach is that the data is strongly bound to the custom classification. As a result, the basis of the classification cannot be traced, and the data has very limited uses. It is therefore desirable to provide a reliable and efficient data classification method.

Disclosure of Invention

One of the embodiments of the present specification provides a data classification method, which may include the following steps. First, obtain base data, where the base data may include a plurality of entity instances and relationship instances between at least some of the entity instances. Second, obtain one or more classification rules respectively associated with one or more predefined types. The classification rule associated with any predefined type may be invoked when a data query task for that predefined type is performed. It determines the portion of the base data that belongs to that predefined type, and a query result for the data query task is obtained based at least on that portion.

One of the embodiments of the present specification provides a data classification system that may include a first acquisition module and a second acquisition module. The first acquisition module may be configured to acquire base data, where the base data may include a plurality of entity instances and relationship instances between at least some of the entity instances. The second acquisition module may be configured to obtain one or more classification rules respectively associated with one or more predefined types. The classification rule associated with any predefined type may be invoked when a data query task for that predefined type is performed. It determines the portion of the base data that belongs to that predefined type, and a query result for the data query task is obtained based at least on that portion.

One of the embodiments of the present specification provides a data classification apparatus comprising a processor and a storage device for storing instructions. When the processor executes the instructions, the data classification method according to any embodiment of the present specification may be implemented.

One of the embodiments of the present disclosure provides a data query method, which may include three steps. First, determine a classification rule associated with a target predefined type. This rule is selected from one or more classification rules respectively associated with one or more predefined types. Second, based on that classification rule, determine the portion of the base data that belongs to the target predefined type. The base data includes a plurality of entity instances and relationship instances between at least some of the entity instances. Third, obtain a data query result based at least on that portion.

One of the embodiments of the present specification provides a data query system that may include a first determination module, a second determination module, and a query module. The first determination module may be configured to determine a classification rule associated with a target predefined type. This rule is selected from one or more classification rules respectively associated with one or more predefined types. The second determination module may be configured to determine, based on the classification rule associated with the target predefined type, the portion of the base data that belongs to the target predefined type. The base data may include a plurality of entity instances and relationship instances between at least some of the entity instances. The query module may be configured to obtain data query results based at least on that portion.

One of the embodiments of the present specification provides a data query apparatus, which includes a processor and a storage device for storing instructions. The data query method according to any embodiment of the present disclosure may be implemented when the processor executes instructions.

Drawings

The present specification is further illustrated by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting, and like numerals represent like structures, wherein:

FIG. 1 is a schematic illustration of an application scenario of a data service platform according to some embodiments of the present description;

FIG. 2 is an exemplary flow chart of a data classification method according to some embodiments of the present description;

FIG. 3 is an exemplary flow chart of a data query method shown in accordance with some embodiments of the present description;

FIG. 4 is an exemplary block diagram of a data classification system according to some embodiments of the present description;

FIG. 5 is an exemplary block diagram of a data query system according to some embodiments of the present description.

Detailed Description

In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.

It should be understood that the terms "system," "apparatus," "unit," and/or "module" as used herein are one way of distinguishing between components, elements, parts, portions, or assemblies at different levels. However, these words may be replaced by other expressions that achieve the same purpose.

As used in this specification, the terms "a," "an," "one," and/or "the" do not refer specifically to the singular and may also include the plural, unless the context clearly indicates otherwise. In general, the terms "comprise" and "include" only indicate the inclusion of explicitly identified steps and elements. These steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.

This specification uses flowcharts to describe the operations performed by the system according to its embodiments. It should be understood that these operations are not necessarily performed in the exact order shown. Steps may instead be processed in reverse order or simultaneously. Other operations may also be added to or removed from these processes.

Fig. 1 is a schematic view of an application scenario of a data service platform according to some embodiments of the present disclosure.

As shown in fig. 1, the scenario 100 may include a server 110, one or more clients 120, and a network 130.

The server 110 may provide data services, and may be a data service platform or a component of one. In some embodiments, the server 110 may provide one or more data services such as data storage, data processing, data querying, and data privacy protection. The data services may serve different business domains, such as risk control and product recommendation. A user may initiate a data service request to the server 110 through the client 120, and the server 110 may return a corresponding data result in response. Take a risk-control data query as an example. The server 110 may receive a data query request for risk users from the client 120, and return the query result to the client 120.

In some embodiments, the server 110 may be a standalone server or a group of servers, and a server group may be centralized or distributed. In some embodiments, the server may be local or remote. In some embodiments, the server may run on a cloud platform. For example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a decentralized cloud, an internal cloud, or the like, or any combination thereof.

The client 120 may include various devices with input/output capabilities, such as a smartphone 120-1, a tablet computer 120-2, a laptop computer 120-3, and a desktop computer 120-4. In some embodiments, the client 120 may provide a service interface, such as a graphical user interface. Through the client 120, a user may request one or more data services such as data storage, data processing, data querying, and data privacy protection. Take a data query as an example. A user (e.g., an individual user) may initiate a data query request on the client 120 and receive the query result returned by the server 110.

In some embodiments, a user (e.g., an enterprise user) may also import data into the server 110 through the client 120. The server 110 then stores, maintains, or processes the data to provide various data services.

The network 130 connects the components of the scenario 100 so that they can communicate with one another. The network between components may include a wired network and/or a wireless network. For example, the network 130 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a ZigBee network, a Near Field Communication (NFC) network, an intra-device bus, an intra-device line, a cable connection, and the like, or any combination thereof. The connection between any two components may use one or more of the above.

To organize data efficiently, a data service platform cannot avoid the problem of data classification. Several scenarios illustrate this:
- Credit: users with a high risk of overdue repayment (referred to as "black users") need to be identified so that the risk can be avoided in advance.
- Data analysis: analysis may need to target a particular group (e.g., car owners) to reach conclusions that inform business decisions.
- Search: the platform needs to divide information (e.g., people, items, events, transactions) into different types (e.g., by domain, use, style, or topic) to provide fast and accurate search.
- Statistics: the relevant departments may need to count the proportion of a particular population (e.g., elderly people with minor children).
- Marketing: users may need to be divided into groups by product preference, each group corresponding to one product, so that the system can promote that product to the users in the group.

As illustrated above, to meet business requirements, people may abstract data into categories such as risk users, illegal transactions, elderly people, and electronic-product enthusiasts. Some embodiments of this specification refer to such generalized categories, abstracted from business requirements, as custom types or predefined types. In contrast, the base types described in some embodiments of this specification are determined by the inherent distinguishability of the data itself, e.g., natural persons, companies, and mobile terminals. Base types are closer to an objective description of the data and independent of business needs. A predefined type can be viewed as built on top of base types, while being more abstract or carrying a more pronounced business intent than the base types.

In some scenarios, data has already been given a predefined classification for a certain business need before it enters the platform. For example, suppose the predefined types include insurance customers and consumer-finance customers. When a user imports customer data into the data service platform, the user divides it into data under the insurance-customer category and data under the consumer-finance-customer category. This strongly binds the data to the predefined types and causes two problems.

First, the data already carries a predefined type when it enters the platform, but the platform does not know the standard or rule by which the importing user classified it. The classification basis therefore cannot be traced, and its reliability cannot be guaranteed.

Second, as described above, data has a base type in addition to any predefined type. Data divided only by base type remains general-purpose, whereas data divided by predefined type is difficult to apply to other business scenarios. For example, customers who have purchased insurance may be classified as the insurance-customer type, yet these customers may equally be consumers of other products. Insurance-customer data could therefore serve recommendations of other products as well as insurance recommendations. If the customer data initially enters the data service platform as the insurance-customer type, it is difficult to reuse it for recommending other products.

In view of this, the embodiments of the present disclosure provide a dynamic data classification method based on separating base data from classification rules. Data entering the platform no longer carries a predefined type. Instead, at the time of use (e.g., a data query), the base data is classified based on the classification rule associated with the predefined type, and the processing result is obtained from the classified data. This method makes the same data usable for many different business requirements, which effectively improves the data reuse rate. At the same time, classification is expressed and recorded as explicit classification rules, so the classification basis is traceable.

Fig. 2 is an exemplary flow chart of a data classification method according to some embodiments of the present description. In some embodiments, the process 200 may be performed by one or more processors (e.g., one or more processors of the server 110), and in particular, may be implemented by the system 400 implemented on the server 110 shown in fig. 4. As shown in fig. 2, the process 200 may include the following steps.

Step 210, obtaining basic data. In some embodiments, step 210 may be performed by the first acquisition module 410.

Base data is data that has not undergone knowledge processing or predefined classification. It may be raw data generated directly in a business domain. One example is transaction data, where a transaction record may include the transaction time, the transaction amount, the accounts of the transaction parties, and so on. Another example is customer data, where a customer record may hold the customer's name, age, gender, occupation, and so on. In some embodiments, the base data may include a plurality of entity instances and relationship instances between at least some of the entity instances.

Entities are distinguishable things. For example, entities may include users, merchants, accounts, cities, medications, companies, devices, and the like. An entity instance is a specific entity. For example, "Zhang San" may be an entity instance of the user entity, and "XX Bank XX branch" may be an entity instance of the company entity. Where no ambiguity arises, the terms "entity" and "entity instance" may be used interchangeably.

Entities may have relationships with one another, such as friendship, employment, and parent-child. A relationship instance is a specific relationship between entity instances. By way of example only, the following are five relationship instances:
(1) the friendship between Zhang San and Li Si;
(2) a particular login by social account X on terminal Y;
(3) a transfer between account A and account B;
(4) a message sent by user C to user D;
(5) a person taking a particular flight to Shanghai.
Where no ambiguity arises, the terms "relationship" and "relationship instance" may be used interchangeably.

In some embodiments, the base data may be represented as a knowledge graph. The knowledge graph may include a plurality of node instances and edge instances between at least some of the node instances. In this case, the node instances and edge instances in the knowledge graph may be typed by the base types of the data. For example, the knowledge graph may contain user nodes, company nodes, and employment edges. Node instances of each base type correspond to entity instances of that base type, and edge instances of each base type correspond to relationship instances of that base type.
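A minimal sketch of base data held as a knowledge graph, assuming plain Python dataclasses. The class and field names (`Node`, `Edge`, `KnowledgeGraph`, `base_type`, `attrs`) are hypothetical and are not taken from the specification. Each instance carries only its base type, and no predefined type is stored with the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node instance (entity instance) tagged only with its base type."""
    node_id: str
    base_type: str                                  # e.g. "user", "company"
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    """An edge instance (relationship instance) between two node instances."""
    src: str
    dst: str
    base_type: str                                  # e.g. "employment", "friend"
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Both endpoints must already exist as node instances.
        if edge.src not in self.nodes or edge.dst not in self.nodes:
            raise KeyError(f"unknown endpoint in edge {edge.src}->{edge.dst}")
        self.edges.append(edge)

    def nodes_of(self, base_type: str) -> List[Node]:
        return [n for n in self.nodes.values() if n.base_type == base_type]

    def edges_of(self, base_type: str) -> List[Edge]:
        return [e for e in self.edges if e.base_type == base_type]


kg = KnowledgeGraph()
kg.add_node(Node("u1", "user", {"name": "Zhang San"}))
kg.add_node(Node("c1", "company", {"name": "XX Bank XX branch"}))
kg.add_edge(Edge("u1", "c1", "employment"))
```

Filtering by base type is the only classification this structure supports on its own. Predefined types are layered on later by classification rules.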

In some embodiments, the underlying data may also be represented in other types of graphs or other forms of data (e.g., tabular forms).
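The same base data can also be kept in tabular form. The sketch below assumes a transaction table with hypothetical columns `time`, `amount`, `payer`, and `payee`. It shows one way to derive account entity instances and transfer relationship instances from the rows, and nothing is classified at this stage.

```python
from typing import Dict, List, Tuple


def table_to_instances(rows: List[Dict]) -> Tuple[List[str], List[Dict]]:
    """Derive account entity instances and transfer relationship instances
    from tabular transaction records (one row per transaction)."""
    accounts = set()
    transfers = []
    for row in rows:
        accounts.update((row["payer"], row["payee"]))   # both parties are entity instances
        transfers.append({
            "src": row["payer"],
            "dst": row["payee"],
            "base_type": "transfer",                    # base type only, no predefined type
            "time": row["time"],
            "amount": row["amount"],
        })
    return sorted(accounts), transfers


rows = [
    {"time": "2022-01-01 10:00", "amount": 300, "payer": "A", "payee": "B"},
    {"time": "2022-01-02 09:30", "amount": 80, "payer": "B", "payee": "C"},
]
accounts, transfers = table_to_instances(rows)
```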

It should be noted that this specification mainly uses a knowledge graph as an example, but the data classification principle described here applies to data in any form of representation.

In some embodiments, the base data may be imported by a user. For example, the user may click a "data import" button on the operation interface of the client 120 to import the base data.

Step 220, obtaining one or more classification rules respectively associated with the one or more predefined types. In some embodiments, step 220 may be performed by the second acquisition module 420.

In some embodiments, the one or more predefined types may include one or more predefined entity types and one or more predefined relationship types. Taking a knowledge graph as an example, the one or more predefined types may include one or more predefined node types and one or more predefined edge types.

As described above, predefined types may be derived from the base types of data. For example, a predefined type may be a subtype of a base type. As one example, the base types may include natural person, under which a user may define an elderly-with-minor-child type. As another example, the base types may include enterprise, under which a user may define entity types such as small and micro enterprises, medium-sized enterprises, and large enterprises. As yet another example, the base types may include transaction, under which a user may define relationship types such as small-amount transactions, large-amount transactions, and abnormal transactions. Classification at the base-type level is already contained in the base data. Classification at the predefined-type level is the data classification problem addressed herein.

A classification rule associated with a predefined type can be regarded as a definition of that predefined type. It is used to determine whether an entity instance or relationship instance belongs to the type.

In some embodiments, a classification rule associated with a predefined entity type may include condition description information. This information is used to determine whether an entity instance belongs to that predefined entity type. That is, the rule indicates the conditions that an entity instance must satisfy to belong to the predefined entity type. By way of example only, suppose the user defines an elderly-with-minor-child type. The associated classification rule may indicate that an entity instance of this type (an elderly person with a minor child) must satisfy all of the following conditions: 1) has children; 2) is over 60 years old; 3) the child is under 18 years old.
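As a hedged illustration, the conditions of the elderly-with-minor-child rule can be written as a single predicate. The function name, the `ages` and `children_of` lookups, and the reading of condition 3) as "at least one child is under 18" are all assumptions made for this sketch.

```python
from typing import Dict, List


def is_elderly_with_minor_child(person_id: str,
                                ages: Dict[str, int],
                                children_of: Dict[str, List[str]]) -> bool:
    """Hypothetical rule for the elderly-with-minor-child predefined entity type.
    All three conditions must hold, so they are AND-ed."""
    children = children_of.get(person_id, [])
    return (
        len(children) > 0                           # 1) has children
        and ages[person_id] > 60                    # 2) is over 60 years old
        and any(ages[c] < 18 for c in children)     # 3) assumed: at least one child under 18
    )


ages = {"p1": 65, "k1": 10, "p2": 70, "k2": 30, "p3": 40, "k3": 5, "p4": 80}
children_of = {"p1": ["k1"], "p2": ["k2"], "p3": ["k3"]}
members = [p for p in ("p1", "p2", "p3", "p4")
           if is_elderly_with_minor_child(p, ages, children_of)]
```

Here only `p1` qualifies. `p2`'s child is an adult, `p3` is not over 60, and `p4` has no children.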

In some embodiments, a classification rule associated with a predefined relationship type may include condition description information. This information is used to determine whether a relationship instance belongs to that predefined relationship type. That is, the rule indicates the conditions that a relationship instance must satisfy to belong to the predefined relationship type. Suppose the user defines a relationship type "abnormal transaction". The associated classification rule may indicate that a relationship instance of this type must satisfy any one of the following conditions:
1) the transaction amount exceeds a set value (e.g., 5 million yuan);
2) the transaction account shows an out-of-region login (e.g., the account was logged in at location A during the past week but at location B at the time of the transaction).
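A minimal sketch of the abnormal-transaction rule. Because satisfying either condition suffices, the two checks are OR-ed. The `Transaction` fields and the `LARGE_AMOUNT_YUAN` constant are hypothetical. The 5-million-yuan threshold is the example value from the text, and treating "no recent logins on record" as no evidence of an out-of-region login is an assumption.

```python
from dataclasses import dataclass, field
from typing import Set

LARGE_AMOUNT_YUAN = 5_000_000   # example set value from the text


@dataclass
class Transaction:
    amount_yuan: float
    login_location_at_txn: str                     # where the account was logged in when transacting
    recent_login_locations: Set[str] = field(default_factory=set)  # logins during the past week


def is_abnormal_transaction(txn: Transaction) -> bool:
    """Hypothetical rule for the abnormal-transaction predefined relationship type."""
    exceeds_amount = txn.amount_yuan > LARGE_AMOUNT_YUAN
    # Assumption: with no recent logins on record, there is no out-of-region evidence.
    out_of_region_login = (
        bool(txn.recent_login_locations)
        and txn.login_location_at_txn not in txn.recent_login_locations
    )
    return exceeds_amount or out_of_region_login   # "any one of" means logical OR
```

For example, a 1,000-yuan transfer made from location B is classified as abnormal if the account was only seen at location A during the past week.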

In some embodiments, a classification rule may include condition description information and an action function. The condition description information may reference the relevant base types and specify screening conditions. The action function may screen the entity instances and/or relationship instances of those base types according to the condition description information, and return the instances that meet the conditions. Accordingly, for a classification rule associated with a predefined entity type, the entity instances returned by the action function are the entity instances belonging to that predefined entity type. For a classification rule associated with a predefined relationship type, the relationship instances returned by the action function are the relationship instances belonging to that predefined relationship type.
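The split between condition description information and an action function might be modeled as below. The `ClassificationRule` class, its fields, and the dictionary-shaped instances are assumptions for illustration, not the specification's format. In this sketch the referenced base type is checked before the condition, so a condition never sees instances of an unrelated base type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Instance = Dict[str, object]


@dataclass
class ClassificationRule:
    predefined_type: str                    # e.g. "large_transaction" (hypothetical name)
    base_type: str                          # the base type the condition references
    condition: Callable[[Instance], bool]   # screening condition

    def act(self, instances: Iterable[Instance]) -> List[Instance]:
        """Action function: keep instances of the referenced base type that
        satisfy the condition, and return them."""
        return [
            inst for inst in instances
            # Base-type check first, so the condition only sees matching instances.
            if inst["base_type"] == self.base_type and self.condition(inst)
        ]


base = [
    {"id": "t1", "base_type": "transaction", "amount": 9_000_000},
    {"id": "t2", "base_type": "transaction", "amount": 200},
    {"id": "u1", "base_type": "user", "age": 70},
]
large = ClassificationRule("large_transaction", "transaction",
                           lambda t: t["amount"] > 1_000_000)
matched = large.act(base)
```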

In some embodiments, the classification rules may appear as executable files or descriptive scripts.

When the base data is represented as a knowledge graph, the condition description information may include a target graph pattern (also called a target graph structure). Continuing the elderly-with-minor-child example, the target graph pattern in the condition description information may consist of three elements:
- a first node whose base type is elderly person (aged over 60);
- a second node whose base type is minor (aged under 18);
- an edge of base type parent-child connecting the two nodes.

Accordingly, when the classification rule is executed, the action function may search the base data (the knowledge graph) for subgraphs matching the target graph pattern. It then returns the matched first-node instances (entity instances whose base type is elderly person). Each first-node instance returned by the action function is an entity instance of the elderly-with-minor-child type.
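A deliberately simple sketch of matching a two-node target graph pattern. It makes a linear scan over edges, where a production graph engine would use indexes. The `TwoNodePattern` class, the `(src, dst, edge_type)` edge tuples, and the age-based node predicates are assumptions chosen to mirror the example. A parent with several minor children is reported once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

NodeAttrs = Dict[str, object]
EdgeTriple = Tuple[str, str, str]            # (src_id, dst_id, edge_base_type)


@dataclass
class TwoNodePattern:
    """Target graph pattern: (first) -[edge_type]-> (second)."""
    first: Callable[[NodeAttrs], bool]
    edge_type: str
    second: Callable[[NodeAttrs], bool]


def match_first_nodes(nodes: Dict[str, NodeAttrs],
                      edges: List[EdgeTriple],
                      pattern: TwoNodePattern) -> List[str]:
    """Action function: find every subgraph matching the pattern and return
    the distinct first-node ids, in edge order."""
    found: List[str] = []
    for src, dst, etype in edges:
        if (etype == pattern.edge_type
                and pattern.first(nodes[src])
                and pattern.second(nodes[dst])
                and src not in found):
            found.append(src)
    return found


nodes = {"p1": {"age": 65}, "k1": {"age": 10}, "k2": {"age": 12},
         "p2": {"age": 70}, "k3": {"age": 35}}
edges = [("p1", "k1", "parent_child"), ("p1", "k2", "parent_child"),
         ("p2", "k3", "parent_child")]
elderly_with_minor_child = TwoNodePattern(
    first=lambda n: n["age"] > 60,       # elderly node
    edge_type="parent_child",
    second=lambda n: n["age"] < 18,      # minor node
)
result = match_first_nodes(nodes, edges, elderly_with_minor_child)
```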

In some embodiments, classification rules may be imported by the user. Because the classification rules are separated from the basic data, a user can conveniently formulate or update the classification rules. For example, for the same predefined type, the user may update the classification rule associated with the predefined type, i.e., import the updated classification rule associated with the predefined type.
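The sketch below illustrates why separating rules from base data makes updates cheap: importing a new version of a rule changes the classification result, while the base data itself is never touched. The `RuleStore` class and its methods are hypothetical. Keeping every imported version is one possible way to preserve the classification basis for tracing.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]


class RuleStore:
    """Hypothetical store that keeps classification rules apart from base data.
    Every imported version is kept, so the classification basis stays traceable."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Rule]] = {}

    def import_rule(self, predefined_type: str, rule: Rule) -> int:
        """Import a new or updated rule and return its 1-based version number."""
        versions = self._versions.setdefault(predefined_type, [])
        versions.append(rule)
        return len(versions)

    def current(self, predefined_type: str) -> Rule:
        return self._versions[predefined_type][-1]

    def version_count(self, predefined_type: str) -> int:
        return len(self._versions.get(predefined_type, []))


base_data = [{"id": "t1", "amount": 3_000_000}]   # never modified below
store = RuleStore()
store.import_rule("large_transaction", lambda t: t["amount"] > 1_000_000)
before = [t["id"] for t in base_data if store.current("large_transaction")(t)]
store.import_rule("large_transaction", lambda t: t["amount"] > 5_000_000)   # rule update
after = [t["id"] for t in base_data if store.current("large_transaction")(t)]
```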

It can be seen that the process 200 stores the base data and the classification rules separately. When a data query task for a predefined type is performed, the classification rule associated with that type may be invoked to determine the portion of the base data that belongs to the type. The query result may then be obtained based at least on that portion. In other words, once it has the base data and the classification rule associated with a predefined type, the data service platform is able to perform data query tasks for that predefined type.

FIG. 3 is an exemplary flow chart of a data query method according to some embodiments of the present description. The process 300 may be performed by one or more processors (e.g., one or more processors of the server 110), and in particular may be implemented by the system 500 on the server 110 shown in FIG. 5. In some embodiments, the process 300 may be triggered by a data query request from the client 120. As shown in FIG. 3, the process 300 may include the following steps.

At step 310, classification rules associated with the target predefined type are determined. In some embodiments, step 310 may be performed by the first determination module 510.

In some embodiments, a classification rule may be named after its predefined type (e.g., "elderly-with-minor-child"), so that the rule can be invoked by that name. A data query request from the client may specify the predefined type to be queried, referred to as the target predefined type. The first determination module 510 may then determine the classification rule associated with the target predefined type. This rule is selected from the one or more classification rules respectively associated with one or more predefined types. For a detailed description of classification rules, see step 220.
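Step 310 amounts to a lookup by predefined-type name, as in this minimal sketch. The `RULES` table, the `target_type` request field, and both rule bodies are hypothetical. The elderly rule is simplified to a precomputed `has_minor_child` flag only to keep the example short.

```python
from typing import Callable, Dict

Rule = Callable[[dict], bool]

# Hypothetical rule table, keyed by predefined-type name.
RULES: Dict[str, Rule] = {
    # Simplified: assumes a precomputed "has_minor_child" attribute.
    "elderly_with_minor_child": lambda p: p.get("age", 0) > 60 and p.get("has_minor_child", False),
    "abnormal_transaction": lambda t: t.get("amount", 0) > 5_000_000,
}


def determine_rule(query_request: dict) -> Rule:
    """Step 310 sketch: read the target predefined type from the request and
    select its rule from the stored rules."""
    target = query_request["target_type"]
    try:
        return RULES[target]
    except KeyError:
        raise ValueError(f"no classification rule is defined for type {target!r}") from None


rule = determine_rule({"target_type": "abnormal_transaction"})
```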

At step 320, a portion belonging to the target predefined type is determined from the base data based on the classification rule associated with the target predefined type. In some embodiments, step 320 may be performed by the second determination module 520.

For a detailed description of the base data, see step 210.

In some embodiments, the second determination module 520 may determine all instances (all entity instances or all relationship instances) in the base data that belong to the target predefined type, based on the classification rule associated with that type. Queries on the target predefined type may be frequent, or the volume of instance data may be large. In that case, the server 110 may determine in advance, after obtaining the classification rule, all instances in the base data that belong to the type, and save them for subsequent queries.
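The save-for-later behavior can be sketched as simple materialization. The `MaterializedClassifier` class and its `full_scans` counter are hypothetical. Saved results are discarded when a rule is replaced. Keeping them fresh when the base data itself changes is outside this sketch and outside what the text describes.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]


class MaterializedClassifier:
    """Hypothetical helper: run a rule over the base data once, save the
    matching instances, and serve later queries from the saved result."""

    def __init__(self, base_data: List[dict]) -> None:
        self._base = base_data
        self._rules: Dict[str, Rule] = {}
        self._saved: Dict[str, List[dict]] = {}
        self.full_scans = 0                       # how many times the base data was scanned

    def set_rule(self, predefined_type: str, rule: Rule) -> None:
        self._rules[predefined_type] = rule
        self._saved.pop(predefined_type, None)    # a changed rule invalidates saved results

    def instances_of(self, predefined_type: str) -> List[dict]:
        if predefined_type not in self._saved:
            rule = self._rules[predefined_type]
            self._saved[predefined_type] = [x for x in self._base if rule(x)]
            self.full_scans += 1
        return self._saved[predefined_type]


clf = MaterializedClassifier([{"id": "t1", "amount": 6_000_000}, {"id": "t2", "amount": 10}])
clf.set_rule("abnormal_transaction", lambda t: t["amount"] > 5_000_000)
first = clf.instances_of("abnormal_transaction")
second = clf.instances_of("abnormal_transaction")     # served from the saved result
```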

In some embodiments, when the classification rule appears as an executable file, the second determination module 520 may execute the executable file associated with the target predefined type to determine the portion belonging to the target predefined type from the base data.

For more implementation details of step 320, reference may be made to the description of classification rules previously described.

Step 330, obtaining a data query result based at least on the portion. In some embodiments, step 330 may be performed by query module 530.

In some embodiments, the data query request asks only for the instance data of the target predefined type. In this case, the data query result may include all instances (all entity instances or all relationship instances) in the base data that belong to the target predefined type. In some embodiments, all instances in the base data belonging to the target predefined type have been saved in advance. The server 110 may then determine from the saved instances whether a specified instance (a specified entity instance or a specified relationship instance) belongs to the target predefined type, and thereby obtain the query result. In some embodiments, the data query request also asks for further operations on the instance data of the target predefined type. Examples include counting the instances, or computing the average or maximum of an attribute value in the instance data. The data query result may then be the result of that operation on the instance data of the target predefined type.
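The three kinds of result described above are returning the instances, checking membership against saved instances, and aggregating over them. They might look like this in a minimal sketch. The `query_type` and `belongs` helpers, the operation names, and the sample transactions are hypothetical. `statistics.mean` is from the Python standard library.

```python
from statistics import mean
from typing import Callable, Iterable, List, Optional

Rule = Callable[[dict], bool]


def query_type(instances: Iterable[dict], rule: Rule,
               op: Optional[str] = None, attr: Optional[str] = None):
    """Return all instances of the target predefined type, or an aggregate
    (count / avg / max of one attribute) computed over them."""
    matched: List[dict] = [x for x in instances if rule(x)]
    if op is None:
        return matched
    if op == "count":
        return len(matched)
    values = [x[attr] for x in matched]
    if op == "avg":
        return mean(values)
    if op == "max":
        return max(values)
    raise ValueError(f"unsupported operation: {op}")


def belongs(instance_id: str, saved: Iterable[dict]) -> bool:
    """Membership check against instances saved in advance for the type."""
    return any(x["id"] == instance_id for x in saved)


def abnormal(t: dict) -> bool:
    """Hypothetical stand-in rule for the abnormal-transaction type."""
    return t["amount"] > 5_000_000


txns = [{"id": "t1", "amount": 6_000_000},
        {"id": "t2", "amount": 8_000_000},
        {"id": "t3", "amount": 100}]
```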

Of course, query results are not limited to the above cases. Any query whose result depends on classifying data according to a classification rule is applicable. For example, a user may enter several query conditions to find the portions of the base data that satisfy all of them at once, where one of the conditions is that the relevant instance belongs to a predefined type. Specifically, the user may query abnormal transactions whose transaction time is January 1, 2022 (abnormal transaction being a predefined type). The server 110 may handle this query in either of two orders:
- Find all abnormal transactions in the base data first, then check whether the transaction time of each is January 1, 2022.
- Find the transactions in the base data whose transaction time is January 1, 2022 first, then determine the abnormal transactions among them based on the classification rule associated with the abnormal-transaction type.

As another example, the query result may include subgraph data of the instances (entity instances or relationship instances) belonging to the target predefined type. That is, after determining these target instances from the base data, the server 110 may further determine their subgraph data from the base data to obtain the query result.
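The two evaluation orders for a combined query return the same result, because both filters are applied to each transaction independently. The order only affects how much work the rule does. The sketch below checks this and then extracts subgraph data around target instances. The `abnormal` stand-in rule, the data, and the one-hop neighbourhood are assumptions. The specification does not say how far a subgraph should extend.

```python
from datetime import date
from typing import List, Set, Tuple

EdgeTriple = Tuple[str, str, str]      # (src_id, dst_id, edge_base_type)


def abnormal(t: dict) -> bool:
    """Stand-in rule for the abnormal-transaction type (hypothetical threshold)."""
    return t["amount"] > 5_000_000


def rule_first(txns: List[dict], day: date) -> List[dict]:
    abnormal_txns = [t for t in txns if abnormal(t)]          # classify first
    return [t for t in abnormal_txns if t["date"] == day]      # then check the date


def date_first(txns: List[dict], day: date) -> List[dict]:
    same_day = [t for t in txns if t["date"] == day]           # filter by date first
    return [t for t in same_day if abnormal(t)]                # then classify


def one_hop_subgraph(edges: List[EdgeTriple], targets: Set[str]):
    """Edges touching any target instance, plus all endpoints of those edges."""
    sub_edges = [e for e in edges if e[0] in targets or e[1] in targets]
    sub_nodes = set(targets) | {n for e in sub_edges for n in e[:2]}
    return sub_nodes, sub_edges


txns = [{"id": "t1", "amount": 6_000_000, "date": date(2022, 1, 1)},
        {"id": "t2", "amount": 7_000_000, "date": date(2022, 1, 2)},
        {"id": "t3", "amount": 100, "date": date(2022, 1, 1)}]
day = date(2022, 1, 1)
edges = [("u1", "u2", "friend"), ("u2", "u3", "friend"), ("u4", "u5", "friend")]
nodes, sub = one_hop_subgraph(edges, {"u2"})
```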

In some embodiments, to implement the process 300, the server 110 may generate a corresponding query script based on the user's data query request. The query script may invoke the classification rule associated with the target predefined type by referencing the name of that predefined type; when the query script is executed, the invoked classification rule is executed as well, so that the instance data under the target predefined type is obtained from the base data of the platform. In some embodiments, the query script may further include operation instructions that perform further operations on the obtained instance data under the target predefined type to produce the query result required by the user. When the classification rule is executed, the instance data under the target predefined type may be copied as many times as needed, and the resulting query result is not written back into the base data. This keeps the base data independent and improves its reuse rate.
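A query script that invokes a classification rule by predefined type name might look like the following sketch. The script format (`{"type": ..., "op": ...}`), the registry names `RULE_REGISTRY` and `OPERATIONS`, and the dict-based base data are assumptions made for illustration. The sketch copies the matching instances before applying the operation and returns the result rather than writing it back, so the base data is left unchanged.

```python
import copy
from statistics import mean

# Base data as dict records keyed by instance id (assumed representation).
BASE_DATA = {
    "p1": {"base_type": "Person", "age": 70, "balance": 1200.0},
    "p2": {"base_type": "Person", "age": 34, "balance": 800.0},
    "p3": {"base_type": "Person", "age": 81, "balance": 300.0},
}

# Hypothetical registry: predefined type name -> classification rule.
RULE_REGISTRY = {
    "elderly_user": lambda r: r["base_type"] == "Person" and r["age"] >= 65,
}

# Hypothetical operation instructions that a query script may carry.
OPERATIONS = {
    "list": lambda rows: sorted(rows),                       # sorted instance ids
    "count": lambda rows: len(rows),
    "avg_balance": lambda rows: mean(r["balance"] for r in rows.values()),
}

def run_query_script(script: dict, base_data: dict):
    """Execute a script of the assumed form {"type": <type name>, "op": <operation>}."""
    rule = RULE_REGISTRY[script["type"]]                     # invoke the rule by type name
    # Copy matching instances so the operation never touches base_data.
    rows = {k: copy.deepcopy(v) for k, v in base_data.items() if rule(v)}
    return OPERATIONS[script["op"]](rows)                    # returned, not written back

snapshot = copy.deepcopy(BASE_DATA)
print(run_query_script({"type": "elderly_user", "op": "list"}, BASE_DATA))         # ['p1', 'p3']
print(run_query_script({"type": "elderly_user", "op": "avg_balance"}, BASE_DATA))  # 750.0
print(BASE_DATA == snapshot)                                                        # True
```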

It should be noted that the above description of the flow is only for the purpose of illustration and description, and does not limit the application scope of the present specification. Various modifications and changes to the flow may be made by those skilled in the art under the guidance of this specification. However, such modifications and variations are still within the scope of the present description.

FIG. 4 is an exemplary block diagram of a data classification system according to some embodiments of the present description. In some embodiments, system 400 may be implemented on server 110 shown in FIG. 1.

As shown in FIG. 4, the system 400 may include a first acquisition module 410 and a second acquisition module 420.

The first acquisition module 410 may be used to acquire the base data. In some embodiments, the base data may include a plurality of entity instances and relationship instances between at least some of the entity instances.

The second acquisition module 420 may be used to acquire one or more classification rules respectively associated with one or more predefined types.

In some embodiments, the base data and/or the classification rules may be imported by a user.

For more details on system 400 and its modules, reference may be made to FIG. 2 and its associated description.
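One way the two acquisition modules of system 400 could accept user-imported base data and classification rules is sketched below. The JSON layout, the `ClassificationRule` fields, and the minimum-value condition format are hypothetical. Importing rules as declarative condition descriptions, rather than as executable code, is a design choice of this sketch that avoids running user-supplied code on the server.

```python
import json
from dataclasses import dataclass

@dataclass
class ClassificationRule:
    predefined_type: str   # name of the predefined type, e.g. "elderly_user"
    base_type: str         # base type the predefined type is derived from
    conditions: dict       # assumed format: attribute name -> minimum value

class FirstAcquisitionModule:
    """Acquires base data: entity instances plus relationship instances."""
    def acquire(self, payload: str) -> dict:
        doc = json.loads(payload)
        return {"entities": doc["entities"], "relations": doc.get("relations", [])}

class SecondAcquisitionModule:
    """Acquires classification rules and indexes them by predefined type name."""
    def acquire(self, payload: str) -> dict[str, ClassificationRule]:
        rules = {}
        for item in json.loads(payload):
            rule = ClassificationRule(item["type"], item["base_type"], item["conditions"])
            rules[rule.predefined_type] = rule
        return rules

# User-imported inputs (illustrative values only).
base_json = json.dumps({
    "entities": [{"id": "p1", "type": "Person", "age": 70},
                 {"id": "c1", "type": "Company"}],
    "relations": [{"type": "EmployedBy", "src": "p1", "dst": "c1"}],
})
rules_json = json.dumps([
    {"type": "elderly_user", "base_type": "Person", "conditions": {"age": 65}},
])

base = FirstAcquisitionModule().acquire(base_json)
rules = SecondAcquisitionModule().acquire(rules_json)
print(len(base["entities"]), len(base["relations"]))   # 2 1
print(rules["elderly_user"].base_type)                  # Person
```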

FIG. 5 is an exemplary block diagram of a data query system according to some embodiments of the present description. In some embodiments, system 500 may be implemented on server 110 shown in FIG. 1.

As shown in FIG. 5, the system 500 may include a first determination module 510, a second determination module 520, and a query module 530.

The first determination module 510 may be used to determine the classification rule associated with a target predefined type.

The second determination module 520 may be used to determine, based on the classification rule associated with the target predefined type, the portion of the base data that belongs to the target predefined type. In some embodiments, the base data may include a plurality of entity instances and relationship instances between at least some of the entity instances.

The query module 530 may be used to obtain a data query result based at least on that portion.

For more details on system 500 and its modules, reference may be made to FIG. 3 and its associated description.
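The data flow through the three modules of system 500 can be sketched as below. The class and function names mirror the module names in FIG. 5 but are hypothetical, as is the `large_transfer` rule; the example applies the rule to relationship instances to show that the portion determined by the second determination module may consist of relationships as well as entities.

```python
from typing import Callable, Iterable, Optional

Record = dict
Rule = Callable[[Record], bool]

class FirstDeterminationModule:
    """Selects the rule for the target predefined type from the available rules."""
    def __init__(self, rules: dict[str, Rule]):
        self.rules = rules

    def determine(self, target_type: str) -> Rule:
        if target_type not in self.rules:
            raise KeyError(f"no classification rule for predefined type {target_type!r}")
        return self.rules[target_type]

class SecondDeterminationModule:
    """Applies the selected rule to the base data and returns the matching portion."""
    def determine(self, rule: Rule, base_data: Iterable[Record]) -> list[Record]:
        return [r for r in base_data if rule(r)]

class QueryModule:
    """Builds the query result from the portion, optionally via a further operation."""
    def query(self, portion: list[Record], op: Optional[Callable] = None):
        return op(portion) if op else portion

def run(target_type: str, rules: dict[str, Rule], base_data: Iterable[Record], op=None):
    rule = FirstDeterminationModule(rules).determine(target_type)
    portion = SecondDeterminationModule().determine(rule, base_data)
    return QueryModule().query(portion, op)

BASE = [
    {"kind": "relation", "base_type": "Transfer", "src": "u1", "dst": "u2", "amount": 5_000},
    {"kind": "relation", "base_type": "Transfer", "src": "u2", "dst": "u3", "amount": 40},
    {"kind": "entity", "base_type": "Person", "id": "u1"},
]
RULES = {"large_transfer": lambda r: r["base_type"] == "Transfer" and r["amount"] >= 1_000}

print(run("large_transfer", RULES, BASE, op=len))      # 1
print(run("large_transfer", RULES, BASE)[0]["src"])    # u1
```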

It should be understood that the systems and modules shown in FIGS. 4 and 5 may be implemented in a variety of ways. For example, in some embodiments, the systems and their modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer-executable instructions and/or processor control code, provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, in a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The systems of this specification and their modules may be implemented with hardware circuits, such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; with software executed by various types of processors; or with a combination of the above hardware circuits and software (e.g., firmware).

It should be noted that the above description of the systems and their modules is for convenience of description only and is not intended to limit this specification to the scope of the illustrated embodiments. Those skilled in the art will appreciate that, once the principles of the system are understood, the modules may be combined arbitrarily, or a subsystem may be formed and connected to other modules, without departing from these principles. For example, in some embodiments, the first determination module 510 and the second determination module 520 may be two separate modules or may be combined into one module. As another example, in some embodiments, the data classification system 400 and the data query system 500 may be two separate systems or may be combined into one system. Such variations are within the scope of this specification.

Possible benefits of the embodiments of this specification include, but are not limited to: (1) a rule-based dynamic data classification method is provided, so that the same base data can serve a variety of different business requirements, which effectively improves data reuse; at the same time, the classification basis is expressed and recorded in the classification rules, which provides good traceability; (2) classification results or query results generated during business use are stored separately from the base data (e.g., classification or querying is performed without modifying the knowledge graph), which makes it easy to update the classification results after the classification criteria change. It should be noted that different embodiments may produce different advantages; in a given embodiment, the advantages produced may be any one or a combination of the above, or any other advantage that may be obtained.
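Benefit (2) and the traceability noted in (1) can be illustrated with a short sketch in which a classification criterion changes while the base data stays fixed. The `RuleVersion` class, the single age threshold, and the `RESULTS` store keyed by type and rule version are hypothetical; each stored result keeps a human-readable description of the rule that produced it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RuleVersion:
    predefined_type: str
    version: int
    min_age: int            # assumed single-threshold condition

    def describe(self) -> str:
        # Human-readable classification basis, kept for traceability.
        return f"{self.predefined_type} v{self.version}: Person.age >= {self.min_age}"

    def matches(self, person: dict) -> bool:
        return person["age"] >= self.min_age

# Base data is only read, never written, by the classification step below.
BASE_DATA = (
    {"id": "p1", "age": 70},
    {"id": "p2", "age": 62},
    {"id": "p3", "age": 81},
)

# Classification results live in a separate store keyed by (type, version).
RESULTS: dict[tuple[str, int], dict] = {}

def classify(rule: RuleVersion) -> list[str]:
    ids = [p["id"] for p in BASE_DATA if rule.matches(p)]
    RESULTS[(rule.predefined_type, rule.version)] = {"basis": rule.describe(), "ids": ids}
    return ids

print(classify(RuleVersion("elderly", 1, 65)))   # ['p1', 'p3']
print(classify(RuleVersion("elderly", 2, 60)))   # ['p1', 'p2', 'p3']
print(RESULTS[("elderly", 1)]["basis"])          # elderly v1: Person.age >= 65
```

Changing the criterion only adds a new entry to the result store; the earlier result and the rule that produced it remain available for comparison.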

Having thus described the basic concepts, it will be apparent to those skilled in the art that the foregoing detailed disclosure is presented by way of example only and is not intended to limit the embodiments of this specification. Although not explicitly stated herein, various modifications, improvements, and adaptations to the embodiments of this specification may occur to those skilled in the art. Such modifications, improvements, and adaptations are suggested by the embodiments of this specification and therefore remain within the spirit and scope of its example embodiments.

In addition, this specification uses specific terms to describe its embodiments. References to "one embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure, or characteristic is associated with at least one embodiment of this specification. Therefore, it should be emphasized and appreciated that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in different places in this specification do not necessarily refer to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of this specification may be combined as appropriate.

Furthermore, those skilled in the art will appreciate that aspects of the embodiments of this specification may be illustrated and described in terms of several patentable categories or contexts, including any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the embodiments of this specification may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the embodiments of this specification may take the form of a computer product, including computer-readable program code, embodied in one or more computer-readable media.

The computer storage medium may contain a propagated data signal with the computer program code embodied therein, for example, on a baseband or as part of a carrier wave. The propagated signal may take on a variety of forms, including electro-magnetic, optical, etc., or any suitable combination thereof. A computer storage medium may be any computer readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated through any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or a combination of any of the foregoing.

Computer program code required for the operation of portions of the embodiments of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may execute entirely on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or the code may run in a cloud computing environment or be provided as a service, such as software as a service (SaaS).

Furthermore, unless explicitly stated in the claims, the order of the processing elements and sequences described in this specification, the use of numbers and letters, or the use of other designations is not intended to limit the order of the processes and methods of this specification. Although the foregoing disclosure discusses, through various examples, some inventive embodiments currently considered useful, it should be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments; on the contrary, the claims are intended to cover all modifications and equivalent arrangements that fall within the spirit and scope of the embodiments of this specification. For example, although the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing processing device or mobile device.

Similarly, it should be noted that, to simplify the presentation of the embodiments disclosed herein and thereby aid the understanding of one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, claimed subject matter may lie in fewer than all of the features of a single embodiment disclosed above.

Each patent, patent application publication, and other material, such as articles, books, specifications, publications, and documents, referred to in this specification is hereby incorporated by reference in its entirety, except for application history documents that are inconsistent with or conflict with the content of this specification, and documents (currently or later appended to this application) that limit the broadest scope of the claims of this application. It should be noted that if the description, definition, and/or use of a term in material attached to this specification is inconsistent with or conflicts with the content of this specification, the description, definition, and/or use of the term in this specification controls.

Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are also possible within the scope of the embodiments of the present description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.

Claims

1. A method of data classification, comprising:

obtaining base data, wherein the base data comprises a plurality of entity instances and relationship instances between at least some of the entity instances;

obtaining one or more classification rules respectively associated with one or more predefined types;

wherein the classification rule associated with any predefined type is invoked, when a data query task for that predefined type is executed, to determine the portion of the base data that belongs to that predefined type, and a query result for the data query task is obtained based at least on that portion.

2. The method of claim 1, wherein the one or more predefined types include one or more predefined entity types, and the classification rule associated with a predefined entity type includes condition description information for use in determining whether an entity instance belongs to the predefined entity type.

3. The method of claim 1, wherein the entity instance and/or the relationship instance has a base type, the predefined type being derived based on the base type.

4. The method of claim 3, wherein the classification rules include condition description information and an action function; the condition description information is used to reference the relevant base type and specify screening conditions, and the action function is used to screen the entity instances and/or relationship instances of the relevant base type according to the condition description information and to return the entity instances or relationship instances that satisfy the conditions.

5. The method of claim 1, wherein the base data and/or the classification rules are imported by a user.

6. The method of claim 1, wherein the base data is represented as a knowledge graph.

7. A data classification system, comprising a first acquisition module and a second acquisition module, wherein:

the first acquisition module is configured to acquire base data, the base data comprising a plurality of entity instances and relationship instances between at least some of the entity instances;

the second acquisition module is configured to acquire one or more classification rules respectively associated with one or more predefined types;

8. The system of claim 7, wherein the base data and/or the classification rules are imported by a user.

9. A data classification apparatus, comprising a processor and a storage device for storing instructions, wherein the data classification method according to any one of claims 1 to 6 is implemented when the processor executes the instructions.

10. A data query method, comprising:

determining a classification rule associated with a target predefined type, wherein the classification rule associated with the target predefined type is selected from one or more classification rules respectively associated with one or more predefined types;

determining, based on the classification rule associated with the target predefined type, a portion of base data that belongs to the target predefined type, wherein the base data comprises a plurality of entity instances and relationship instances between at least some of the entity instances;

a data query result is obtained based at least on the portion.