WO2022260669A1 - Systems and methods for anomaly detection in a relational database - Google Patents

Systems and methods for anomaly detection in a relational database

Info

Publication number
WO2022260669A1
Authority
WO
WIPO (PCT)
Prior art keywords
machine learning
columns
neural network
anomaly
structures
Application number
PCT/US2021/036752
Other languages
French (fr)
Inventor
Siming Li
Wei Dong
Original Assignee
Industrial Artificial Intelligence Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Industrial Artificial Intelligence Inc. filed Critical Industrial Artificial Intelligence Inc.
Priority to PCT/US2021/036752 priority Critical patent/WO2022260669A1/en
Publication of WO2022260669A1 publication Critical patent/WO2022260669A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2433Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80Database-specific techniques

Definitions

  • This application relates generally to data processing technology including, but not limited to, methods, systems, and non-transitory computer-readable media for detecting anomalies in a relational database.
  • Relational databases are widely used nowadays. Massive amounts of data are stored in various kinds of relational databases.
  • A relational database conforms to a database schema. Relational databases usually hold raw business data, and sometimes the data is incomplete or noisy. While traditional data mining focuses more on extracting overall information, it does not handle incomplete or noisy data.
  • Anomalies are values in database tables that deviate from the regular values. Anomalies can arise for different reasons. For example, there may be errors in the data input or gathering process. In other cases, abrupt changes in the business can cause deviations in the data.
  • The present application describes implementations related to data processing and, more particularly, to systems and methods for detecting anomalies in a relational database.
  • Anomalies in a database can be automatically detected using machine learning models. User-defined actions are triggered upon detection of an anomaly. In some embodiments, a user can annotate the detected anomalies for model improvement in the system.
  • a method is implemented at a computer system for anomaly detection in a relational database.
  • The method includes: extracting table structures from the relational database, where the table structures include multiple entities and each entity has multiple table columns; converting each table in the table structures to a machine learning model; training the machine learning models by feeding the corresponding table columns into the machine learning models; and identifying a value of the table columns as an anomaly when a difference between the value and a value predicted by the machine learning models exceeds a predefined threshold.
  • extracting table structures includes: identifying tables to be converted; and converting table column values of the identified tables into a feature value format.
  • converting each table in the table structures to the machine learning model includes: creating a respective subnet of neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as the input table columns to the neural network structures; identifying a second set of table columns as the output table columns to the neural network structures; creating a respective neural network structure for each table, and converting the primary keys and foreign keys into a feature value format in an embedding layer.
  • In some embodiments, the first set of table columns consists of all table columns, and the second set of table columns includes table columns not identified by the primary keys or the foreign keys.
  • Training the machine learning models includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs and a respective second set of table column values as respective fitting outputs for the respective neural network for each table; generating a prediction for each of the table column values of the respective outputs; and reducing a difference between the respective outputs of the respective neural network and the respective fitting outputs.
  • Identifying the value in the one of the table columns as the anomaly further includes: generating a loss function for each of the table column values of the respective outputs by comparing the actual column value with the prediction for that table column value.
  • Identifying the value in the one of the table columns as the anomaly further includes: in accordance with a determination that the loss function corresponding to the value is greater than a predetermined threshold, determining the table column value of the respective outputs to be the anomaly.
  • the anomaly detection process further includes: triggering a user-defined action upon the identification of the anomaly.
  • some implementations include a computer system including one or more processors and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform any of the above methods.
  • some implementations include a non-transitory computer-readable medium, having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the above methods.
  • Figure 1 is an exemplary data processing environment in accordance with some embodiments.
  • Figure 2 is an exemplary neural network for processing data in a machine learning model, in accordance with some implementations of the present disclosure.
  • Figure 3 is a diagram illustrating a relational database schema structure according to some implementations of the present disclosure.
  • Figure 4 is a flow diagram illustrating an entity feature extracting process according to some implementations of the present disclosure.
  • Figure 5 is a block diagram illustrating the neural network structures converted or compiled from the database structures according to some implementations of the present disclosure.
  • Figure 6 is a flowchart illustrating an exemplary process by which a computing system implements the techniques of extracting an entity feature from a relational database in accordance with some implementations of the present disclosure.
  • Figure 7 is a block diagram of a system architecture of the data anomaly detection system according to some implementations of the present disclosure.
  • Figure 8 is a block diagram illustrating simplified neural networks converted from a Sales Forecast database for anomaly detection in accordance with some implementations of the present disclosure.
  • Figure 9 is a flowchart illustrating an exemplary process by which a computing system implements the techniques of anomaly detection in a relational database in accordance with some implementations of the present disclosure.
  • Figure 1 is an exemplary data processing environment 100 in which one or more network-connected client devices 102 and one or more server systems 104 interact with each other locally or remotely via one or more communication networks 106 in accordance with some embodiments.
  • The server systems 104, such as 104A and 104B, are physically remote from, but communicatively coupled to, the one or more client devices 102.
  • a client device 102 (e.g., 102A, 102B) includes a desktop computer.
  • In some embodiments, a client device 102 (e.g., 102C) includes a mobile device, e.g., a mobile phone, a tablet computer, or a laptop computer.
  • Each client device 102 can collect data or user inputs, execute user applications, and present outputs on its user interface.
  • the collected data or user inputs can be processed locally at the client device 102 and/or remotely by the server(s) 104.
  • Each client device 102 communicates with another client device 102 or the server systems 104 using the one or more communication networks 106.
  • the communication networks 106 can be one or more networks having one or more types of topologies, including but not limited to the Internet, intranets, local area networks (LANs), cellular networks, Ethernet, telephone networks, Bluetooth personal area networks (PAN), and the like.
  • Two or more client devices 102 in a sub-network are coupled via a wired connection, while at least some client devices 102 in the same sub-network are coupled via a local radio communication network (e.g., ZigBee, Z-Wave, etc.).
  • a client device 102 establishes a connection to the one or more communication networks 106 either directly (e.g., using 3G/4G connectivity to a wireless carrier), or through a network interface (e.g., a router, switch, gateway, hub, or an intelligent, dedicated whole-home control node), or through any combination thereof.
  • Each of the server systems 104 includes one or more processors 110 and memory storing instructions for execution by the one or more processors 110.
  • The server system 104 also includes an input/output interface 114 to the client(s).
  • the one or more server systems 104 provide system data (e.g., boot files, operating system images, and user applications) to the client devices 102, and in some embodiments, process the data and user inputs received from the client device(s) 102 when the user applications are executed on the client devices 102.
  • the one or more server systems 104 can enable real-time data communication with the client devices 102 that are remote from each other or from the one or more server systems 104.
  • the server system 104A is configured to store an information database such as a relational database 112.
  • the server system 104B is configured to store a machine learning model 116.
  • the machine learning model and the relational database can be in the same server 104.
  • the machine learning model 116 is one or more neural networks.
  • a machine model training method can be implemented at one or more of the server systems 104.
  • Each client device 102 includes one or more processors and memory storing instructions for execution by the one or more processors.
  • the instructions stored on the client device 102 enable implementation of the web browser and user interface application to the relational database 112 or to the machine learning model 116.
  • the web browser and the user interface application are linked to a user account in the data processing environment 100.
  • the browser or user interface application enables a GUI for a user 120 to review information from the relational database 112 or from the machine learning model 116.
  • a user 120 can add notations or make changes to the data or selections on the relational database 112 or on the machine learning model 116.
  • the new user data can be stored on the client device 102 or on the server systems 104.
  • Machine learning techniques are applied in the data processing environment 100.
  • a machine learning model training method is implemented at a client device 102.
  • a machine learning model training method is jointly implemented at the client device 102 and the server system 104.
  • a machine learning model can be held at a client device 102.
  • the client device 102 is configured to automatically and without user intervention, identify, classify or modify the data information from the relational database 112 or from the machine learning model 116.
  • both model training and data processing are implemented locally at each individual client device 102 (e.g., the client device 102C).
  • The client device 102C obtains the training data from the one or more server systems 104, including the database 112, and applies the training data to train the machine learning models. Subsequent to model training, the client device 102C obtains and processes the data locally using the trained machine learning models.
  • both model training and data processing are implemented remotely at a server system 104 (e.g., the server system 104B) associated with a client device 102 (e.g. the client device 102A).
  • the server 104B obtains the training data from itself, another server 104 or the database 112 and applies the training data to train the machine learning models 116.
  • the client device 102A obtains the data, sends the data to the server 104B (e.g., in an application) for data processing using the trained machine learning models, receives data processing results from the server 104B, and presents the results on a user interface (e.g., associated with the application).
  • the client device 102A itself implements no or little data processing on the data prior to sending them to the server 104B. Additionally, in some embodiments, data processing is implemented locally at a client device 102 (e.g., the client device 102B), while model training is implemented remotely at a server system 104 (e.g., the server 104B) associated with the client device 102B.
  • the trained machine learning models are optionally stored in the server 104B or another database, such as 112.
  • the client device 102B imports the trained machine learning models from the server 104B or database 112, processes the data using the machine learning models, and generates data processing results to be presented on a user interface locally.
  • the machine learning model system 116 includes one or more of a server, a client device, a storage, or a combination thereof.
  • the machine learning model system 116 typically, includes one or more processing units (CPUs), one or more network interfaces, memory, and one or more communication buses for interconnecting these components (sometimes called a chipset).
  • the machine learning model system 116 includes one or more input devices that facilitate user input, such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture capturing camera, or other input buttons or controls.
  • the machine learning model system 116 also includes one or more output devices that enable presentation of user interfaces and display content, including one or more speakers and/or one or more visual displays.
  • Memory includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices.
  • Memory optionally includes one or more storage devices remotely located from the one or more processing units.
  • Memory, or alternatively the non-volatile memory within memory, includes a non-transitory computer-readable storage medium.
  • Memory, or the non-transitory computer-readable storage medium of memory, stores programs, modules, and data structures including an operating system, an input processing module for detecting and processing input data, a model training module for receiving training data and establishing a machine learning model for processing data, a machine learning module for processing data using machine learning models, etc.
  • Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above.
  • The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments.
  • Memory optionally stores a subset of the modules and data structures identified above.
  • Memory optionally stores additional modules and data structures not described above.
  • Figure 2 is an exemplary neural network 200 implemented to process data in a machine learning model 116, in accordance with some implementations of the present disclosure.
  • The machine learning model 116 is established based on the neural network 200. A corresponding model-based processing module within the server system 104B applies the machine learning model 116, including the neural network 200, to process data that has been converted to a predefined format.
  • the neural network 200 includes a collection of neuron nodes 220 that are connected by links 212.
  • Each neuron node 220 receives one or more neuron node inputs and applies a propagation function to generate a neuron node output from the one or more neuron node inputs.
  • a weight associated with each link 212 is applied to the neuron node output.
  • the one or more neuron node inputs are combined based on corresponding weights according to the propagation function.
  • the propagation function is a product of a non-linear activation function and a linear weighted combination of the one or more neuron node inputs.
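  • Written out as a formula (a conventional formulation, not quoted from this disclosure), the output y of a neuron node with inputs x_i and link weights w_i is y = f(Σ_i w_i·x_i), where f is the activation function; a network bias term b may additionally be added inside f(·), as described below.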
  • The neural network consists of one or more layers of neuron nodes 220.
  • the one or more layers include a single layer acting as both an input layer and an output layer.
  • the one or more layers include an input layer 202 for receiving inputs, an output layer 206 for generating outputs, and zero or more hidden layers/latent layers 204 (e.g., 204A and 204B) between the input and output layers 202 and 206.
  • A deep neural network has more than one hidden layer 204 between the input and output layers 202 and 206.
  • each layer is only connected with its immediately preceding and/or immediately following layer.
  • a layer 202 or 204B is a fully connected layer because each neuron node 220 in the layer 202 or 204B is connected to every neuron node 220 in its immediately following layer.
  • one or more neural networks can be utilized by the machine learning model 116.
  • The one or more neural networks include a fully connected neural network, a Multi-layer Perceptron, a Convolutional Neural Network, Recurrent Neural Networks, a Feed Forward Neural Network, a Radial Basis Function Neural Network, Long Short-Term Memory (LSTM) networks, Autoencoders, Sequence-to-Sequence Models, etc.
  • A convolutional neural network (CNN) is utilized in a machine learning model 116 to process data.
  • the CNN employs convolution operations and belongs to a class of deep neural networks 200, i.e., a feedforward neural network that only moves data forward from the input layer 202 through the hidden layers to the output layer 206.
  • the one or more hidden layers of the CNN are convolutional layers convolving with a multiplication or dot product.
  • Each neuron node in a convolutional layer receives inputs from a receptive area associated with a previous layer (e.g., six neuron nodes), and the receptive area is smaller than the entire previous layer and may vary based on a location of the convolution layer in the convolutional neural network.
  • the training process is a process for calibrating all of the weights for each layer of the learning model using a training data set which is provided in the input layer 202.
  • the training process typically includes two steps, forward propagation and backward propagation, which are repeated multiple times until a predefined convergence condition is satisfied.
  • In forward propagation, the set of weights for the different layers is applied to the input data and the intermediate results from the previous layers.
  • In backward propagation, a margin of error of the output (e.g., a loss function) is measured, and the weights are adjusted accordingly to decrease the error.
  • the activation function is optionally linear, rectified linear unit, sigmoid, hyperbolic tangent, or of other types.
  • a network bias term is added to the sum of the weighted outputs from the previous layer before the activation function is applied.
  • The network bias provides a perturbation that helps the neural network 200 avoid overfitting the training data.
  • the result of the training includes the network bias parameter for each layer.
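  • As an illustrative sketch only (not the patented implementation), the forward/backward training loop described above can be written with PyTorch; the two-layer network, the loss function, and the convergence threshold are all hypothetical choices:

        import torch
        import torch.nn as nn

        # Hypothetical two-layer network standing in for the learning model.
        model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = nn.MSELoss()

        X = torch.randn(100, 8)  # training inputs (placeholder data)
        Y = torch.randn(100, 1)  # ground-truth fitting outputs

        for step in range(1000):
            pred = model(X)          # forward propagation
            loss = loss_fn(pred, Y)  # margin of error of the output
            optimizer.zero_grad()
            loss.backward()          # backward propagation
            optimizer.step()         # adjust weights to decrease the error
            if loss.item() < 1e-3:   # predefined convergence condition
                break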
  • Relational databases are modeled after entity-relationship models.
  • An entity-relationship model (or ER model) describes entities and relationships between the entities.
  • An ER model is typically implemented in a database.
  • An entity is defined as a physical or a logical thing that can be uniquely defined.
  • an entity may be a physical object such as a product or an event such as a product sale.
  • A relation describes how entities are related to or interact with one another. Entities can be characterized not only by relationships but also by attributes (or properties).
  • the attributes include identifiers (IDs) such as primary keys in the database.
  • A product may have attributes including product_id, category1, category2, price, etc.
  • an entity is represented by a database table.
  • Figure 3 is a diagram illustrating a relational database schema structure according to some implementations of the present disclosure.
  • database items within rectangular boxes are entities, such as Store, Product and Region.
  • An entity such as Product has many attributes within oval shaped boxes including Product Name, Product Color, and Product Style.
  • The relations between the entities are represented by rhombus-shaped boxes, such as “Has” and “Contains”.
  • The entity features or attributes from the relational database are extracted as an alternative representation of the entities for subsequent processes, including machine learning, clustering, visualization, anomaly detection, etc.
  • a relational database is converted to a machine learning model mapping.
  • a machine learning model has a neural network structure.
  • a relational database conforms to a database schema, and the relational database has a number of tables. Each table has a number of columns and a number of rows. Each column has a name and a type, and the type can include: Primary Key ID, Foreign Key IDs (reference to another table), Date/Datetime, Category (including categorical integers, strings, Boolean and others), Numeric, or Free text, etc.
  • A machine learning model, such as 116, has inputs represented by matrix X and outputs represented by matrix Y. Both the input X and output Y matrices have the same number of rows. Each of the matrix columns is called a feature.
  • the X and Y matrices have corresponding rows, and each of the rows is an example.
  • the input X can have missing values.
  • Matrix Y’ is used as the ground truth for the output matrix Y at a training stage for the machine learning model, and the matrix Y’ has the same shape, i.e., the same rows and columns, as the output matrix Y.
  • a loss function is used to compare the prediction output matrix Y and the ground truth Y’.
  • a smaller loss function usually indicates a better fitting between the prediction Y and the ground truth Y’.
  • the loss function of each column (or groups of columns) is evaluated and then summed to produce a total loss.
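  • In formula form (a conventional rendering, not quoted from this disclosure), with L_c denoting the loss function of column or column group c, the total loss is L_total = Σ_c L_c(Y_c, Y’_c), where Y_c is the predicted column and Y’_c is the corresponding ground-truth column.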
  • a table schema is converted to a machine learning model format.
  • each table column can potentially be used as an input and/or an output.
  • conditions are specified regarding how to convert the values, and how to handle NULL (missing) values.
  • an encoder of a machine learning model is used.
  • the encoder converts database values to value formats suitable for a machine learning model.
  • database values are converted to only numerical values.
  • strings are converted into numbers.
  • the encoder can work on the input side to convert a value to be an input format of the machine learning model. In some embodiments, the encoder can work on the output side to convert a value to be an output format of the machine learning model. In both cases, missing values are handled based on some predefined conditions.
  • The compatible conversion methods for the encoder are maintained in the system as a library, with a default conversion method specified for each column type.
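  • A minimal sketch of such an encoder library in Python, with hypothetical names and only the two conversion methods described in Table 1 below:

        # Hypothetical registry mapping column types to conversion methods,
        # with a default method per type (all names are illustrative).
        ENCODER_LIBRARY = {
            "numerical":   {"default": "as_float", "methods": ["as_float", "as_index"]},
            "categorical": {"default": "as_index", "methods": ["as_index"]},
        }

        def encode(value, column_type, method=None, vocab=None):
            """Convert a database value into a feature value for the model."""
            method = method or ENCODER_LIBRARY[column_type]["default"]
            if method == "as_float":
                return float(value)
            if method == "as_index":
                vocab = vocab if vocab is not None else {}
                return vocab.setdefault(value, len(vocab))  # per-column dictionary
            raise ValueError(f"unsupported conversion method: {method}")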
  • a relational database is created to store sales forecast information.
  • The database tables storing sales forecast information include: Store, which has columns store_id, city, and state; Product, which has columns product_id, category1, category2, price, etc.; and Sales, which has columns store_id, product_id, date, and quantity.
  • store_id and product_id are entity IDs, and these IDs are typically the primary keys of the corresponding tables, for example the Store table and the Product table. In the Sales table, store_id and product_id are referred to as foreign keys.
  • Entity IDs are used as linkages between multiple tables, and they do not generate a loss function in the machine learning model.
  • The column values for the columns city, state, category1, and category2 are string or categorical values, and they are converted by the encoder to categorical values for the machine learning model.
  • The column values for the columns price and quantity are numerical values, and they can be directly used in the machine learning model.
  • the database specifies whether a column can be NULL.
  • NULL values in the column are converted by the encoder to a default value (or average value).
  • a separate column for storing only 0 or 1 is optionally generated to record whether the original value is a null value. For example, if the row value for the separate column is 1, the original value is not null. If the row value for the separate column is 0, the original value is null.
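  • A small pandas sketch of this null-handling rule for a numerical column (the column names are hypothetical):

        import pandas as pd

        df = pd.DataFrame({"price": [9.5, None, 12.0, None]})

        # Separate 0/1 column recording whether the original value is null:
        # 1 means not null, 0 means null, matching the convention above.
        df["price_not_null"] = df["price"].notna().astype(int)

        # Replace NULLs with the column average (a default value also works).
        df["price"] = df["price"].fillna(df["price"].mean())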
  • the encoder adds an additional NULL category to the existing categorical values during the conversion.
  • In some embodiments, when an original value is NULL, a loss function will not be generated for that value.
  • entities including entity IDs are converted to embeddings for the machine learning model, for example, a neural network.
  • embeddings refer to vectors of numerical values to represent each example of the table values.
  • the embeddings from the original tables are copied over in the machine learning model.
  • the embeddings can be obtained from training a machine learning model.
  • the embeddings can be copied over to the other columns of the table where the entity ID is the primary key.
  • The Product table embedding includes vector values each represented as (category1, category2, price).
  • The Store table embedding includes vector values each represented as (city, state).
  • The Sales table embedding then includes vector values each represented at least as (store.state, store.city, product.category1, product.category2, product.price, quantity).
  • The encoder automatically detects the categorical columns. For example, when a database has categorical columns, they are directly converted into categorical values. In some instances, categories are represented by numerical values or strings that are not explicitly specified to be categories. In that case, a categorical value detection is triggered, i.e., a column is converted into a categorical column when the ratio of the number of distinct values versus the total number of values in the same column is smaller than a predetermined threshold. For example, the threshold is 1/3 or 1/4. When such a conversion is triggered, a dictionary of possible values is built (including a “misc” category to handle values that are not seen during the training of the machine learning model). String or numerical values are converted into 0, 1, 2, 3, etc.
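  • The detection rule and dictionary construction could be sketched as follows; the 1/3 threshold and the “misc” category follow the text above, while the function names are illustrative:

        def build_category_dict(values, threshold=1/3):
            """Return a value->index dictionary if the column looks categorical,
            else None. A column is treated as categorical when the ratio of
            distinct values to total values is below the threshold."""
            distinct = set(values)
            if len(distinct) / len(values) >= threshold:
                return None
            mapping = {v: i for i, v in enumerate(sorted(distinct, key=str))}
            mapping["misc"] = len(mapping)  # catch-all for values unseen in training
            return mapping

        def to_categorical(value, mapping):
            return mapping.get(value, mapping["misc"])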
  • An exemplary encoder library is shown in Table 1 below.
  • the column values can be converted into numerical values.
  • the loss function type used for the column type in the machine learning model is Mean squared error (MSE).
  • the column values can be converted into categorical values.
  • the automatic detection is triggered when the ratio of the number of distinct values versus the total number of values in the same column is smaller than a predetermined threshold.
  • the loss function type used for the column type in the machine learning model is cross entropy.
  • Table 1: Exemplary encoder library.
  • For example, users can change the conversion type for the machine learning model.
  • Massive amounts of data are stored in various kinds of relational databases.
  • methods and systems for extracting the entity features from a relational database are disclosed herein.
  • a relational database conforms to a database schema.
  • the data within a relational database is relatively clean to be imported for deep learning.
  • the data structures are clear and easy to extract from a relational database.
  • Relational databases usually hold raw business data, and sometimes the data is incomplete or noisy. Traditional data mining focuses more on extracting overall information and does not handle incomplete or noisy data.
  • FIG. 4 is a flow diagram 400 illustrating an entity feature extracting process according to some implementations of the present disclosure.
  • an automatic extraction process 410 is applied to a relational database 420.
  • the relational database 420 includes a number of tables. Each table has a number of columns and a number of rows. Each column has a name and a type, and the type can include: Primary Key ID, Foreign Key IDs (reference to another table), Date/Datetime, Category (including categorical integers, strings, Boolean and/or others), Numeric, or Free text, etc.
  • the relational database 420 is converted to a table structure or schema 422 for preparation of the input to a machine learning model, such as 116.
  • a compiling process 412 is used to convert the table structure or schema 422 to a machine learning model structure.
  • the machine learning model structure includes a neural network 424.
  • each entity in the database has a corresponding table, where the entity is identified by a primary key.
  • an embedding subnet is created for each entity.
  • the subnet could be a simple embedding layer in a neural network.
  • The same embedding is used whether the key is referred to as a primary key in its defining table or as a foreign key in other tables.
  • a neural network is created for each table.
  • the primary key and foreign key columns are compiled to be inputs by default for each of the neural networks.
  • Other columns other than the primary keys and foreign keys are compiled to be outputs by default for each of the neural networks.
  • a user can override the default input and output columns for each of the neural networks.
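  • A compile step following these defaults might look like the sketch below (the schema representation and function names are assumptions):

        def compile_table(columns, key_columns, overrides=None):
            """Split a table's columns into neural-network inputs and outputs.
            Primary/foreign key columns are inputs by default; all other
            columns are outputs. overrides maps a column to "input"/"output"."""
            overrides = overrides or {}
            inputs, outputs = [], []
            for col in columns:
                role = overrides.get(col, "input" if col in key_columns else "output")
                (inputs if role == "input" else outputs).append(col)
            return inputs, outputs

        # Sales table from the example below: keys are inputs, sales is the output.
        inputs, outputs = compile_table(
            ["store_id", "date_id", "sales"], key_columns={"store_id", "date_id"})
        assert inputs == ["store_id", "date_id"] and outputs == ["sales"]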
  • Figure 5 is a block diagram illustrating the neural network structures converted or compiled from the database structures according to some implementations of the present disclosure.
  • Figure 5 illustrates a neural network with entity embedding 500.
  • the component parts identified as “Loss” or “Total Loss” shown in Figure 5 are only used in the training process of the neural networks but not in the final machine learning model or the final neural network model with embedding 426 after the training process.
  • a Sales Forecast relational database is compiled into a neural network structure.
  • The Sales Forecast relational database consists of three tables: a Store table with columns including store_id, feature1, and feature2, etc.; a Date table with columns including date_id, year, month, day, and weekday, etc.; and a Sales table with columns including store_id, date_id, and sales.
  • The primary key column for the Store table is store_id, and the primary key column for the Date table is date_id.
  • In the Sales table, the foreign key columns are store_id and date_id, which refer to the primary key store_id column in the Store table and the primary key date_id column in the Date table, respectively.
  • The primary key for the entity Store is store_id, and the primary key for the entity Date is date_id.
  • Three neural networks, a Store network, a Date network, and a Sales network, are created for the three tables Store, Date, and Sales, respectively.
  • The primary key column store_id is used as the input to the Store neural network.
  • The primary key column date_id is used as the input to the Date neural network.
  • The foreign key store_id and date_id columns are used as the inputs to the Sales neural network.
  • the whole structure 500 is viewed as three neural networks.
  • Each entity embedding is referred to in two tables.
  • Features related to the Date entity are referred to in both the Date and Sales network structures.
  • Features related to the Store entity are referred to in both the Store and Sales network structures.
  • the entity features will be updated when each table referring to those entity features is trained.
  • There are three types of neural network modules: an embedding subnet, a transform subnet, and a prediction head. Data is input through the embedding subnet, processed through the transform subnet, and predicted/output from the prediction head.
  • In some embodiments, the neural network modules are multi-layer fully-connected networks.
  • The neural network modules can also include more complicated constructions, such as residual blocks and attention mechanisms.
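  • A minimal PyTorch rendering of the three module types (layer sizes and names are assumptions, not the disclosed architecture):

        import torch.nn as nn

        class TableNet(nn.Module):
            """Embedding subnet -> transform subnet -> prediction head,
            built from multi-layer fully-connected networks (a sketch)."""
            def __init__(self, num_ids, embed_dim=16, hidden=64, out_dim=1):
                super().__init__()
                self.embedding = nn.Embedding(num_ids, embed_dim)  # embedding subnet
                self.transform = nn.Sequential(                    # transform subnet
                    nn.Linear(embed_dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU())
                self.head = nn.Linear(hidden, out_dim)             # prediction head

            def forward(self, ids):  # ids: LongTensor of entity IDs
                return self.head(self.transform(self.embedding(ids)))

  • In a multi-table setup, the same embedding module would be shared wherever an entity's key appears, matching the shared-embedding behavior described above.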
  • each column of the relational database has a corresponding data type, and a nullable specification defined in the database schema.
  • Nullable specification defines whether a column can be empty.
  • each data type has a default input subnet, default output subnet and default loss function.
  • In some embodiments, the input subnet is an embedding subnet, the output subnet is linear with softmax activation, and the loss function is cross entropy.
  • For categorical columns, the input subnet is one-hot encoding (with a few categories) or embedding (with many categories), the output subnet is linear with softmax activation, and the loss function is cross entropy.
  • For numerical columns, the input subnet is an identity subnet (passthrough subnet), the output subnet is linear without activation, and the loss function is a mean-squared-error function.
  • a user can override the default submodule selections.
  • Each input/output column has a default submodule depending on its data type.
  • the system has a default transformation subnet implementation.
  • a user can customize the compiling process 412.
  • the user can override one or more of the parameter settings. For example, the user can specify a particular column to be input or output. The user can select compatible input, output, and/or loss function submodules for each column. The user can implement the subnet.
  • The user’s overrides can be implemented with a graphical user interface or a configuration file at a client device, such as 102.
  • a training step 414 is applied to the neural network 424 to produce an embedding model 426.
  • the embedding model has losses or residuals as described above.
  • The neural networks formed from the tables of the relational database are trained in batches. For each training iteration, a neural network converted from a table is selected. A set of records of that table is loaded from the database, or from intermediate storage, as a minibatch. The minibatch is then fed to the neural network for training.
  • a round robin method is implemented for the selection of the tables/neural networks.
  • the neural networks converted from the tables are trained in a sequence.
  • the records within each table are also trained in a sequence.
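  • The per-table minibatch iteration with round-robin selection, sketched in Python (the minibatch loader and gradient step are placeholders):

        def train(models, load_minibatch, num_iterations=10000, batch_size=256):
            """models: dict of table name -> (network, train_step), where
            train_step performs one gradient update on a minibatch.
            load_minibatch returns a batch of records for a table."""
            tables = list(models)
            for it in range(num_iterations):
                table = tables[it % len(tables)]           # round-robin selection
                batch = load_minibatch(table, batch_size)  # from DB or intermediate storage
                network, train_step = models[table]
                train_step(network, batch)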
  • a weighted sampling method is implemented for the selection of the tables/neural networks.
  • Each record in each of the neural networks converted from the tables has a loss computed from the previous training iterations as shown in Figure 5.
  • Each of the neural networks has a respective total loss, i.e., a respective summed loss over the losses of the records in the respective table/neural network. The losses are converted into probabilities; a higher loss means a higher probability.
  • Tables/neural networks and/or records are sampled by probabilities for the training. A higher probability means more weight in the sampling process for the training.
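  • One way to turn the per-table losses into sampling probabilities (normalizing raw losses is an assumption; the text only requires that a higher loss means a higher probability):

        import random

        def sample_table(total_losses):
            """total_losses: dict of table -> summed loss from previous
            iterations (assumed positive). Higher loss -> higher probability."""
            tables = list(total_losses)
            weights = [total_losses[t] for t in tables]
            return random.choices(tables, weights=weights, k=1)[0]

        # The Sales network, carrying the largest loss, is sampled most often.
        print(sample_table({"Store": 0.2, "Date": 0.1, "Sales": 1.5}))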
  • the neural network structures as shown in Figure 5 also have a total loss from each of the losses from the records in all tables/neural networks.
  • The losses and total loss shown in Figure 5 are only used during training and are not used during inference with the neural networks.
  • a prediction head and a prediction for each of the columns or features is also included in Figure 5.
  • the output columns including Feature 1 and Feature 2 in the Store neural network hold prediction values.
  • the output column Sales in the Sales network holds prediction values.
  • the output columns including year and month in the Date neural network hold prediction values.
  • the trained neural network can have many applications.
  • During the training, the neural network generates a fitting/prediction for each output column, as shown in Figure 5.
  • the fitting or prediction is what the neural network thinks the value should be, and the fitting or prediction is typically different from the actual value.
  • the difference between the prediction and the actual ground truth value of the output column is measured by the loss function.
  • the neural network generated values can be used as predictions or completion for missing values or null values in a relational database.
  • Values with a big loss in the outputs of the neural networks can be used for outlier/error detection in the relational data when the original values in the relational database are not null.
  • Figure 6 is a flowchart illustrating an exemplary process 600 by which a computing system implements the techniques of extracting an entity feature from a relational database in accordance with some implementations of the present disclosure.
  • the computing system extracts the table structures from the relational database (610).
  • the computing system then converts the table structures to neural network structures, and each entity in the relational database has a corresponding embedding subnet in the neural network structures (620).
  • the computing system trains the neural network structures by feeding a first set of table column values as inputs of the neural network structures and comparing outputs of the neural network structures with a second set of table column values (630).
  • the computing system generates the trained neural network structures including the embedding subnets as entity features of the relational database (640).
  • extracting table structures (610) includes: identifying the tables to be converted; and converting the table column values of the identified tables into a feature value format.
  • converting the table structures to the neural network structures includes: creating a respective subnet of the neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as the input table columns to the neural network structures; identifying a second set of table columns as the output table columns to the neural network structures; and creating a respective neural network structure for each table.
  • the first set of table columns is table columns identified by primary keys or foreign keys
  • the second set of table columns is table columns not identified by the primary keys or the foreign keys.
  • Training the neural network structures (630) includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs and a respective second set of table column values as respective fitting outputs for the respective neural network for each table; reducing the difference between respective outputs of the respective neural network and the respective fitting outputs; and generating a prediction for each of the table column values of the respective outputs.
  • Training the neural network structures (630) further includes: generating a loss function for each of the table column values of the respective outputs by comparing the actual column value with the prediction for that table column value.
  • Training the neural network structures (630) further includes: training the neural networks, including the respective neural network for each table, in round-robin scheduling.
  • Training the neural network structures (630) further includes: converting a respective loss function from the previous training iterations to a respective probability, and sampling a respective column value for training based on the respective probability.
  • The neural network structures include multi-layer fully-connected networks.
  • The process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an input value in an input table column to the neural network structures is null and the type of the input table column is numerical, converting the input value in the input table column into a number and an indicator of whether the respective input value is null.
  • the process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an input value in an input table column to the neural network structures is null and a type of the input table column is categorical, adding a null category to the existing categories of the input table column.
  • The process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an output fitting value in an output table column of the neural network structures is null, setting the loss function for an output value to a default value.
  • The identifications of the first set of table columns as the input table columns and the second set of table columns as the output table columns of the neural network structures are configurable.
  • In some embodiments, the identifications of the first and second sets of table columns are from a graphical user interface or a configuration file.
  • systems and methods for anomaly detection in a relational database are disclosed herein.
  • Anomalies in a database can be automatically detected using machine learning models.
  • User-defined actions are triggered upon detection of an anomaly.
  • a user can annotate the detected anomalies for model improvement in the system.
  • Anomalies are values in the database tables that deviate from regular values. Anomalies can arise for different reasons. For example, there may be errors in the data input or gathering process. In other cases, abrupt changes in business could cause deviations in the data. Table 2 below shows an exemplary database table anomaly.
  • Anomaly detection can be useful in many aspects. For example, certain anomalies require human action. Detected anomalies can be used for data cleaning that is useful for subsequent data mining and machine learning.
  • Figure 7 is a block diagram of a system architecture 700 of the data anomaly detection system according to some implementations of the present disclosure.
  • one or more of the database adapters 704 including model adapter 706 and data adapter 710 are implemented.
  • a model adapter 706 reads the database schema from the relational database 702, and converts the database schema to the machine learning models 708.
  • The model adapter 706 identifies tables to be converted (those that contain fields other than keys). Similar to what is described in Figures 4 to 6, each table is converted to a machine learning model. Table columns are converted to input or output features of the machine learning models 708. In some embodiments, all the columns for each table are converted to inputs of a machine learning model of the table, with primary keys and foreign keys as embeddings. All the columns other than primary keys or foreign keys for each table are converted to outputs of a machine learning model of the table.
  • a data adapter 710 reads data from the relational database 702, and feeds the data into the machine learning models 708 for training 712.
  • the data adapter 710 converts database column values into feature values with encoders.
  • A user interface allows the user to override the default conversion behavior, e.g., forcing an integral column type to be treated as a categorical or a numerical type.
  • The machine learning models 708 are neural networks as described in Figures 4 to 6. In some embodiments, the machine learning models 708 have a self-encoder style. The machine learning models 708 take the content of the database as the input and generate output of the same format as the database output columns. The machine learning models 708 can predict what each value in the database should be. In some embodiments, the machine learning models 708 can fill the missing values of the input database. In some embodiments, the values in the database are conceptually smoothed by the machine learning models 708. In some embodiments, as described in Figure 5, a neural network with entity embedding 500 is created, and similar processes described in Figures 4-6 are implemented to form the machine learning models 708.
  • Figure 8 is a block diagram illustrating simplified neural networks 800 converted from a Sales Forecast database for anomaly detection in accordance with some implementations of the present disclosure. Compared with Figure 5, certain columns by default, including columns related to dates and columns related to IDs with a large number of instances, do not generate corresponding outputs and loss functions. For example, the Date and Store_id columns do not generate corresponding outputs and loss functions.
  • a training process 712 is conducted for the machine learning models 708.
  • the machine learning models 708 are trained periodically, for example, in a predetermined interval, by pulling the data from the relational database 702 through the data adapter 710.
  • The relational database 702 is monitored, and the machine learning models are retrained only when the amount of change in the database has exceeded a certain threshold.
  • anomalies 714 are detected for the relational database 702 after the training 712 to the machine learning models 708.
  • the trained machine learning models are used to predict the anomalies 714.
  • a threshold is determined for the loss function.
  • Anomalies are detected when the loss of a column value surpasses the predefined threshold, for example, a predetermined percentage difference or a predetermined threshold number. In some instances, the predetermined percentage can be 50%.
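  • A sketch of the thresholding rule using the 50% figure mentioned above (the relative-difference formula is an illustrative choice, not necessarily the disclosed one):

        def is_anomaly(real_value, predicted_value, threshold=0.50):
            """Flag a value whose relative deviation from the model's
            prediction exceeds the predefined threshold (e.g., 50%)."""
            if predicted_value == 0:
                return real_value != 0
            return abs(real_value - predicted_value) / abs(predicted_value) > threshold

        assert is_anomaly(real_value=150.0, predicted_value=60.0)     # 150% off
        assert not is_anomaly(real_value=70.0, predicted_value=60.0)  # ~17% off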
  • the anomaly table may include the following columns as shown in Table 3 below.
  • Table 3: Exemplary columns in the anomaly table generated by the machine learning models.
  • the anomaly table includes columns such as Table (identifying which table has an anomaly), Column (identifying which column in the identified table has the anomaly), Row ID (identifying which row has the anomaly in the identified table and column), Real Value (the real value in the database with the anomaly), Predicted Value (the predicted value from the trained machine learning model), Loss (the loss calculated from the difference between the real value and the predicted value), and Annotations (user provided annotations 718 to the anomaly).
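  • An anomaly record with these columns might be assembled as a simple Python dictionary (all values below are placeholders):

        anomaly = {
            "table": "Sales",         # which table has the anomaly
            "column": "quantity",     # which column in the identified table
            "row_id": 1042,           # which row in the identified table and column
            "real_value": 600.0,      # the real value in the database
            "predicted_value": 58.3,  # the value predicted by the trained model
            "loss": 9.7,              # loss from the real/predicted difference
            "annotations": [],        # user-provided annotations 718
        }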
  • the detected anomalies 714 pass through a classification model 716.
  • classification tags for the classified anomalies 720 are added by the classification model 716.
  • an annotation user interface 722 is provided for the users to view the anomalies and their classifications.
  • the user can add annotations 718 to each anomaly detected.
  • the user can annotate an anomaly to be false positive.
  • the user can adjust the system generated classification tags from the classification model 716 or add new tags to the anomalies.
  • the user annotations can be used in the subsequent trainings.
  • false positive annotations are used in the subsequent training to suppress similar false positives.
  • the user annotations can be used to train new or better annotation classification models by using anomaly as input, and comparing and reducing the difference between the predicted user annotations from the classification model and the ground truth user annotations.
  • The system disclosed herein implements a classification bootstrap process. Initially, there is no annotation data or classification model. As more and more user annotations are collected, the classification models are trained and improved from the user annotations. In some embodiments, the classification models are neural network models.
  • user-defined actions 724 including tags are added to the classified anomalies 720.
  • The user can annotate anomalies with pre-defined tags.
  • the list of possible tags is maintained and can be adjusted by the users.
  • the classification model 716 maps the classification to tags.
  • The user can associate a number of action rules with each tag. For example, an email or a short message notification may be triggered when the user associates a message tag with an anomaly. In another example, a pre-defined system task may be triggered when the user associates a message tag with a task code. In some embodiments, when the classification model predicts that an anomaly has a tag corresponding to a classified anomaly 720, the corresponding actions of the tag are triggered.
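  • A sketch of the tag-to-action dispatch (the action rules and notification function are hypothetical stand-ins for user-defined actions 724):

        def send_notification(anomaly):  # stand-in for an email/short message
            print(f"notify: anomaly in {anomaly['table']}.{anomaly['column']}")

        # User-maintained mapping from tags to lists of action rules.
        ACTION_RULES = {"message": [send_notification]}

        def on_classified_anomaly(anomaly, predicted_tag):
            """Trigger every action rule associated with the predicted tag."""
            for action in ACTION_RULES.get(predicted_tag, []):
                action(anomaly)

        on_classified_anomaly({"table": "Sales", "column": "quantity"}, "message")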
  • Figure 9 is a flowchart illustrating an exemplary process 900 by which a computing system implements the techniques of anomaly detection in a relational database in accordance with some implementations of the present disclosure.
  • the computing system extracts table structures from the relational database, and the table structures include multiple entities and each entity has multiple table columns (910).
  • the computing system then converts each table in the table structures to a machine learning model (920).
  • The computing system trains the machine learning models by feeding the corresponding table columns into the machine learning models (930).
  • The computing system identifies a value of the table columns as an anomaly when a difference between the value and a predicted value by the machine learning models exceeds a predefined threshold (940).
  • extracting table structures (910) includes: identifying tables to be converted; and converting table column values of the identified tables into a feature value format.
  • converting each table in the table structures to the machine learning model (920) includes: creating a respective subnet of neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as the input table columns to the neural network structures; identifying a second set of table columns as the output table columns to the neural network structures; creating a respective neural network structure for each table, and converting the primary keys and foreign keys into a feature value format in an embedding layer.
  • In some embodiments, the first set of table columns consists of all table columns, and the second set of table columns includes table columns not identified by the primary keys or the foreign keys.
  • Training the machine learning models (930) includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs and a respective second set of table column values as respective fitting outputs for the respective neural network for each table; generating a prediction for each of the table column values of the respective outputs; and reducing a difference between the respective outputs of the respective neural network and the respective fitting outputs.
  • Identifying the value in the one of the table columns as the anomaly (940) further includes: generating a loss function for each of the table column values of the respective outputs by comparing the actual column value with the prediction for that table column value.
  • Identifying the value in the one of the table columns as the anomaly (940) further includes: in accordance with a determination that the loss function corresponding to the value is greater than a predetermined threshold, determining the table column value of the respective outputs to be the anomaly.
  • the anomaly detection process 900 further includes: triggering a user-defined action upon the identification of the anomaly.
  • the user-defined action includes a user annotation on the anomaly.
  • the anomaly detection process 900 further includes: training a classification model using the identified anomalies and corresponding user annotations on the identified anomalies.
  • the anomaly detection process 900 further includes: predicting a classification tag for the identified anomaly based on the trained classification model.
  • the anomaly detection process 900 further includes: in accordance with a determination that the trained classification model predicts a classification tag for the identified anomaly, a corresponding classification action for the classification tag is triggered.
  • the anomaly detection process 900 further includes: viewing the identified anomalies and corresponding classifications from a user interface.
  • the anomaly detection process 900 further includes: monitoring the relational database and further training the machine learning models when the number of the identified anomalies has exceeded a certain threshold.
  • the anomaly detection process 900 further includes: training a classification model using the identified anomalies and a user false positive annotation on the respective anomaly.
  • Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol.
  • computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the implementations described in the present application.
  • a computer program product may include a computer-readable medium.
  • although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
  • a first electrode could be termed a second electrode, and, similarly, a second electrode could be termed a first electrode, without departing from the scope of the implementations.
  • the first electrode and the second electrode are both electrodes, but they are not the same electrode.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A computing system performs a method for anomaly detection in a relational database. The method includes: extracting table structures from the relational database, the table structures including multiple entities, each entity having multiple table columns; converting each table in the table structures to a machine learning model; training the machine learning models by feeding the corresponding table columns into the machine learning models; and identifying a value of the table columns as an anomaly when a difference between the value and the predicted value by the machine learning models exceeds a predefined threshold.

Description

SYSTEMS AND METHODS FOR ANOMALY DETECTION IN A RELATIONAL DATABASE
TECHNICAL FIELD
[0001] This application relates generally to data processing technology including, but not limited to, methods, systems, and non-transitory computer-readable media for detecting anomalies in a relational database.
BACKGROUND
[0002] Relational databases are widely used nowadays. Massive amounts of data are stored in various kinds of relational databases. A relational database conforms to a database schema. Relational databases usually hold raw business data, and sometimes the data is incomplete or noisy. While traditional data mining focuses on the extraction of overall information, it does not handle the incomplete or noisy data.
[0003] In a relational database, anomalies are values in the database tables that deviate from the regular values. Anomalies can arise for different reasons. For example, there may be errors in the data input or gathering process. In some other examples, abrupt changes in business could cause the deviation in the data.
[0004] It would be beneficial to develop systems and methods to detect anomalies automatically for subsequent data cleaning and correction.
SUMMARY
[0005] The present application describes implementations related to data processing and, more particularly, to systems and methods for detecting anomalies in a relational database.
[0006] Systems and methods for anomaly detection in a relational database are disclosed herein. Anomalies in a database can be automatically detected using machine learning models. User-defined actions are triggered upon detection of an anomaly. In some embodiments, a user can annotate the detected anomalies for model improvement in the system.
[0007] In one aspect, a method is implemented at a computer system for anomaly detection in a relational database. The method includes: extracting table structures from the relational database, the table structures including multiple entities, each entity having multiple table columns; converting each table in the table structures to a machine learning model; training the machine learning models by feeding the corresponding table columns into the machine learning models; and identifying a value of the table columns as an anomaly when a difference between the value and the predicted value by the machine learning models exceeds a predefined threshold.
[0008] In some embodiments, extracting table structures includes: identifying tables to be converted; and converting table column values of the identified tables into a feature value format.
[0009] In some embodiments, converting each table in the table structures to the machine learning model includes: creating a respective subnet of neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as the input table columns to the neural network structures; identifying a second set of table columns as the output table columns to the neural network structures; creating a respective neural network structure for each table, and converting the primary keys and foreign keys into a feature value format in an embedding layer.
[0010] In some embodiments, the first set of table columns consists of all table columns, and the second set of table columns includes table columns not identified by the primary keys or the foreign keys.
[0011] In some embodiments, training the machine learning models includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs, and a respective second set of table column values as respective fitting outputs for the respective neural network for the each table; generating a prediction for each of table column values of respective outputs; and reducing a difference between the respective outputs of the respective neural network and the respective fitting outputs.
[0012] In some embodiments, identifying the value in the one of the table columns as the anomaly further includes: generating a loss function for the each of the table column values of the respective outputs by comparing the difference between an actual column value and the prediction for the each of the table column values of the respective outputs.
[0013] In some embodiments, identifying the value in the one of the table columns as the anomaly further includes: in accordance with a determination that the loss function corresponding to the value is greater than a predetermined threshold, determining the table column value of the respective outputs as the anomaly.
[0014] In some embodiments, the anomaly detection process further includes: triggering a user-defined action upon the identification of the anomaly.
[0015] In another aspect, some implementations include a computer system including one or more processors and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform any of the above methods. [0016] In yet another aspect, some implementations include a non-transitory computer-readable medium, having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the above methods.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The accompanying drawings, which are included to provide a further understanding of the implementations and are incorporated herein and constitute a part of the specification, illustrate the described implementations and together with the description serve to explain the underlying principles. Like reference numerals refer to corresponding parts. [0018] Figure 1 is an exemplary data processing environment in accordance with some embodiments.
[0019] Figure 2 is an exemplary neural network for processing data in a machine learning model, in accordance with some implementations of the present disclosure.
[0020] Figure 3 is a diagram illustrating a relational database schema structure according to some implementations of the present disclosure.
[0021] Figure 4 is a flow diagram illustrating an entity feature extracting process according to some implementations of the present disclosure.
[0022] Figure 5 is a block diagram illustrating the neural network structures converted or compiled from the database structures according to some implementations of the present disclosure.
[0023] Figure 6 is a flowchart illustrating an exemplary process by which a computing system implements the techniques of extracting an entity feature from a relational database in accordance with some implementations of the present disclosure.
[0024] Figure 7 is a block diagram of a system architecture of the data anomaly detection system according to some implementations of the present disclosure. [0025] Figure 8 is a block diagram illustrating simplified neural networks converted from a Sales Forecast database for anomaly detection in accordance with some implementations of the present disclosure.
[0026] Figure 9 is a flowchart illustrating an exemplary process by which a computing system implements the techniques of anomaly detection in a relational database in accordance with some implementations of the present disclosure.
DETAILED DESCRIPTION
[0027] Reference will now be made in detail to specific embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to assist in understanding the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that various alternatives may be used without departing from the scope of claims and the subject matter may be practiced without these specific details. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein can be implemented on many types of electronic devices.
[0028] Figure 1 is an exemplary data processing environment 100 in which one or more network-connected client devices 102 and one or more server systems 104 interact with each other locally or remotely via one or more communication networks 106 in accordance with some embodiments.
[0029] In some embodiments, the server systems 104, such as 104A and 104B are physically remote from, but are communicatively coupled to the one or more client devices
102. In some embodiments, a client device 102 (e.g., 102A, 102B) includes a desktop computer. In some embodiments, a client device 102 (e.g., 102C) includes a mobile device, e.g., a mobile phone, a tablet computer and a laptop computer. Each client device 102 can collect data or user inputs, execute user applications, and present outputs on its user interface.
The collected data or user inputs can be processed locally at the client device 102 and/or remotely by the server(s) 104. Each client device 102 communicates with another client device 102 or the server systems 104 using the one or more communication networks 106.
The communication networks 106 can be one or more networks having one or more types of topologies, including but not limited to the Internet, intranets, local area networks (LANs), cellular networks, Ethernet, telephone networks, Bluetooth personal area networks (PAN), and the like. In some embodiments, two or more client devices 102 in a sub-network are coupled via a wired connection, while at least some client devices 102 in the same sub- network are coupled via a local radio communication network (e.g., ZigBee, Z-Wave,
Insteon, Bluetooth, Wi-Fi and other radio communication networks). In an example, a client device 102 establishes a connection to the one or more communication networks 106 either directly (e.g., using 3G/4G connectivity to a wireless carrier), or through a network interface (e.g., a router, switch, gateway, hub, or an intelligent, dedicated whole-home control node), or through any combination thereof.
[0030] Each of the server systems 104 includes one or more processors 110 and memory storing instructions for execution by the one or more processors 110. The server system 104 also includes an input/output interface 114 to the client(s). The one or more server systems 104 provide system data (e.g., boot files, operating system images, and user applications) to the client devices 102, and in some embodiments, process the data and user inputs received from the client device(s) 102 when the user applications are executed on the client devices 102. The one or more server systems 104 can enable real-time data communication with the client devices 102 that are remote from each other or from the one or more server systems 104. The server system 104A is configured to store an information database such as a relational database 112. The server system 104B is configured to store a machine learning model 116. In some embodiments, the machine learning model and the relational database can be in the same server 104. In some embodiments, the machine learning model 116 is one or more neural networks. A machine learning model training method can be implemented at one or more of the server systems 104.
[0031] Each client device 102 includes one or more processors and memory storing instructions for execution by the one or more processors. The instructions stored on the client device 102 enable implementation of the web browser and user interface application to the relational database 112 or to the machine learning model 116. The web browser and the user interface application are linked to a user account in the data processing environment 100. Specifically, the browser or user interface application enables a GUI for a user 120 to review information from the relational database 112 or from the machine learning model 116. For example, a user 120 can add notations or make changes to the data or selections on the relational database 112 or on the machine learning model 116. The new user data can be stored on the client device 102 or on the server systems 104.
[0032] Machine learning techniques are applied in the data processing environment
100 to process data obtained by an application executed at a client device 102 or loaded from another database or files to identify information contained in the data, match the data with other data, categorize the data, or synthesize related data. Data can include text, images, audio, video, etc. In these machine learning techniques, machine learning models are created based on one or more neural networks to process the data. These machine learning models are trained with training data before they are applied to process the data. In some embodiments, a machine learning model training method is implemented at a client device 102. In some embodiments, a machine learning model training method is jointly implemented at the client device 102 and the server system 104. In some embodiments, a machine learning model can be held at a client device 102. In some embodiments, the client device 102 is configured to automatically, and without user intervention, identify, classify, or modify the data information from the relational database 112 or from the machine learning model 116.
[0033] In some embodiments, both model training and data processing are implemented locally at each individual client device 102 (e.g., the client device 102C). The client device 102C obtains the training data from the one or more server systems 104 including the database 112 and applies the training data to train the machine learning models. Subsequent to model training, the client device 102C obtains and processes the data using the trained machine learning models locally. Alternatively, in some embodiments, both model training and data processing are implemented remotely at a server system 104 (e.g., the server system 104B) associated with a client device 102 (e.g., the client device 102A). The server 104B obtains the training data from itself, another server 104 or the database 112 and applies the training data to train the machine learning models 116. The client device 102A obtains the data, sends the data to the server 104B (e.g., in an application) for data processing using the trained machine learning models, receives data processing results from the server 104B, and presents the results on a user interface (e.g., associated with the application). The client device 102A itself implements no or little data processing on the data prior to sending them to the server 104B. Additionally, in some embodiments, data processing is implemented locally at a client device 102 (e.g., the client device 102B), while model training is implemented remotely at a server system 104 (e.g., the server 104B) associated with the client device 102B. The trained machine learning models are optionally stored in the server 104B or another database, such as 112. The client device 102B imports the trained machine learning models from the server 104B or database 112, processes the data using the machine learning models, and generates data processing results to be presented on a user interface locally.
[0034] The machine learning model system 116 includes one or more of a server, a client device, a storage, or a combination thereof. The machine learning model system 116, typically, includes one or more processing units (CPUs), one or more network interfaces, memory, and one or more communication buses for interconnecting these components (sometimes called a chipset).
The machine learning model system 116 includes one or more input devices that facilitate user input, such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture capturing camera, or other input buttons or controls. The machine learning model system 116 also includes one or more output devices that enable presentation of user interfaces and display content, including one or more speakers and/or one or more visual displays.
[0035] Memory includes high-speed random access memory, such as DRAM, SRAM,
DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory, optionally, includes one or more storage devices remotely located from one or more processing units. Memory, or alternatively the non volatile memory within memory, includes a non-transitory computer readable storage medium. In some embodiments, memory, or the non-transitory computer readable storage medium of memory, stores programs, modules, and data structures including operating system, input processing module for detecting and processing input data, model training module for receiving training data and establishing a machine learning model for processing data, machine learning module for processing data using machine learning models, etc.
[0036] Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, modules or data structures, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory, optionally, stores additional modules and data structures not described above.
[0037] Figure 2 is an exemplary neural network 200 implemented to process data in a machine learning model 116, in accordance with some implementations of the present disclosure. The machine learning model 116 is established based on the neural network 200. A corresponding model-based processing module within the server system 104B applies the machine learning model 116 including the neural network 200 to process data that has been converted to a predefined format.
[0038] The neural network 200 includes a collection of neuron nodes 220 that are connected by links 212. Each neuron node 220 receives one or more neuron node inputs and applies a propagation function to generate a neuron node output from the one or more neuron node inputs. As the neuron node output is transmitted through one or more links 212 to one or more other neuron nodes 220, a weight associated with each link 212 is applied to the neuron node output. The one or more neuron node inputs are combined based on corresponding weights according to the propagation function. In an example, the propagation function is a product of a non-linear activation function and a linear weighted combination of the one or more neuron node inputs.
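As an illustration of this propagation function, the following is a minimal Python sketch (assuming a tanh activation and illustrative weights; the names are not part of the original disclosure) that computes a single neuron node output:

```python
import numpy as np

def node_output(inputs, weights, activation=np.tanh):
    # A neuron node output: a non-linear activation applied to the
    # linear weighted combination of the one or more node inputs.
    return activation(np.dot(weights, inputs))

x = np.array([0.5, -1.0, 2.0])   # illustrative neuron node inputs
w = np.array([0.1, 0.4, -0.2])   # illustrative link weights
print(node_output(x, w))         # a single neuron node output
```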
[0039] The neural network consists of one or more layers with the neuron nodes 220.
In some embodiments, the one or more layers include a single layer acting as both an input layer and an output layer. In some embodiments, the one or more layers include an input layer 202 for receiving inputs, an output layer 206 for generating outputs, and zero or more hidden layers/latent layers 204 (e.g., 204A and 204B) between the input and output layers 202 and 206. A deep neural network has more than one hidden layer 204 between the input and output layers 202 and 206. In the neural network 200, each layer is only connected with its immediately preceding and/or immediately following layer. In some embodiments, a layer 202 or 204B is a fully connected layer because each neuron node 220 in the layer 202 or 204B is connected to every neuron node 220 in its immediately following layer.
[0040] In some embodiments, one or more neural networks can be utilized by the machine learning model 116. The one or more neural networks include fully connected neural networks, multi-layer perceptrons, convolutional neural networks, recurrent neural networks, feed-forward neural networks, radial basis function neural networks, long short-term memory (LSTM) networks, autoencoders, sequence-to-sequence models, etc.
[0041] In some embodiments, a convolutional neural network (CNN) is utilized in a machine learning model 116 to process data. The CNN employs convolution operations and belongs to a class of deep neural networks 200, i.e., a feedforward neural network that only moves data forward from the input layer 202 through the hidden layers to the output layer 206. The one or more hidden layers of the CNN are convolutional layers convolving with a multiplication or dot product. Each neuron node in a convolutional layer receives inputs from a receptive area associated with a previous layer (e.g., six neuron nodes), and the receptive area is smaller than the entire previous layer and may vary based on a location of the convolution layer in the convolutional neural network.
[0042] The training process is a process for calibrating all of the weights for each layer of the learning model using a training data set which is provided in the input layer 202. The training process typically includes two steps, forward propagation and backward propagation, which are repeated multiple times until a predefined convergence condition is satisfied. In the forward propagation, the set of weights for different layers are applied to the input data and intermediate results from the previous layers. In the backward propagation, a margin of error of the output (e.g., a loss function) is measured, and the weights are adjusted accordingly to decrease the error. The activation function is optionally linear, rectified linear unit, sigmoid, hyperbolic tangent, or of other types. In some embodiments, a network bias term is added to the sum of the weighted outputs from the previous layer before the activation function is applied. The network bias provides a perturbation that helps the neural network 200 avoid overfitting the training data. The result of the training includes the network bias parameter for each layer.
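The following is a minimal PyTorch sketch of this two-step training process; the layer sizes, learning rate, iteration count, and data are illustrative placeholders, and the bias terms are carried inside the linear layers:

```python
import torch
import torch.nn as nn

# Toy network: input layer -> one hidden layer -> output layer.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 4)  # placeholder training minibatch
y = torch.randn(32, 1)  # placeholder ground truth

for _ in range(100):         # repeated until a convergence condition in practice
    pred = model(x)          # forward propagation
    loss = loss_fn(pred, y)  # margin of error of the output
    optimizer.zero_grad()
    loss.backward()          # backward propagation
    optimizer.step()         # adjust weights to decrease the error
```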
[0043] In some embodiments, relational databases are modeled after entity-relationship models. An entity-relationship model (or ER model) describes entities and relationships between the entities. An ER model is typically implemented in a database. An entity is defined as a physical or a logical thing that can be uniquely defined. For example, an entity may be a physical object such as a product or an event such as a product sale. A relation describes how entities are related to or interact with one another. Entities can be characterized not only by relationships but by attributes (or properties). The attributes include identifiers (IDs) such as primary keys in the database. For example, a product may have attributes including product_id, category1, category2, price, etc. In some embodiments, an entity is represented by a database table. Figure 3 is a diagram illustrating a relational database schema structure according to some implementations of the present disclosure. In Figure 3, database items within rectangular boxes are entities, such as Store, Product and Region. An entity such as Product has many attributes within oval-shaped boxes, including Product Name, Product Color, and Product Style. The relations between the entities are represented by rhombus-shaped boxes, such as “Has” and “Contains”. [0044] In some embodiments, the entity features or attributes from the relational database are extracted as an alternative representation of the entities for some subsequent processes, including machine learning, clustering, visualization, and anomaly detection. [0045] In some embodiments, a relational database is converted to a machine learning model mapping. In some embodiments, a machine learning model has a neural network structure. A relational database conforms to a database schema, and the relational database has a number of tables. Each table has a number of columns and a number of rows. Each column has a name and a type, and the type can include: Primary Key ID, Foreign Key IDs (reference to another table), Date/Datetime, Category (including categorical integers, strings, Boolean and others), Numeric, or Free text, etc.
[0046] In some embodiments, a machine learning model, such as 116, has inputs represented by matrix X and outputs represented by matrix Y. Both of the input X and output
Y matrices have the same number of rows. Each of the matrix columns is called a feature.
The X and Y matrices have corresponding rows, and each of the rows is an example. The input X can have missing values.
[0047] In some embodiments, matrix Y’ is used as ground truth for the output matrix
Y at a training stage for the machine learning model, and the matrix Y’ has the same shape, i.e., the same rows and columns, as the output matrix Y. In some embodiments, a loss function is used to compare the prediction output matrix Y and the ground truth Y’. A smaller loss function usually indicates a better fit between the prediction Y and the ground truth Y’. In some examples, because each column (or group of columns) in matrices Y and Y’ has special meanings, the loss function of each column (or group of columns) is evaluated and then summed to produce a total loss.
[0048] In some embodiments, a table schema is converted to a machine learning model format. For example, each table column can potentially be used as an input and/or an output. When a column is used as an input, conditions are specified regarding how to convert the values, and how to handle NULL (missing) values. When a column is used as an output, conditions are specified regarding how to convert the value, and what loss functions to use in the corresponding output.
[0049] In some embodiments, for both the input and/or output feature value conversions, an encoder of a machine learning model is used. For example, the encoder converts database values to value formats suitable for a machine learning model. For example, database values are converted to only numerical values. In another example, strings are converted into numbers.
[0050] In some embodiments, the encoder can work on the input side to convert a value to be an input format of the machine learning model. In some embodiments, the encoder can work on the output side to convert a value to be an output format of the machine learning model. In both cases, missing values are handled based on some predefined conditions.
[0051] In one embodiment, the compatible conversion methods for the encoder are maintained in a system as a library, with a default method of conversion specified for each of the conversion methods. In one example, a relational database is created to store sales forecast information. The database tables storing sales forecast information include: Store, which has columns store_id, city, and state; Product, which has columns product_id, category1, category2, and price, etc.; and Sales, which has columns store_id, product_id, date, and quantity.
[0052] In some embodiments, store_id and product_id are entity IDs, and these IDs are typically the primary keys of the corresponding tables, for example the Store table and the Product table. The store_id and product_id columns are referred to as foreign keys in the Sales table. The entity IDs are used as linkages between multiple tables and they do not generate loss functions in the machine learning model. The column values for columns city, state, category1, and category2 are string or categorical values, and they are converted by the encoder to categorical values for the machine learning model. The column values for columns price and quantity are numerical values, and they can be directly used in the machine learning model.
[0053] In some embodiments, the database specifies whether a column can be NULL.
If the column cannot be NULL, no values should be missing. If a column can be NULL, then a value could be missing (NULL). In a database, “0” or an empty string is different from NULL.
[0054] In some embodiments, when a column holds numerical values, NULL values in the column are converted by the encoder to a default value (or average value). In addition, in some instances, a separate column for storing only 0 or 1 is optionally generated to record whether the original value is a null value. For example, if the row value for the separate column is 1, the original value is not null. If the row value for the separate column is 0, the original value is null. [0055] In some embodiments, when a column holds categorical values, the encoder adds an additional NULL category to the existing categorical values during the conversion. [0056] In some embodiments, when NULL is in the output values, a loss function will not be generated.
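A minimal pandas sketch of this numerical NULL handling, assuming an illustrative price column, the column average as the default value, and a 0/1 indicator column as described above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, np.nan, 30.0]})

# Separate column recording whether the original value is null:
# 1 means the original value is not null, 0 means it is null.
df["price_not_null"] = df["price"].notna().astype(int)

# Replace NULL values with a default value (here, the column average).
df["price"] = df["price"].fillna(df["price"].mean())
print(df)
```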
[0057] In some embodiments, entities including entity IDs are converted to embeddings for the machine learning model, for example, a neural network. Here, embeddings refer to vectors of numerical values to represent each example of the table values. When the entity IDs are referred to as the foreign keys in another table, the embeddings from the original tables are copied over in the machine learning model. In some embodiments, the embeddings can be obtained from training a machine learning model.
[0058] In some embodiments, the embeddings can be copied over to the other columns of the table where the entity ID is the primary key. For example, the Product table embedding includes vector values each represented as (category1, category2, price). The Store table embedding includes vector values each represented as (city, state). The Sales table embedding then includes vector values each represented at least as (store.state, store.city, product.category1, product.category2, product.price, quantity).
[0059] In some embodiments, the encoder automatically detects the categorical columns. For example, when a database has categorical columns, they are directly converted into categorical values. In some instances, categories are represented by numerical values or strings that are not explicitly specified to be categories. In that case, a categorical value detection is triggered, i.e., a column is converted into a categorical column when the ratio of the number of distinct values to the total number of values in the same column is smaller than a predetermined threshold. For example, the threshold is 1/3 or 1/4. When such a conversion is triggered, a dictionary of possible values is built (including a “misc” category to handle values that are not seen during the training of the machine learning model). String or numerical values are converted into 0, 1, 2, 3, etc. An exemplary encoder library is shown in Table 1 below.
Table 1. Exemplary Encoder Library
[0060] According to Table 1, for example, if the column type of the original column is integer, the column values can be converted into numerical values. The loss function type used for the column type in the machine learning model is Mean squared error (MSE).
[0061] In another example from Table 1, if the column type of the original column is integer, the column values can be converted into categorical values. In this case, the automatic detection is triggered when the ratio of the number of distinct values to the total number of values in the same column is smaller than a predetermined threshold. The loss function type used for this column type in the machine learning model is cross entropy. [0062] In some embodiments, a user can override the default behaviors described in Table 1. For example, users can change the conversion type for the machine learning model.
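As a concrete illustration, the following is a minimal Python sketch of the automatic categorical value detection described above, assuming the 1/3 threshold mentioned earlier; the function and variable names are illustrative only:

```python
def detect_and_encode_categorical(values, threshold=1/3):
    # A column is treated as categorical when the ratio of distinct
    # values to total values is smaller than the threshold.
    distinct = set(values)
    if len(distinct) / len(values) >= threshold:
        return None, None  # not categorical; leave the column as-is
    # Build a dictionary of possible values, including a "misc"
    # category for values not seen during training.
    vocab = {v: i for i, v in enumerate(sorted(distinct, key=str))}
    vocab["misc"] = len(vocab)
    return [vocab.get(v, vocab["misc"]) for v in values], vocab

colors = ["red", "blue", "red", "red", "blue",
          "red", "blue", "red", "blue", "red"]
encoded, vocab = detect_and_encode_categorical(colors)
print(encoded, vocab)  # values converted into 0, 1, 2, ...
```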
[0063] Nowadays, massive amounts of data are stored in various kinds of relational databases. In some embodiments, methods and systems for extracting the entity features from a relational database are disclosed herein. As described above, a relational database conforms to a database schema. The data within a relational database is relatively clean and can be imported for deep learning. Furthermore, the data structures are clear and easy to extract from a relational database. [0064] However, relational databases usually hold raw business data, and sometimes the data is incomplete or noisy. Traditional data mining focuses more on the extraction of overall information and does not handle the incomplete or noisy data.
[0065] Figure 4 is a flow diagram 400 illustrating an entity feature extracting process according to some implementations of the present disclosure. In some embodiments, an automatic extraction process 410 is applied to a relational database 420. As described above, the relational database 420 includes a number of tables. Each table has a number of columns and a number of rows. Each column has a name and a type, and the type can include: Primary Key ID, Foreign Key IDs (reference to another table), Date/Datetime, Category (including categorical integers, strings, Boolean and/or others), Numeric, or Free text, etc. The relational database 420 is converted to a table structure or schema 422 for preparation of the input to a machine learning model, such as 116.
[0066] In some embodiments, a compiling process 412 is used to convert the table structure or schema 422 to a machine learning model structure. In some embodiments, the machine learning model structure includes a neural network 424. In some embodiments, each entity in the database has a corresponding table, where the entity is identified by a primary key.
[0067] In some embodiments, an embedding subnet is created for each entity. The subnet could be a simple embedding layer in a neural network. As described above, the same embedding is used whenever the key is referred to as a primary key in its defining table, or as a foreign key in other tables.
[0068] In some embodiments, a neural network is created for each table. In some embodiments, the primary key and foreign key columns are compiled to be inputs by default for each of the neural networks. Columns other than the primary keys and foreign keys are compiled to be outputs by default for each of the neural networks. In some embodiments, a user can override the default input and output columns for each of the neural networks. [0069] Figure 5 is a block diagram illustrating the neural network structures converted or compiled from the database structures according to some implementations of the present disclosure. In some embodiments, Figure 5 illustrates a neural network with entity embedding 500. In some embodiments, the component parts identified as “Loss” or “Total Loss” shown in Figure 5 are only used in the training process of the neural networks but not in the final machine learning model or the final neural network model with embedding 426 after the training process. [0070] As illustrated in Figure 5, a Sales Forecast relational database is compiled into a neural network structure. The Sales Forecast relational database consists of three tables: a Store table with columns including store_id, feature1, and feature2, etc.; a Date table with columns including date_id, year, month, day, and weekday, etc.; and a Sales table with columns including store_id, date_id, and sales.
[0071] The primary key column for the Store table is store_id and the primary key column for the Date table is date_id. In the Sales table, the foreign key columns are store_id and date_id, which refer to the primary key store_id column in the Store table and the primary key date_id column in the Date table, respectively. In this example, the primary key for the entity Store is store_id, and the primary key for the entity Date is date_id. Three neural networks, the Store network, the Sales network, and the Date network, are created for the three tables represented by the three entities: Store, Date, and Sales, respectively. The primary key column store_id is used as the input to the Store neural network. The primary key column date_id is used as the input to the Date neural network. The foreign key store_id and date_id columns are used as the inputs for the Sales neural network.
[0072] As shown in Figure 5, the whole structure 500 is viewed as three neural networks. Each entity embedding is referred to in two tables. For example, features related to the Date entity are referred to in both the Date and Sales network structures. Features related to the Store entity are referred to in both the Store and Sales network structures. The entity features will be updated when each table referring to those entity features is trained.
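A simplified PyTorch sketch of this shared-embedding arrangement follows; the entity counts, embedding width, and subnet shapes are illustrative, and the transform subnets are reduced to small fully-connected stacks:

```python
import torch
import torch.nn as nn

n_stores, n_dates, emb_dim = 100, 365, 16

# One embedding subnet per entity, shared wherever its key appears.
store_embedding = nn.Embedding(n_stores, emb_dim)
date_embedding = nn.Embedding(n_dates, emb_dim)

# Store network: store_id -> predictions for feature1 and feature2.
store_net = nn.Sequential(nn.Linear(emb_dim, 32), nn.ReLU(), nn.Linear(32, 2))

# Sales network: (store_id, date_id) -> prediction for sales.
sales_net = nn.Sequential(nn.Linear(2 * emb_dim, 32), nn.ReLU(), nn.Linear(32, 1))

store_ids = torch.tensor([3, 7])
date_ids = torch.tensor([10, 42])

store_out = store_net(store_embedding(store_ids))
sales_in = torch.cat([store_embedding(store_ids), date_embedding(date_ids)], dim=1)
sales_out = sales_net(sales_in)
# Training either network backpropagates into the shared embeddings, so
# the entity features are updated by every table that refers to them.
```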
[0073] In some embodiments, as shown in Figure 5, there are three types of neural network modules: embedding subnet, transform subnet, and prediction head. Data is input through the embedding subnet, processed through the transform subnet, and predicted/output from the prediction head. In some embodiments, the neural network modules are multi-layer fully-connected networks. In some embodiments, the neural network modules can include more complicated constructions such as residual blocks and attention.
[0074] In some embodiments, each column of the relational database has a corresponding data type, and a nullable specification defined in the database schema.
The nullable specification defines whether a column can be empty.
[0075] In some embodiments, a library of input subnets, output subnets, and loss function modules is created. In some embodiments, a compatibility table is also maintained for subnet modules and data type compatibility. [0076] In some embodiments, each data type has a default input subnet, default output subnet, and default loss function. In one example, for keys of the entities, the input subnet is an embedding subnet, the output subnet is linear with softmax activation, and the loss function is cross entropy. In another example, for categorical data types, the input subnet is one-hot encoding (with a few categories) or embedding (with many categories), the output subnet is linear with softmax activation, and the loss function is cross entropy. In yet another example, for the numerical data type, the input subnet is an identity subnet (passthrough subnet), the output subnet is linear without activation, and the loss function is a mean-squared-error function. In some embodiments, a user can override the default submodule selections.
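A minimal sketch of such a defaults library, with the default submodule selections recorded as descriptive entries; a real library would hold constructors parameterized by vocabulary size, embedding width, and so on:

```python
# Default input subnet, output subnet, and loss function per data type.
DEFAULT_SUBMODULES = {
    "entity_key":  ("embedding subnet", "linear + softmax", "cross entropy"),
    "categorical": ("one-hot or embedding", "linear + softmax", "cross entropy"),
    "numerical":   ("identity (passthrough)", "linear, no activation", "mean squared error"),
}

def default_for(column_type):
    # Look up the defaults for a column's data type; per the description
    # above, a user can override any of these submodule selections.
    return DEFAULT_SUBMODULES[column_type]

print(default_for("numerical"))
```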
[0077] During the compiling process such as 412, in some embodiments, the nullable columns are converted according to some predefined rules. For example, if an input column has null values and the input column type is numerical, the input is then converted into two inputs, the number itself and a separate null (0/1) indicator. In another example, if an output column has null values and the column type is entity (key for an entity)/categorical, the output is then converted with the existing categorical types and an additional null category. When an output has null values, the null values do not generate loss functions (loss = 0). [0078] In some embodiments, the system disclosed herein can operate fully automatically. There is a default mechanism to convert each column as an input or an output, for example, keys of the entities as inputs and all others as outputs. Each input/output column has a default submodule depending on its data type. The system has a default transformation subnet implementation.
[0079] In some embodiments, a user can customize the compiling process 412. The user can override one or more of the parameter settings. For example, the user can specify a particular column to be an input or an output. The user can select compatible input, output, and/or loss function submodules for each column. The user can also implement a custom subnet. In some embodiments, the user's overrides can be implemented with a graphical user interface or a configuration file at a client device such as 102.
[0080] In some embodiments, as shown in Figure 4, a training step 414 is applied to the neural network 424 to produce an embedding model 426. In some embodiments, the embedding model has losses or residuals as described above. In some embodiments, the neural networks formed from each table of the relational database are trained in batches. For each iteration of the training, a neural network converted from a table is selected. A set of records of that table is loaded from the database, or intermediate storage, as a minibatch. The minibatch is then fed to the neural network for training.
[0081] In some embodiments, different selection strategies are used in the training.
In some examples, a round-robin method is implemented for the selection of the tables/neural networks. The neural networks converted from the tables are trained in sequence. The records within each table are also visited in sequence.
[0082] In some other examples, a weighted sampling method is implemented for the selection of the tables/neural networks. Each record in each of the neural networks converted from the tables has a loss computed from the previous training iterations as shown in Figure 5. Each of the neural networks has a respective total loss or a respective summed loss from each of the losses from the records in the respective table/neural network. The losses are converted into probabilities. A higher loss means a higher probability. Tables/neural networks and/or records are sampled by probabilities for the training. A higher probability means more weight in the sampling process for the training.
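A minimal NumPy sketch of this weighted sampling strategy, assuming the per-table losses are simply normalized into probabilities (other conversions, e.g., a softmax, would work equally well):

```python
import numpy as np

def sample_by_loss(losses, rng=np.random.default_rng()):
    # Convert losses from previous training iterations into sampling
    # probabilities: a higher loss means a higher probability.
    losses = np.asarray(losses, dtype=float)
    probs = losses / losses.sum()
    return rng.choice(len(losses), p=probs)

table_losses = [0.2, 1.5, 0.3]       # illustrative summed losses per table
print(sample_by_loss(table_losses))  # table 1 is sampled most often
```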
[0083] The neural network structures as shown in Figure 5 also have a total loss from each of the losses from the records in all tables/neural networks. The losses and the total loss shown in Figure 5 are only used in the training and are not used at inference time for the neural networks.
[0084] A prediction head and a prediction for each of the columns or features are also included in Figure 5. For example, the output columns including Feature 1 and Feature 2 in the Store neural network hold prediction values. The output column Sales in the Sales network holds prediction values. The output columns including year and month in the Date neural network hold prediction values.
[0085] In some embodiments, other common neural network training practices can be adopted in the training 414, including optimizers, learning rate schedules, etc.
[0086] In some embodiments, the trained neural network can have many applications.
During the training, the neural network generates a fitting/prediction for each output column as shown in Figure 5. The fitting or prediction is what the neural network thinks the value should be, and the fitting or prediction is typically different from the actual value. The difference between the prediction and the actual ground truth value of the output column is measured by the loss function.
[0087] In some embodiments, the neural network generated values can be used as predictions or completion for missing values or null values in a relational database. [0088] In some embodiments, values with a big loss in the outputs of the neural networks can be used as outlier/error detection for the relational data when the original values in the relational database is not null.
[0089] Figure 6 is a flowchart illustrating an exemplary process 600 by which a computing system implements the techniques of extracting an entity feature from a relational database in accordance with some implementations of the present disclosure.
[0090] According to Figure 6, the computing system extracts the table structures from the relational database (610).
[0091] The computing system then converts the table structures to neural network structures, and each entity in the relational database has a corresponding embedding subnet in the neural network structures (620).
[0092] The computing system trains the neural network structures by feeding a first set of table column values as inputs of the neural network structures and comparing outputs of the neural network structures with a second set of table column values (630).
[0093] The computing system generates the trained neural network structures including the embedding subnets as entity features of the relational database (640).
[0094] In some embodiments, extracting table structures (610) includes: identifying the tables to be converted; and converting the table column values of the identified tables into a feature value format.
[0095] In some embodiments, converting the table structures to the neural network structures (620) includes: creating a respective subnet of the neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as the input table columns to the neural network structures; identifying a second set of table columns as the output table columns to the neural network structures; and creating a respective neural network structure for each table.
[0096] In some embodiments, the first set of table columns is table columns identified by primary keys or foreign keys, and the second set of table columns is table columns not identified by the primary keys or the foreign keys.
[0097] In some embodiments, training the neural network structures (630) includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs, and a respective second set of table column values as respective fitting outputs for the respective neural network for each table; reducing the difference between respective outputs of the respective neural network and the respective fitting outputs; and generating a prediction for each of table column values of the respective outputs.
[0098] In some embodiments, training the neural network structures (630) further includes: generating a loss function for the each of the table column values of the respective outputs by comparing the difference between the actual column value and the prediction for the each of the table column values of the respective outputs.
[0099] In some embodiments, training the neural network structures (630) further includes: training the neural networks including the respective neural network from the each table in a round robin scheduling.
[00100] In some embodiments, training the neural network structures (630) further includes: converting a respective loss function from the previous training iterations to a respective probability, and sampling a respective column value for training based on the respective probability.
[00101] In some embodiments, the neural network structures include multi-layer fully-connected networks.
[00102] In some embodiments, the process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an input value in an input table column to the neural network structures is null and the type of the input table column is numerical, converting the input value in the input table column into a number and an indicator whether a respective input value is null.
[00103] In some embodiments, the process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an input value in an input table column to the neural network structures is null and a type of the input table column is categorical, adding a null category to the existing categories of the input table column.
[00104] In some embodiments, the process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an output fitting value in an output table column of the neural network structures is null, setting the loss function for an output value to a default value.
[00105] In some embodiments, the identifications of the first set of table columns as the input table columns and the second set of table columns as the output table columns of the neural network structures are configurable. [00106] In some embodiments, the identifications of the first and second sets of table columns are from a graphical user interface or a configuration file.
[00107] In some embodiments, systems and methods for anomaly detection in a relational database are disclosed herein. Anomalies in a database can be automatically detected using machine learning models. User-defined actions are triggered upon detection of an anomaly. In some embodiments, a user can annotate the detected anomalies for model improvement in the system.
[00108] In a relational database, anomalies are values in the database tables that deviate from regular values. Anomalies can arise for different reasons. For example, there may be errors in the data input or gathering process. In some other examples, abrupt changes in business could cause the deviation in the data. Table 2 below shows an exemplary database table anomaly.
Table 2. Exemplary database table anomaly
Date          Sales
2021-01-01    ~1000
2021-01-02    abnormally large value (anomaly)
2021-01-03    ~1000
[00109] As shown in Table 2 above, on 2021-01-01 and 2021-01-03 the sales numbers are around 1000, but on 2021-01-02 the sales number is abnormally large, which can be an anomaly.
[00110] Anomaly detection can be useful in many aspects. For example, certain anomalies require human action. Detected anomalies can be used for data cleaning that is useful for subsequent data mining and machine learning.
[00111] Figure 7 is a block diagram of a system architecture 700 of the data anomaly detection system according to some implementations of the present disclosure.
[00112] In some embodiments, in order to convert data from the relational database 702, one or more database adapters 704, including a model adapter 706 and a data adapter 710, are implemented.
[00113] A model adapter 706 reads the database schema from the relational database 702, and converts the database schema to the machine learning models 708. The model adapter 706 identifies tables to be converted (those that contain fields other than keys). As described in Figures 4 to 6, each table is converted to a machine learning model. Table columns are converted to input or output features of the machine learning models 708. In some embodiments, all the columns of each table are converted to inputs of the table's machine learning model, with primary keys and foreign keys as embeddings. All the columns other than primary keys or foreign keys of each table are converted to outputs of the table's machine learning model.
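A minimal sketch of the schema-reading step of such a model adapter, using Python's built-in sqlite3 module purely for illustration; foreign-key handling and the actual model construction are omitted, and a production adapter would target the deployed database engine:

```python
import sqlite3

def read_schema(db_path):
    # Read the table schema and split each table's columns into keys and
    # non-key fields; only tables with non-key fields are converted.
    conn = sqlite3.connect(db_path)
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    schema = {}
    for t in tables:
        cols = conn.execute(f"PRAGMA table_info({t})").fetchall()
        keys = [c[1] for c in cols if c[5]]        # c[5] = primary-key flag
        fields = [c[1] for c in cols if not c[5]]
        if fields:
            schema[t] = {"keys": keys, "fields": fields}
    return schema
```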
[00114] A data adapter 710 reads data from the relational database 702, and feeds the data into the machine learning models 708 for training 712. The data adapter 710 converts database column values into feature values with encoders. In some embodiments, a user interface allows the user to override default conversion behavior, e.g., forcing an integral column type to be a categorical or a numerical type.
[00115] In some embodiments, the machine learning models 708 are neural networks as described in Figures 4 to 6. In some embodiments, the machine learning models 708 have a self-encoder (autoencoder) style. The machine learning models 708 take the content of the database as input, and generate output in the same format as the database output columns. The machine learning models 708 can predict what each value in the database should be. In some embodiments, the machine learning models 708 can fill in the missing values of the input database. In some embodiments, the values in the database are conceptually smoothed by the machine learning models 708. In some embodiments, as described in Figure 5, a neural network with entity embedding 500 is created and similar processes described in Figures 4-6 are implemented to form the machine learning models 708.
[00116] Figure 8 is a block diagram illustrating simplified neural networks 800 converted from a Sales Forecast database for anomaly detection in accordance with some implementations of the present disclosure. Compared with Figure 5, certain columns (by default, columns related to dates and ID columns with a large number of instances) do not generate corresponding outputs and loss functions. For example, the Date and store_id columns do not generate corresponding outputs and loss functions.
[00117] As shown in Figure 7, a training process 712 is conducted for the machine learning models 708. In some embodiments, the machine learning models 708 are trained periodically, for example at a predetermined interval, by pulling the data from the relational database 702 through the data adapter 710. In some embodiments, the relational database 702 is monitored, and the models are retrained only when the amount of change in the database has exceeded a certain threshold.
[00118] In some embodiments, anomalies 714 are detected for the relational database 702 after the training 712 of the machine learning models 708. In other words, the trained machine learning models are used to predict the anomalies 714. For each column with a loss function as shown in Figure 8, a threshold is determined for the loss function. In some embodiments, anomalies are detected when the loss of the column value surpasses the pre-defined threshold, for example, a predetermined percentage difference or a predetermined threshold number. In some instances, the predetermined percentage can be 50%.
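A minimal sketch of this thresholding rule, assuming the 50% predetermined percentage difference mentioned above:

```python
def is_anomaly(real_value, predicted_value, pct_threshold=0.5):
    # Flag a value whose relative difference from the model's
    # prediction surpasses the predetermined percentage.
    if predicted_value == 0:
        return real_value != 0
    return abs(real_value - predicted_value) / abs(predicted_value) > pct_threshold

print(is_anomaly(100000, 1010))  # True: deviates far beyond 50%
print(is_anomaly(1050, 1010))    # False: within 50% of the prediction
```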
[00119] In some embodiments, after the model training 712, all the anomalies 714 are pooled into one table. In one example, the anomaly table may include the columns shown in Table 3 below.
Table 3. Exemplary columns in the anomaly table generated by the machine learning models.
Table | Column | Row ID | Real Value | Predicted Value | Loss | Annotations
[00120] As shown in Table 3 above, the anomaly table includes columns such as Table (identifying which table has an anomaly), Column (identifying which column in the identified table has the anomaly), Row ID (identifying which row has the anomaly in the identified table and column), Real Value (the real value in the database with the anomaly), Predicted Value (the predicted value from the trained machine learning model), Loss (the loss calculated from the difference between the real value and the predicted value), and Annotations (user provided annotations 718 to the anomaly).
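A minimal sketch of pooling detected anomalies into one table with these columns might read as follows; the `Anomaly` record and `pool_anomalies` helper are illustrative only.

```python
from collections import namedtuple

Anomaly = namedtuple(
    "Anomaly",
    ["table", "column", "row_id", "real_value", "predicted_value",
     "loss", "annotations"],
)

def pool_anomalies(per_model_anomalies):
    """Pool the anomalies detected by every trained model into one table,
    leaving the Annotations field empty for later user input 718."""
    return [Anomaly(*a, annotations=None)
            for model_anoms in per_model_anomalies
            for a in model_anoms]

pooled = pool_anomalies([[("sales", "units", 42, 100.0, 30.0, 4.9)]])
print(pooled[0].predicted_value)   # -> 30.0
```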
[00121] In some embodiments, the detected anomalies 714 pass through a classification model 716. In some embodiments, classification tags for the classified anomalies 720 are added by the classification model 716.
[00122] In some embodiments, an annotation user interface 722 is provided for the users to view the anomalies and their classifications.
[00123] In some embodiments, the user can add annotations 718 to each detected anomaly. In some examples, when a detected anomaly is not really an anomaly or is considered to be normal, the user can annotate the anomaly as a false positive. The user can also adjust the system-generated classification tags from the classification model 716 or add new tags to the anomalies.
[00124] In some embodiments, the user annotations can be used in subsequent trainings. For example, false positive annotations are used in subsequent training to suppress similar false positives. In another example, the user annotations can be used to train new or better annotation classification models by using the anomaly as input, and comparing and reducing the difference between the annotations predicted by the classification model and the ground-truth user annotations.
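As a non-limiting sketch, training such an annotation classifier could follow a standard supervised loop like the one below (PyTorch is assumed); the feature width, tag count, and optimizer settings are hypothetical.

```python
import torch
import torch.nn as nn

# Encoded anomaly features (e.g. table id, column id, loss) -> annotation tag.
N_FEATURES, N_TAGS = 4, 3
classifier = nn.Sequential(nn.Linear(N_FEATURES, 16), nn.ReLU(),
                           nn.Linear(16, N_TAGS))
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(features, tag_labels):
    """Reduce the difference between the predicted annotations and the
    ground-truth user annotations for one batch."""
    optimizer.zero_grad()
    loss = loss_fn(classifier(features), tag_labels)
    loss.backward()
    optimizer.step()
    return loss.item()

print(train_step(torch.randn(8, N_FEATURES), torch.randint(0, N_TAGS, (8,))))
```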
[00125] In some embodiments, the system disclosed herein implements a classification bootstrap process. Initially, there is no annotation data and no classification model. As more and more user annotations are collected, the classification models are trained and improved from the user annotations. In some embodiments, the classification models are neural network models.
[00126] In some embodiments, user-defined actions 724, including tags, are added to the classified anomalies 720. The user can annotate anomalies with pre-defined tags. For example, the list of possible tags is maintained and can be adjusted by the users. In some examples, the classification model 716 maps the classifications to tags.
[00127] In some embodiments, the user can associate a number of action rules with each tag. For example, an email or a short-message notification may be triggered when the user associates a message tag with an anomaly. In another example, a pre-defined system task may be triggered when the user associates a tag with a task code. In some embodiments, when the classification model predicts that an anomaly has a tag corresponding to a classified anomaly 720, the corresponding actions of the tag are triggered.
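One possible shape for such tag-to-action rules is sketched below; the action functions and the tag names are hypothetical placeholders.

```python
def notify_email(anomaly):
    print(f"email/short message sent for {anomaly}")

def run_system_task(anomaly):
    print(f"pre-defined system task triggered for {anomaly}")

# Each tag carries the list of action rules the user associated with it.
ACTION_RULES = {"message": [notify_email], "task": [run_system_task]}

def on_classified(anomaly, predicted_tags):
    """Trigger every action associated with each tag the model predicts."""
    for tag in predicted_tags:
        for action in ACTION_RULES.get(tag, []):
            action(anomaly)

on_classified("sales.units row 42", ["message"])
```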
[00128] Figure 9 is a flowchart illustrating an exemplary process 900 by which a computing system implements the techniques of anomaly detection in a relational database in accordance with some implementations of the present disclosure.
[00129] According to Figure 9, the computing system extracts table structures from the relational database, and the table structures include multiple entities and each entity has multiple table columns (910).
[00130] The computing system then converts each table in the table structures to a machine learning model (920).
[00131] The computing system trains the machine learning models by feeding the corresponding table columns into the machine learning models (930).

[00132] The computing system identifies a value of the table columns as an anomaly when a difference between the value and a predicted value by the machine learning models exceeds a predefined threshold (940).
[00133] In some embodiments, extracting table structures (910) includes: identifying tables to be converted; and converting table column values of the identified tables into a feature value format.
[00134] In some embodiments, converting each table in the table structures to the machine learning model (920) includes: creating a respective subnet of neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as the input table columns to the neural network structures; identifying a second set of table columns as the output table columns to the neural network structures; creating a respective neural network structure for each table, and converting the primary keys and foreign keys into a feature value format in an embedding layer.
[00135] In some embodiments, the first set of table columns consists of all table columns, and the second set of table columns includes table columns not identified by the primary keys or the foreign keys.
[00136] In some embodiments, training the machine learning models (930) includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs, and a respective second set of table column values as respective fitting outputs for the respective neural network for the each table; generating a prediction for each of table column values of respective outputs; and reducing a difference between the respective outputs of the respective neural network and the respective fitting outputs.
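A minimal training loop matching these steps, reusing the autoencoder-style model sketched earlier, might look as follows; the loss choice (a single mean-squared error over all output columns) is a simplifying assumption, since per-column losses as in Figure 8 could equally be used.

```python
import torch

def train_table_model(model, batches, epochs=10, lr=1e-3):
    """Fit one table's neural network: the first set of column values are
    the inputs, the second set are the fitting outputs, and the loop
    reduces the difference between network outputs and fitting outputs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for key_ids, numeric_in, fitting_out in batches:
            optimizer.zero_grad()
            loss = loss_fn(model(key_ids, numeric_in), fitting_out)
            loss.backward()
            optimizer.step()
    return model
```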
[00137] In some embodiments, identifying the value in the one of the table columns as the anomaly (940) further includes: generating a loss function for the each of the table column values of the respective outputs by comparing the difference between an actual column value and the prediction for the each of the table column values of the respective outputs.
[00138] In some embodiments, identifying the value in the one of the table columns as the anomaly (940) further includes: in accordance with a determination that the loss function corresponding to the value is greater than a predetermined threshold, determining the table column value of the respective outputs as the anomaly.

[00139] In some embodiments, the anomaly detection process 900 further includes: triggering a user-defined action upon the identification of the anomaly.
[00140] In some embodiments, the user-defined action includes a user annotation on the anomaly.
[00141] In some embodiments, the anomaly detection process 900 further includes: training a classification model using the identified anomalies and corresponding user annotations on the identified anomalies.
[00142] In some embodiments, the anomaly detection process 900 further includes: predicting a classification tag for the identified anomaly based on the trained classification model.
[00143] In some embodiments, the anomaly detection process 900 further includes: in accordance with a determination that the trained classification model predicts a classification tag for the identified anomaly, a corresponding classification action for the classification tag is triggered.
[00144] In some embodiments, the anomaly detection process 900 further includes: viewing the identified anomalies and corresponding classifications from a user interface.

[00145] In some embodiments, the anomaly detection process 900 further includes: monitoring the relational database and further training the machine learning models when the number of identified anomalies has exceeded a certain threshold.
[00146] In some embodiments, the anomaly detection process 900 further includes: training a classification model using the identified anomalies and a user false positive annotation on the respective anomaly.
[00147] In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the implementations described in the present application. A computer program product may include a computer-readable medium.
[00148] The terminology used in the description of the implementations herein is for the purpose of describing particular implementations only and is not intended to limit the scope of claims. As used in the description of the implementations and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components, and/or groups thereof.
[00149] It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first electrode could be termed a second electrode, and, similarly, a second electrode could be termed a first electrode, without departing from the scope of the implementations. The first electrode and the second electrode are both electrodes, but they are not the same electrode.

[00150] The description of the present application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications, variations, and alternative implementations will be apparent to those of ordinary skill in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others skilled in the art to understand the invention for various implementations and to best utilize the underlying principles with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of claims is not to be limited to the specific examples of the implementations disclosed and that modifications and other implementations are intended to be included within the scope of the appended claims.

Claims

What is claimed is:
1. A method for anomaly detection in a relational database, comprising: extracting table structures from the relational database, the table structures including multiple entities and each entity having multiple table columns; converting each table in the table structures to a machine learning model; training the machine learning models by feeding the corresponding table columns into the machine learning models; and identifying a value of the table columns as an anomaly when a difference between the value and a predicted value by the machine learning models exceeds a predefined threshold.
2. The method according to claim 1, wherein extracting table structures includes: identifying tables to be converted; and converting table column values of the identified tables into a feature value format.
3. The method according to claim 1, wherein converting each table in the table structures to the machine learning model includes: creating a respective subnet of neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as input table columns to the neural network structures; identifying a second set of table columns as output table columns to the neural network structures; creating a respective neural network structure for each table, and converting primary keys and foreign keys into feature value format in an embedding layer.
4. The method according to claim 3, wherein the first set of table columns consists of all table columns, and the second set of table columns includes table columns not identified by the primary keys or the foreign keys.
5. The method according to claim 1, wherein training the machine learning models includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs, and a respective second set of table column values as respective fitting outputs for the respective neural network for the each table; generating a prediction for each of table column values of respective outputs; and reducing a difference between the respective outputs of the respective neural network and the respective fitting outputs.
6. The method according to claim 5, wherein identifying the value in the one of the table columns as the anomaly further includes: generating a loss function for the each of the table column values of the respective outputs by comparing a difference between an actual column value and the prediction for the each of the table column values of the respective outputs.
7. The method according to claim 6, wherein identifying the value in the one of the table columns as the anomaly further includes: in accordance with a determination that the loss function corresponding to the value is greater than a predetermined threshold, determining the table column value of the respective outputs as the anomaly.
8. The method according to claim 1, further comprising: triggering a user-defined action upon the identification of the anomaly.
9. The method according to claim 8, wherein the user-defined action includes a user annotation on the anomaly.
10. The method according to claim 1, further comprising: training a classification model using the identified anomalies and corresponding user annotations on the identified anomalies.
11. The method according to claim 10, further comprising: predicting a classification tag for the identified anomaly based on the trained classification model.
12. The method according to claim 10, further comprising: in accordance with a determination that the trained classification model predicts a classification tag for the identified anomaly, a corresponding classification action for the classification tag is triggered.
13. The method according to claim 11, further comprising: viewing the identified anomalies and corresponding classifications from a user interface.
14. The method according to claim 1, further comprising: monitoring and further training the relational database when a number of the identified anomalies has exceeded a certain threshold.
15. The method according to claim 1, further comprising: training a classification model using the identified anomalies and a user false positive annotation on the respective anomaly.
16. An electronic apparatus comprising: one or more processing units; memory coupled to the one or more processing units; and a plurality of programs stored in the memory that, when executed by the one or more processing units, cause the electronic apparatus to perform the method of claims 1-15.
17. A non-transitory computer readable storage medium storing a plurality of programs for execution by an electronic apparatus having one or more processing units, wherein the plurality of programs, when executed by the one or more processing units, cause the electronic apparatus to perform the method of claims 1-15.
PCT/US2021/036752 2021-06-10 2021-06-10 Systems and methods for anomaly detection in a relational database WO2022260669A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2021/036752 WO2022260669A1 (en) 2021-06-10 2021-06-10 Systems and methods for anomaly detection in a relational database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2021/036752 WO2022260669A1 (en) 2021-06-10 2021-06-10 Systems and methods for anomaly detection in a relational database

Publications (1)

Publication Number Publication Date
WO2022260669A1 true WO2022260669A1 (en) 2022-12-15

Family

ID=84426149

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/036752 WO2022260669A1 (en) 2021-06-10 2021-06-10 Systems and methods for anomaly detection in a relational database

Country Status (1)

Country Link
WO (1) WO2022260669A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030088548A1 (en) * 2001-11-07 2003-05-08 Hyperion Solutions Corporation Method for extracting data from a relational database using a reduced query
US20150058994A1 (en) * 2002-01-25 2015-02-26 The Trustees Of Columbia University In The City Of New York System and methods for adaptive model generation for detecting intrusion in computer systems
US20120041575A1 (en) * 2009-02-17 2012-02-16 Hitachi, Ltd. Anomaly Detection Method and Anomaly Detection System
US20110271146A1 (en) * 2010-04-30 2011-11-03 Mitre Corporation Anomaly Detecting for Database Systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FADOLALKARIM DAREN; BERTINO ELISA; SALLAM ASMAA: "An Anomaly Detection System for the Protection of Relational Database Systems against Data Leakage by Application Programs", 2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), IEEE, 20 April 2020 (2020-04-20), pages 265 - 276, XP033774214, DOI: 10.1109/ICDE48307.2020.00030 *
SPALKA, ADRIAN; LEHNHARDT, JAN: "A Comprehensive Approach to Anomaly Detection in Relational Databases", LNCS, vol. 3654, 21 August 2005, SPRINGER, Berlin, Heidelberg, ISBN: 3540745491, pages 207 - 221, XP019013747 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21945340

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE