CN114722061A - Data processing method and device, equipment and computer readable storage medium - Google Patents

Data processing method and device, equipment and computer readable storage medium

Info

Publication number
CN114722061A
Authority
CN
China
Prior art keywords
data
feature
abnormal
preset
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210370986.6A
Other languages
Chinese (zh)
Other versions
CN114722061B (en)
Inventor
谭涵秋
宋捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202210370986.6A priority Critical patent/CN114722061B/en
Publication of CN114722061A publication Critical patent/CN114722061A/en
Application granted granted Critical
Publication of CN114722061B publication Critical patent/CN114722061B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The embodiments of the application disclose a data processing method, a data processing apparatus, data processing equipment, and a computer-readable storage medium. The method comprises the following steps: inputting the feature data in measurement report data into a trained self-encoder model to obtain a reconstruction error value corresponding to the feature data; if the reconstruction error value is greater than a preset error threshold, performing a deviation comparison between the feature data and the abnormal feature data contained in a preset repository to obtain the deviation rate of the feature data relative to the abnormal feature data; determining the abnormal condition characterized by the feature data based on the relationship between the deviation rate and a preset deviation threshold; and if the feature data characterizes a new anomaly, storing the feature data in the preset repository. By using the self-encoder model, the method can accurately determine abnormal feature data in the measurement report data through the reconstruction error value.

Description

Data processing method and device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a data processing method and apparatus, a device, and a computer-readable storage medium.
Background
At present, most 5G network configurations follow a "single" or "big unified" mode and cannot meet the requirement of fine-grained network management. In particular, when a network anomaly occurs, the network cannot be repaired automatically in real time. Measurement report data contains a large amount of feature data; by comparing the relevant feature data with the abnormal feature data stored in a repository, it can be determined whether the relevant feature data is abnormal, which in turn reflects whether the measurement report data is abnormal and whether the base station that generated the measurement report data is abnormal.
However, the prior art does not fully exploit the feature parameters, cannot update the abnormal feature data in the repository in real time, and cannot accurately determine abnormal measurement report data, so the corresponding exception handling scheme cannot be matched.
Therefore, how to improve the accuracy of determining abnormal feature data in measurement report data is an urgent problem to be solved.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present application respectively provide a data processing method and apparatus, an electronic device, and a computer-readable storage medium, which determine whether feature data is abnormal by means of a reconstruction error value, and determine the abnormal condition characterized by the feature data according to the deviation rate of the feature data with respect to abnormal feature data.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a data processing method, including: inputting characteristic data in measurement report data into a trained self-encoder model to obtain a reconstruction error value corresponding to the characteristic data; if the reconstruction error value is larger than a preset error threshold value, performing deviation comparison on the feature data and abnormal feature data contained in a preset storage library to obtain a deviation rate of the feature data relative to the abnormal feature data; determining the abnormal condition of the characteristic data characterization based on the relation between the deviation rate and a preset deviation threshold value; and if the characteristic data represent that a new abnormity exists, storing the characteristic data into the preset storage library.
In another embodiment, before the feature data in the measurement report data is input into the trained auto-encoder model to obtain the reconstruction error value corresponding to the feature data, the method further includes: constructing an initial self-encoder model, and preprocessing the characteristic data extracted from the measurement report sample data to obtain characteristic sample data; inputting the characteristic sample data into the initial self-encoder model to obtain a similarity coefficient of the characteristic sample data; and if the similarity coefficient of the feature sample data is smaller than a preset similarity threshold, training the initial self-encoder model by using the feature sample data to obtain the trained self-encoder model.
In another embodiment, the preprocessing the feature data extracted from the measurement report sample data to obtain feature sample data includes: clustering the characteristic data extracted from the sample data of the measurement report to obtain clustering characteristic data of multiple categories; performing two classification processing on the clustering characteristic data of the multiple classes respectively to obtain the contribution degree of the clustering characteristic data of each class; and determining the clustering feature data with the contribution degree larger than a preset contribution threshold value as the feature sample data.
In another embodiment, the inputting the feature sample data into the initial self-encoder model to obtain a similarity coefficient of the feature sample data includes: inputting the feature sample data into the initial self-encoder model to obtain the abnormal degree of the feature sample data and the abnormal distance between the feature sample data and the standard abnormal feature data in the initial self-encoder model; and calculating a similarity coefficient of the feature sample data based on the abnormal degree and the abnormal distance.
In another embodiment, the calculating a similarity coefficient of the feature sample data based on the degree of abnormality and the abnormal distance includes: acquiring a weight value corresponding to the abnormality degree and a weight value corresponding to the abnormality distance; performing a product operation on the abnormal degree and a weight value corresponding to the abnormal degree to obtain a first value, and performing a product operation on the abnormal distance and the weight value corresponding to the abnormal distance to obtain a second value; and performing summation operation on the first value and the second value to obtain a similarity coefficient of the feature sample data.
In another embodiment, the determining that the characteristic data characterizes an abnormal condition based on the relationship between the deviation rate and a preset deviation threshold includes: comparing the deviation rate to the preset deviation threshold; if the deviation rate is larger than the preset deviation rate threshold value, determining that the characteristic data representation has new abnormity; and if the deviation rate is smaller than or equal to the preset deviation rate threshold value, determining that the characteristic data representation has no new abnormality.
In another embodiment, the storing the feature data into the predetermined repository includes: determining an exception handling scheme corresponding to the new exception characterized by the feature data; wherein the exception handling scheme is to handle the exception to resume normal operation; and storing the exception handling scheme and the characteristic data in the preset storage library in an associated manner.
According to an aspect of an embodiment of the present application, there is provided a data processing apparatus including:
the acquisition module is configured to input the characteristic data in the measurement report data into a trained self-encoder model to obtain a reconstruction error value corresponding to the characteristic data; the comparison module is configured to perform deviation comparison on the feature data and abnormal feature data contained in a preset storage library to obtain a deviation rate of the feature data relative to the abnormal feature data if the reconstruction error value is greater than a preset error threshold; a determination module configured to determine that the characteristic data characterizes an abnormal condition based on a relationship between the deviation rate and a preset deviation threshold; and the updating module is configured to store the characteristic data into the preset storage library if the characteristic data represents that a new exception exists.
In another embodiment, the data processing apparatus further includes: the model building module is configured to build an initial self-encoder model and preprocess the feature data extracted from the sample data of the measurement report to obtain the feature sample data; inputting the characteristic sample data into the initial self-encoder model to obtain a similarity coefficient of the characteristic sample data; and if the similarity coefficient of the feature sample data is smaller than a preset similarity threshold, training the initial self-encoder model by using the feature sample data to obtain the trained self-encoder model.
In another embodiment, the model building module comprises: the preprocessing unit is configured to perform clustering processing on the feature data extracted from the measurement report sample data to obtain clustering feature data of multiple categories; the classification unit is configured to perform two-classification processing on the clustering feature data of the multiple classes respectively to obtain the contribution degree of the clustering feature data of each class; and the determining unit is used for determining the clustering characteristic data with the contribution degree larger than a preset contribution threshold value as the characteristic sample data.
In another embodiment, the model building module comprises: an abnormal parameter unit configured to input the feature sample data into the initial self-encoder model, so as to obtain an abnormal degree of the feature sample data and an abnormal distance between the feature sample data and standard abnormal feature data in the initial self-encoder model; a similarity coefficient calculation unit configured to calculate a similarity coefficient of the feature sample data based on the degree of abnormality and the abnormality distance.
In another embodiment, the similarity coefficient calculation unit further includes: an obtaining sub-unit, configured to obtain a weight value corresponding to the degree of abnormality and a weight value corresponding to the abnormal distance; an operation sub-unit, configured to multiply the degree of abnormality by its corresponding weight value to obtain a first value, and to multiply the abnormal distance by its corresponding weight value to obtain a second value; and a summation sub-unit, configured to sum the first value and the second value to obtain the similarity coefficient of the feature sample data.
In another embodiment, the summation sub-unit is specifically configured to: obtain a weight value corresponding to the degree of abnormality and a weight value corresponding to the abnormal distance; multiply the degree of abnormality by its corresponding weight value to obtain a first value, and multiply the abnormal distance by its corresponding weight value to obtain a second value; and sum the first value and the second value to obtain the similarity coefficient of the feature sample data.
In another embodiment, the determining module is specifically configured to compare the deviation rate with the preset deviation threshold; if the deviation rate is larger than the preset deviation rate threshold value, determining that the characteristic data representation has new abnormity; and if the deviation rate is smaller than or equal to the preset deviation rate threshold value, determining that the characteristic data representation has no new abnormality.
In another embodiment, the update module is specifically configured to determine an exception handling scheme corresponding to a new exception characterized by the feature data; wherein the exception handling scheme is to handle the exception to resume normal operation; and storing the exception handling scheme and the characteristic data in the preset storage library in an associated manner.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: a controller; a memory for storing one or more programs which, when executed by the controller, perform the above-described method.
According to an aspect of embodiments of the present application, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor of a computer, cause the computer to perform the above-mentioned method.
According to an aspect of an embodiment of the present application, there is also provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the above-described method.
In the technical solution provided by the embodiments of the application, feature data in measurement report data is input into a trained self-encoder model to obtain the reconstruction error value corresponding to the feature data; the reconstruction error value is used to determine whether the feature data is abnormal, and the abnormal condition characterized by the feature data is determined according to the deviation rate of the feature data relative to the abnormal feature data. Feature data that characterizes a new anomaly is stored in a preset repository, so that the abnormal feature data in the repository is updated. Because the self-encoder model is used, abnormal feature data in the measurement report data can be determined accurately through the reconstruction error value, and the newly determined abnormal feature data is used to update the abnormal feature data in the repository, keeping the repository up to date in real time. Subsequent anomaly troubleshooting based on the abnormal feature data contained in the repository is therefore more accurate, and abnormal measurement report data can be determined more precisely.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a schematic illustration of an implementation environment to which the present application relates;
FIG. 2 is a flow chart illustrating a method of data processing according to an exemplary embodiment of the present application;
FIG. 3 is a flow chart of a proposed process of constructing a self-encoder model based on the embodiment shown in FIG. 2;
FIG. 4 is a flow chart of a proposed process of pre-processing feature data based on the embodiment shown in FIG. 3;
fig. 5 is a flowchart of a process for calculating a similarity coefficient of feature sample data according to the embodiment shown in fig. 3;
fig. 6 is a flowchart of a process of calculating a similarity coefficient of feature sample data based on the embodiment shown in fig. 5;
FIG. 7 is a flow chart of a process for determining characteristic data characterizing anomalies that is presented in accordance with the embodiment shown in FIG. 3;
FIG. 8 is a flow diagram of an associative memory process based on the exception handling scheme proposed by the embodiment shown in FIG. 2;
fig. 9 is a schematic process diagram illustrating a base station automatically handling an abnormal situation based on the data processing method of the present application according to an exemplary embodiment of the present application;
FIG. 10 is a block diagram of a data processing apparatus according to an exemplary embodiment of the present application;
fig. 11 is a schematic structural diagram of a computer system of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Reference to "a plurality" in this application means two or more. "and/or" describe the association relationship of the associated objects, meaning that there may be three relationships, e.g., A and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Referring first to fig. 1, fig. 1 is a schematic diagram of an implementation environment related to the present application. The implementation environment includes a terminal 100, a server 200, and a base station 300, and the terminal 100, the server 200, and the base station 300 communicate with each other through a wired or wireless network.
The user terminal 100 includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, an intelligent appliance, a vehicle-mounted terminal, and the like, and may be any electronic device capable of implementing image visualization, such as a smart phone, a tablet, a notebook, a computer, and the like, without limitation here.
The terminal 100 can generate MR (Measurement Report) data for reflecting information such as network quality, user behavior habits, and surrounding environment. The terminal 100 can transmit the MR data to the server 200 in real time.
The server 200 extracts the feature data from the received MR data, inputs the feature data into a pre-trained self-encoder model to obtain the reconstruction error value corresponding to the feature data output by the self-encoder model, performs a threshold judgment on the reconstruction error value to obtain the deviation rate of the feature data, and stores the feature data as new abnormal feature data in the repository if the deviation rate of the feature data is greater than a second preset threshold. The server 200 may be an independent physical server, or a server cluster or distributed system formed by a plurality of physical servers, where the plurality of servers may form a blockchain and each server is a node on the blockchain. The server 200 may also be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms, which is not limited here.
The base station 300 serves as a data relay station that provides communication services for the terminal 100, and request signals sent by the terminal 100 are also transmitted to the network operator through the base station 300. Meanwhile, if the server 200 detects that the MR data transmitted by the terminal 100 is abnormal, the server 200 sends a corresponding exception handling scheme to the base station 300 so that the anomaly in the MR data can be handled.
The embodiment can detect whether the measurement report data sent by the terminal is abnormal in real time, and if so, the corresponding abnormal processing scheme is sent to the base station for abnormal processing, so that the base station can process the abnormal conditions in real time and automatically, and the method and the device are suitable for scenes with high data real-time requirements, such as communication base station maintenance and the like.
In existing methods, all data other than abnormal feature data is generally fed into the auto-encoder model as normal data. However, such data often contains a large amount of noise, which directly affects how the auto-encoder learns the distribution of normal samples and makes the learned distribution inaccurate.
Referring to fig. 2, fig. 2 is a flowchart illustrating a data processing method according to an exemplary embodiment of the present application, which may be specifically executed by the server 200 in the implementation environment shown in fig. 1. Of course, the method may also be applied to other implementation environments and executed by a server device in other implementation environments, and the embodiment is not limited thereto. As shown in fig. 2, the method at least includes steps S210 to S240, which are described in detail as follows:
S210: And inputting the characteristic data in the measurement report data into the trained self-encoder model to obtain a reconstruction error value corresponding to the characteristic data.
The measurement report data is data reported by the user terminal in real time, and reflects information such as network quality, user behavior habits, surrounding environment and the like.
The Auto-Encoder model (AE) of the present embodiment is a model trained in advance, and is trained by using normal feature data in the training process.
The reconstruction error value is used to judge whether the feature data input into the self-encoder model is abnormal: the reconstruction error value output by the self-encoder model is compared with a preset reconstruction error threshold to determine whether the feature data is abnormal, and thus whether the measurement report data corresponding to the feature data is abnormal.
For an exemplary explanation of S210, the measurement report data is first preprocessed, including null padding, abnormal data elimination, and normalization processing, to obtain the preprocessed measurement report data. And then extracting characteristic data of the preprocessed measurement report data, and inputting the characteristic data into the trained self-encoder model to obtain a reconstruction error value corresponding to the characteristic data.
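To make this step concrete, the following is a minimal sketch (not the patented implementation) of preprocessing measurement report features and computing per-sample reconstruction errors. The file names, scaling choice, network size, and the 0.05 error threshold are illustrative assumptions, and an MLPRegressor trained to reproduce its input stands in for the self-encoder model.

```python
# Minimal sketch of S210 (file names, model size and the preset error threshold are assumptions).
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPRegressor  # trained to reproduce its input, i.e. an autoencoder

def preprocess(mr: pd.DataFrame) -> pd.DataFrame:
    # null padding; outlier rows would also be removed here before training
    return mr.fillna(mr.median(numeric_only=True))

# fit the scaler and the self-encoder on (assumed) normal measurement report features
train = preprocess(pd.read_csv("mr_train.csv"))
scaler = MinMaxScaler().fit(train)                  # normalization
x_train = scaler.transform(train)
ae = MLPRegressor(hidden_layer_sizes=(16, 4, 16), max_iter=2000, random_state=0)
ae.fit(x_train, x_train)

# score new measurement report data by reconstruction error
x_new = scaler.transform(preprocess(pd.read_csv("mr_new.csv")))
errors = np.mean((x_new - ae.predict(x_new)) ** 2, axis=1)   # per-sample reconstruction error
suspects = x_new[errors > 0.05]                              # 0.05 = assumed preset error threshold
```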
S220: and if the reconstruction error value is greater than a preset error threshold value, performing deviation comparison on the feature data and abnormal feature data contained in a preset storage library to obtain the deviation rate of the feature data relative to the abnormal feature data.
The deviation rate in this embodiment refers to the deviation rate of the feature data relative to the abnormal feature data: if the feature data deviates from the abnormal feature data, a corresponding deviation amount is obtained, and the deviation rate of the feature data relative to the abnormal feature data is obtained by dividing this deviation amount by the total amount of the abnormal feature data.
The feature data whose reconstruction error value is greater than the preset reconstruction error threshold is determined to be abnormal data. In order to further determine whether this feature data is abnormal feature data, a deviation comparison is performed between the feature data and the abnormal feature data contained in the preset repository to obtain the deviation rate of the feature data relative to the abnormal feature data.
S230: and determining the abnormal condition of the characteristic data characterization based on the relation between the deviation rate and a preset deviation threshold value.
The abnormal condition characterized by the feature data is determined according to the magnitude relationship between the deviation rate and the preset deviation threshold. For example, if the deviation rate is greater than the preset deviation threshold, it is determined that the feature data characterizes a new anomaly; if the deviation rate is less than or equal to the preset deviation threshold, it is determined that the feature data is not abnormal or that no new anomaly has occurred.
S240: and if the characteristic data represent that new abnormity exists, storing the characteristic data into a preset storage library.
In this embodiment, the feature data representing that the new anomaly exists is stored in the preset repository, so as to update the anomaly feature data in the preset repository.
The preset storage library of the embodiment includes the abnormal characteristic data and the abnormal processing scheme for the abnormal characteristic data, and may also include other data, which is not limited specifically here.
In this embodiment, feature data in measurement report data is input into a trained self-encoder model to obtain the reconstruction error value corresponding to the feature data; the reconstruction error value is used to determine whether the feature data is abnormal, and the abnormal condition characterized by the feature data is determined according to the deviation rate of the feature data relative to the abnormal feature data. Feature data that characterizes a new anomaly is stored in the preset repository, so that the abnormal feature data in the repository is updated. Because the self-encoder model is used, abnormal feature data in the measurement report data can be determined accurately through the reconstruction error value, and the newly determined abnormal feature data keeps the repository up to date in real time, so that subsequent anomaly troubleshooting based on the abnormal feature data in the repository is more accurate and abnormal measurement report data can be determined more precisely.
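The overall decision flow of S210 to S240 can be outlined as in the sketch below. This is illustrative only: the thresholds, the repository layout, the use of the minimum deviation over the repository, and the Pearson-style deviation function are assumptions, not the patented formulas.

```python
# Illustrative outline of S210-S240 (thresholds, repository layout and the deviation function are assumptions).
import numpy as np

ERROR_THRESHOLD = 0.05      # assumed preset error threshold
DEVIATION_THRESHOLD = 0.1   # assumed preset deviation threshold

def reconstruction_error(ae, x: np.ndarray) -> float:
    recon = ae.predict(x.reshape(1, -1))[0]
    return float(np.mean((x - recon) ** 2))

def deviation_rate(x: np.ndarray, anomaly: np.ndarray) -> float:
    # Pearson-style deviation of x from one stored abnormal feature vector
    return float(np.sum((x - anomaly) ** 2 / (np.abs(anomaly) + 1e-9)))

def process_feature_vector(x: np.ndarray, ae, repository: list) -> str:
    if reconstruction_error(ae, x) <= ERROR_THRESHOLD:        # S210
        return "normal"
    if not repository:                                        # first anomaly ever seen
        repository.append(x)
        return "new anomaly"
    rate = min(deviation_rate(x, a) for a in repository)      # S220: compare with known anomalies
    if rate > DEVIATION_THRESHOLD:                            # S230
        repository.append(x)                                  # S240: new anomaly updates the repository
        return "new anomaly"
    return "known anomaly"
```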
Referring to fig. 3, fig. 3 is a flowchart of a process for constructing a self-encoder model according to the embodiment shown in fig. 2. S310 to S330 are also included before S210 shown in fig. 2, and are described in detail below:
S310: And constructing an initial self-encoder model, and preprocessing the characteristic data extracted from the measurement report sample data to obtain the characteristic sample data.
The initial self-encoder model of the embodiment can learn the sample distribution of the normal characteristic data by taking the minimized reconstruction error as an objective function.
The preprocessing of the feature data in this embodiment includes screening out feature data that contributes little to the self-encoder model. For example, the feature data is preprocessed with the KMEANS clustering model, the contribution degree of the feature data to the KMEANS model is obtained with a classification model, and the feature data is then screened according to the contribution degree.
S320: and inputting the characteristic sample data into the initial self-encoder model to obtain the similarity coefficient of the characteristic sample data.
The feature sample data is input into the initial self-encoder model of this embodiment, and the similarity coefficient of the feature sample data can be obtained, where the initial self-encoder model is an untrained self-encoder model.
S330: and if the similarity coefficient of the feature sample data is smaller than a preset similarity threshold, training the initial self-encoder model by using the feature sample data to obtain a trained self-encoder model.
The preset similarity threshold in this embodiment is a preset parameter. For example, if the preset similarity threshold is 0.75, feature data with a similarity coefficient greater than or equal to 0.75 is considered to be potentially abnormal; after the potentially abnormal feature data is filtered out, the remaining feature data is used as normal data to train the initial self-encoder model, yielding the trained self-encoder model.
This embodiment further illustrates how the self-encoder model is trained: the similarity coefficient of the feature data is obtained, and only the feature data whose similarity coefficient is smaller than the preset similarity threshold is used as normal data to train the initial self-encoder model, so that the output of the trained self-encoder model is more accurate.
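Under the assumption that a similarity_coefficient function implementing TS(x) (see the embodiment of Fig. 6) is available, the training flow of S310 to S330 might look like the sketch below; the 0.75 threshold comes from the example above, and the helper functions are illustrative.

```python
# Sketch of S310-S330: filter potentially abnormal samples by their similarity coefficient,
# then train the initial self-encoder model on what remains.
# similarity_coefficient() and build_autoencoder() are assumed helpers.
import numpy as np

SIMILARITY_THRESHOLD = 0.75   # preset similarity threshold from the example above

def train_self_encoder(samples: np.ndarray, similarity_coefficient, build_autoencoder):
    ts = np.array([similarity_coefficient(x) for x in samples])
    normal = samples[ts < SIMILARITY_THRESHOLD]    # TS(x) >= 0.75 -> potentially abnormal, excluded
    model = build_autoencoder()                    # untrained "initial" self-encoder model
    model.fit(normal, normal)                      # learn to reconstruct normal feature data only
    return model
```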
In existing methods, after feature preprocessing such as null filling and outlier handling, the features are fed directly into the self-encoder model. However, collinearity and redundancy among feature data often negatively affect model training. To address this, this embodiment clusters the feature data, trains a classifier using the cluster labels, screens the original features using the feature contribution degrees output by the classifier, and removes the clustered feature data whose contribution to the model is small.
Fig. 4 is a flow chart of a proposed procedure for preprocessing feature data based on the embodiment shown in fig. 3. Based on S310, the step specifically includes S410 to S430, which are described in detail below:
S410: And clustering the feature data extracted from the measurement report sample data to obtain clustering feature data of multiple categories.
For example, in the clustering process of the embodiment, a KMEANS model is used, the feature data extracted from the measurement report sample data is input into the KMEANS model for preprocessing, the optimal clustering number K is obtained according to an elbow method, the clustering category of each feature data is output, and the clustering feature data of multiple categories is obtained.
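As a small illustration of how the optimal K could be found with the elbow method (the range of K values is an assumption), the inertia is computed for each candidate K and the K at the bend of the curve is chosen:

```python
# Elbow-method sketch for choosing K (k_max is an assumption).
import numpy as np
from sklearn.cluster import KMeans

def elbow_inertias(x: np.ndarray, k_max: int = 10) -> list:
    # inspect the returned curve and pick the K where the decrease flattens out
    return [KMeans(n_clusters=k, n_init=10, random_state=0).fit(x).inertia_
            for k in range(1, k_max + 1)]
```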
S420: and respectively carrying out two classification treatments on the clustering characteristic data of the plurality of classes to obtain the contribution degree of the clustering characteristic data of each class.
Firstly, training a classifier corresponding to each cluster type, such as an RFC (random forest classifier); and then, inputting the clustering feature data of each category into the corresponding trained classifier again, thereby obtaining the contribution degree of the clustering feature data of each category.
S430: and determining the clustering feature data with the contribution degree larger than a preset contribution threshold value as feature sample data.
Illustratively, the preset contribution threshold is 0.05. Feature data with a contribution degree less than or equal to 0.05 is removed, and this screening is repeated until the contribution degree of every remaining feature exceeds 0.05; the clustered feature data with a contribution degree greater than 0.05 is then determined as the feature sample data.
This embodiment further explains the preprocessing of the feature data: the feature data is clustered, binary classification is performed on the clustered feature data to obtain the contribution degree of each category of clustered feature data, the clustered feature data whose contribution degree is below the preset contribution threshold is rejected, and the clustered feature data whose contribution degree exceeds the preset contribution threshold is determined as the feature sample data. The preprocessed feature sample data is therefore better suited to the subsequent self-encoder model, which can then identify abnormal feature data more reliably. A sketch of this screening loop is given below.
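A minimal sketch of the screening described above, with assumed parameters: a fixed cluster count stands in for the elbow-method choice, and a single multiclass random forest trained on the cluster labels stands in for the per-category binary classifiers; feature columns whose importance does not exceed the 0.05 contribution threshold are removed until all remaining columns exceed it.

```python
# Sketch of the Fig. 4 feature screening (n_clusters and threshold are assumptions;
# a multiclass random forest replaces the per-category binary classifiers for brevity).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def select_feature_columns(x: np.ndarray, n_clusters: int = 4, threshold: float = 0.05) -> np.ndarray:
    cols = np.arange(x.shape[1])
    while True:
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(x[:, cols])
        rfc = RandomForestClassifier(n_estimators=100, random_state=0).fit(x[:, cols], labels)
        keep = rfc.feature_importances_ > threshold      # contribution degree per remaining column
        if keep.all():                                   # repeat until every column contributes > 0.05
            return cols                                  # indices of the retained feature columns
        cols = cols[keep]
```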
Fig. 5 is a flowchart of a process of calculating a similarity coefficient of feature sample data according to the embodiment shown in fig. 3. Based on S320, the step specifically includes S510 to S520, which are described in detail below:
S510: And inputting the feature sample data into the initial self-encoder model to obtain the abnormal degree of the feature sample data and the abnormal distance between the feature sample data and the standard abnormal feature data in the initial self-encoder model.
The degree of abnormality is one of the parameters used to measure how abnormal the input sample data is. The abnormal distance between the feature sample data and the standard abnormal feature data is determined by mapping the feature sample data and the standard abnormal feature data into a common space using the IF (Isolation Forest) method.
S520: and calculating a similarity coefficient of the characteristic sample data based on the abnormal degree and the abnormal distance.
The calculation in this embodiment includes conventional mathematical calculation manners such as addition, subtraction, multiplication, division, and the like, so as to calculate the similarity coefficient of the sample data.
The embodiment further illustrates that the similarity coefficient of the feature sample data is obtained by calculating the abnormality degree and the abnormality distance of the feature sample data, so that whether the feature sample data is abnormal or not can be determined more accurately.
Fig. 6 is a flowchart of a process of calculating a similarity coefficient of feature sample data based on the embodiment shown in fig. 5. Based on S520, the step specifically includes S610 to S630, which are described in detail below:
S610: And acquiring a weight value corresponding to the abnormality degree and a weight value corresponding to the abnormality distance.
The weighting value of this embodiment is a mathematical constant, for example, a constant of 0 to 1.
S620: and performing product operation on the abnormal degree and the weight value corresponding to the abnormal degree to obtain a first value, and performing product operation on the abnormal distance and the weight value corresponding to the abnormal distance to obtain a second value.
Illustratively, the degree of abnormality is a, which corresponds to a weight of 0.2; the anomaly distance is B, and the corresponding weight is 0.8, the first value is 0.2A, and the second value is 0.8B.
S630: and performing summation operation on the first value and the second value to obtain a similarity coefficient of the feature sample data.
For example, if the degree of abnormality is a, the corresponding weight is 0.6, the first value is 0.6A, the abnormality distance is B, the corresponding weight is 0.4, and the second value is 0.4B, the similarity coefficient of the feature sample data is 0.6A + 0.4B.
Exemplarily, the calculation formula of the similarity coefficient of the feature sample data is as follows:
TS(x)=θIS(x)+(1-θ)SS(x),0<θ<1;
where TS(x) represents the similarity coefficient of the feature sample data, IS(x) represents the degree of abnormality of the feature sample data, SS(x) represents the abnormal distance, and θ is a mathematical constant.
The degree of abnormality is calculated by the following formula:
IS(x) = 2^( -E(h(x)) / c(n) )
c(n) = 2H(n-1) - 2(n-1)/n
where E(h(x)) represents the average path length between the feature sample data and the abnormal feature data stored in the repository, h(x) represents the path length between the training feature data and the abnormal feature data stored in the repository, x represents the position value of the feature sample data, c(n) represents a coefficient, n represents the number of feature sample data, and H(n) represents the harmonic number, whose value is determined by n.
The anomaly distance is calculated by the following formula:
(formula provided as an image in the original publication)
where μi represents the location values of the different abnormal feature data in the repository.
The embodiment further clarifies the calculation process of the similarity coefficient of the feature sample data, and provides a specific calculation formula, so that the similarity coefficient of the feature sample data is more accurate.
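The similarity coefficient can be illustrated with the sketch below, under several assumptions: scikit-learn's IsolationForest score (rescaled to [0, 1]) stands in for the degree of abnormality IS(x); the abnormal distance SS(x) is approximated by a normalized closeness to the nearest stored abnormal feature vector, since the exact SS(x) formula is given only as an image in the original publication; and θ = 0.6 follows the weight example above.

```python
# Illustrative TS(x) = θ·IS(x) + (1-θ)·SS(x) (direction and scaling of SS(x) are assumptions).
import numpy as np
from sklearn.ensemble import IsolationForest

def similarity_coefficients(samples: np.ndarray, abnormal_repo: np.ndarray, theta: float = 0.6) -> np.ndarray:
    forest = IsolationForest(random_state=0).fit(samples)
    raw = -forest.score_samples(samples)                     # higher = more abnormal
    degree = (raw - raw.min()) / (np.ptp(raw) + 1e-9)        # IS(x) rescaled to [0, 1]
    # distance to the nearest stored abnormal feature vector, inverted so that
    # samples close to known anomalies get a larger SS(x)
    d = np.min(np.linalg.norm(samples[:, None, :] - abnormal_repo[None, :, :], axis=2), axis=1)
    closeness = 1.0 - (d - d.min()) / (np.ptp(d) + 1e-9)     # SS(x) in [0, 1]
    return theta * degree + (1 - theta) * closeness          # TS(x)
```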
It is not easy to find a new anomaly from a large number of data samples, and in this embodiment, chi-square test is used to determine whether newly detected anomaly feature data and known anomalies belong to the same data distribution, and if not, the new anomaly is associated with the anomaly feature data and stored in a repository, so as to ensure the integrity of the type of the anomaly feature data in the repository.
FIG. 7 is a flow chart of a process for determining characteristic data characterizing anomalies that is proposed based on the embodiment shown in FIG. 3. Based on S230, the step further includes S710 to S730, which are described in detail below:
S710: The rate of deviation is compared to a preset deviation threshold.
The preset deviation threshold is a key threshold for determining that the characteristic data represents an abnormal condition.
Illustratively, the deviation rate is calculated by using a Pearson method, and the calculation formula of the deviation rate is as follows:
χ² = Σ_i (d_i − x_i)² / x_i
where d represents the feature vector of the feature sample data, x represents the feature vector of the abnormal feature data in the repository, χ²(α) represents the preset deviation threshold, and χ² represents the deviation rate of the feature sample data relative to the abnormal feature data.
S720: and if the deviation rate is greater than a preset deviation rate threshold value, determining that the characteristic data representation has new abnormity.
If x2>x2(. alpha.), e.g. x2And (alpha) is 0.1, the characteristic sample data with the deviation rate larger than 0.1 represents that a new anomaly exists.
S730: and if the deviation rate is less than or equal to a preset deviation rate threshold value, determining that the characteristic data representation has no new abnormality.
If χ² ≤ χ²(α), for example χ²(α) = 0.1 and the deviation rate of the feature sample data is 0.05, it is determined that the feature sample data does not characterize a new anomaly.
This embodiment illustrates how the deviation rate of the feature sample data is compared against the preset deviation threshold to determine whether the feature data characterizes a new anomaly: if the deviation rate is greater than the preset deviation threshold, the feature data is determined to characterize a new anomaly; if the deviation rate is less than or equal to the preset deviation threshold, it is determined that no new anomaly exists. Feature data that characterizes a new anomaly is subsequently stored in the repository, so that the abnormal feature data in the repository can be updated in real time.
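A small sketch of this test, using the Pearson-style formula reconstructed above and the example threshold χ²(α) = 0.1; the epsilon guard is an added assumption to avoid division by zero.

```python
# Deviation-rate test of Fig. 7 (threshold and zero-guard are assumptions).
import numpy as np

def characterizes_new_anomaly(d: np.ndarray, stored: np.ndarray, chi2_alpha: float = 0.1) -> bool:
    eps = 1e-9
    chi2 = float(np.sum((d - stored) ** 2 / (np.abs(stored) + eps)))   # χ² deviation rate
    return chi2 > chi2_alpha                                           # greater than χ²(α): new anomaly
```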
Fig. 8 is a flowchart of an associative storage process based on the exception handling scheme proposed in the embodiment shown in fig. 2. Based on S230, the steps further include S810 to S820, which are described in detail below:
S810: Determining an exception handling scheme corresponding to the new exception characterized by the feature data; wherein the exception handling scheme is used to handle exceptions to restore normal operation.
If the characteristic data represents a new exception, inquiring whether an exception handling scheme matched with the exception exists in a preset storage library, and if so, indicating that the exception handling scheme is preset in the preset storage library; there may also be a case where there is no matching exception handling scheme, i.e. the feature data represents a completely new exception, and there is no associated exception handling scheme in the predetermined repository.
S820: and storing the abnormal processing scheme and the characteristic data in a preset storage library in an associated manner.
If the preset repository does not contain an exception handling scheme associated with the new abnormal feature data, an exception handling scheme for this abnormal feature data is associated with it, and the exception handling scheme and the abnormal feature data are stored together in the preset repository.
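A minimal sketch of S810 and S820 under the assumption of a simple in-memory repository keyed by a hash of the abnormal feature vector; a real deployment would use a persistent database instead.

```python
# Sketch of storing an exception handling scheme in association with new abnormal feature data
# (an in-memory dict is an assumed stand-in for the preset repository).
import hashlib
import numpy as np

repository = {}   # key -> {"features": ..., "scheme": ...}

def store_new_anomaly(features: np.ndarray, handling_scheme: str) -> None:
    key = hashlib.sha1(features.tobytes()).hexdigest()
    if key not in repository:                           # only store anomalies not yet in the repository
        repository[key] = {"features": features,
                           "scheme": handling_scheme}   # scheme stored in association with the data

def lookup_scheme(features: np.ndarray):
    entry = repository.get(hashlib.sha1(features.tobytes()).hexdigest())
    return None if entry is None else entry["scheme"]
```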
Illustratively, the present embodiment provides a method for implementing automatic optimization of base station parameters based on the data processing method of the present application.
First, 15961 measurement report records of the base station cell covering 7 days are collected for training the self-encoder model, and the available features shown in Table 1 can be selected as the feature data in the measurement report data.
Table 1: available characteristic data in measurement report data
(table content provided as an image in the original publication)
As shown in fig. 9, fig. 9 is a schematic process diagram of a base station automatically handling an abnormal situation based on the data processing method of the present application according to an exemplary embodiment of the present application.
The server 200 receives the measurement report data transmitted from the terminal 100, extracts the feature data from the measurement report data, inputs the feature data into the self-encoder model, and processes the feature data with reference to S210 to S240 above, which is not repeated here. In particular, this embodiment emphasizes the process of automatically handling the abnormal condition after a base station parameter becomes abnormal. When new abnormal feature data is determined, a new exception handling scheme is associated with the abnormal feature data, stored in the repository, and sent to the base station 300, so that the base station 300 automatically handles the abnormal situation according to the exception handling scheme.
In this embodiment, the data processing method is further applied to an actual scene in which the base station automatically processes the abnormal condition, and the abnormal feature data and the abnormal processing scheme in the repository in the server can be updated in real time, and the abnormal processing scheme is sent to the base station, so that the base station automatically processes the abnormal condition.
Another aspect of the present application further provides a data processing apparatus, as shown in fig. 10, fig. 10 is a schematic structural diagram of the data processing apparatus shown in an exemplary embodiment of the present application. Wherein, the data processing device includes:
the obtaining module 1010 is configured to input the feature data in the measurement report data into the trained self-encoder model, so as to obtain a reconstruction error value corresponding to the feature data;
a comparison module 1030 configured to perform deviation comparison on the feature data and the abnormal feature data contained in the preset repository if the reconstruction error value is greater than a preset error threshold, so as to obtain a deviation rate of the feature data relative to the abnormal feature data;
a determination module 1050 configured to determine that the characteristic data represents an abnormal condition based on a relationship between the deviation rate and a preset deviation threshold;
the updating module 1070 is configured to store the feature data in a preset repository if the feature data indicates that a new anomaly exists.
In another embodiment, the data processing apparatus further comprises:
the model building module is configured to build an initial self-encoder model and preprocess the feature data extracted from the measurement report sample data to obtain feature sample data; inputting the feature sample data into an initial self-encoder model to obtain a similarity coefficient of the feature sample data; and if the similarity coefficient of the feature sample data is smaller than a preset similarity threshold, training the initial self-encoder model by using the feature sample data to obtain a trained self-encoder model.
In another embodiment, a model building module comprises:
the preprocessing unit is configured to perform clustering processing on the feature data extracted from the measurement report sample data to obtain clustering feature data of multiple categories;
the classification unit is configured to perform two-classification processing on the clustering characteristic data of multiple classes respectively to obtain the contribution degree of the clustering characteristic data of each class;
and the determining unit is used for determining the clustering characteristic data with the contribution degree larger than a preset contribution threshold value as the characteristic sample data.
In another embodiment, a model building module comprises:
the abnormal parameter unit is configured to input the feature sample data into the initial self-encoder model, and obtain the abnormal degree of the feature sample data and the abnormal distance between the feature sample data and the standard abnormal feature data in the initial self-encoder model;
and the similarity coefficient calculation unit is configured to calculate a similarity coefficient of the feature sample data based on the degree of abnormality and the abnormal distance.
In another embodiment, the similarity coefficient calculating unit further includes:
An obtaining sub-unit, configured to obtain a weight value corresponding to the degree of abnormality and a weight value corresponding to the abnormal distance.
An operation sub-unit, configured to multiply the degree of abnormality by its corresponding weight value to obtain a first value, and to multiply the abnormal distance by its corresponding weight value to obtain a second value.
A summation sub-unit, configured to sum the first value and the second value to obtain the similarity coefficient of the feature sample data.
In another embodiment, the summation sub-unit is specifically configured to: obtain a weight value corresponding to the degree of abnormality and a weight value corresponding to the abnormal distance; multiply the degree of abnormality by its corresponding weight value to obtain a first value, and multiply the abnormal distance by its corresponding weight value to obtain a second value; and sum the first value and the second value to obtain the similarity coefficient of the feature sample data.
In another embodiment, the determining module 1050 is specifically configured to compare the deviation rate with a preset deviation threshold; if the deviation rate is larger than a preset deviation rate threshold value, determining that new abnormity exists in the characteristic data representation; and if the deviation rate is less than or equal to a preset deviation rate threshold value, determining that the characteristic data representation has no new abnormality.
In another embodiment, the updating module 1070 is specifically configured to determine an exception handling scheme corresponding to the new exception characterized by the feature data; wherein the exception handling scheme is used for handling exceptions to restore normal operation; and storing the exception handling scheme and the characteristic data in a preset storage library in an associated manner.
It should be noted that the data processing apparatus provided in the foregoing embodiment and the data processing method provided in the foregoing embodiment belong to the same concept, and specific ways for the modules and units to perform operations have been described in detail in the method embodiment, and are not described again here.
Another aspect of the present application also provides an electronic device, including: a controller; a memory for storing one or more programs, the one or more programs when executed by the controller for performing the method of data processing in the various embodiments described above.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a computer system of an electronic device according to an exemplary embodiment of the present application, which illustrates a schematic structural diagram of a computer system suitable for implementing the electronic device according to the embodiment of the present application.
It should be noted that the computer system 1100 of the electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 11, the computer system 1100 includes a Central Processing Unit (CPU)1101, which can perform various appropriate actions and processes, such as executing the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for system operation are also stored. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An Input/Output (I/O) interface 1105 is also connected to bus 1104.
The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) display, a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 1110 as necessary, so that a computer program read from it is installed into the storage section 1108 as needed.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising a computer program for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. When the computer program is executed by a Central Processing Unit (CPU)1101, various functions defined in the system of the present application are executed.
It should be noted that the computer readable media shown in the embodiments of the present application may be computer readable signal media or computer readable storage media or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware, and the described units may also be disposed in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
Another aspect of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method described above. The computer-readable storage medium may be included in the electronic device described in the above embodiments, or may exist separately without being incorporated into the electronic device.
Another aspect of the application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the data processing method provided in the above embodiments.
According to an aspect of an embodiment of the present application, there is also provided a computer system including a Central Processing Unit (CPU) that can perform various appropriate actions and processes, such as performing the method in the above-described embodiment, according to a program stored in a Read-Only Memory (ROM) or a program loaded from a storage portion into a Random Access Memory (RAM). In the RAM, various programs and data necessary for system operation are also stored. The CPU, ROM, and RAM are connected to each other via a bus. An Input/Output (I/O) interface is also connected to the bus.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN (Local Area Network) card, a modem, and the like. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. A removable medium, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive as necessary, so that a computer program read out therefrom is installed into the storage section as necessary.
The above description is only a preferred exemplary embodiment of the present application, and is not intended to limit the embodiments of the present application, and those skilled in the art can easily make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A data processing method, comprising:
inputting feature data in measurement report data into a trained self-encoder model to obtain a reconstruction error value corresponding to the feature data;
if the reconstruction error value is greater than a preset error threshold, performing deviation comparison on the feature data and abnormal feature data contained in a preset storage library to obtain a deviation rate of the feature data relative to the abnormal feature data;
determining an abnormal condition characterized by the feature data based on a relationship between the deviation rate and a preset deviation threshold;
and if the feature data characterizes a new abnormality, storing the feature data into the preset storage library.
2. The method according to claim 1, wherein before inputting the feature data in the measurement report data into the trained self-encoder model to obtain the reconstruction error value corresponding to the feature data, the method further comprises:
constructing an initial self-encoder model, and preprocessing feature data extracted from measurement report sample data to obtain feature sample data;
inputting the feature sample data into the initial self-encoder model to obtain a similarity coefficient of the feature sample data;
and if the similarity coefficient of the feature sample data is smaller than a preset similarity threshold, training the initial self-encoder model by using the feature sample data to obtain the trained self-encoder model.
3. The method according to claim 2, wherein the preprocessing the feature data extracted from the measurement report sample data to obtain the feature sample data comprises:
clustering the feature data extracted from the measurement report sample data to obtain clustering feature data of multiple categories;
performing binary classification processing on the clustering feature data of each of the multiple categories to obtain a contribution degree of the clustering feature data of each category;
and determining the clustering feature data with the contribution degree larger than a preset contribution threshold value as the feature sample data.
4. The method according to claim 2, wherein the inputting the feature sample data into the initial self-encoder model to obtain the similarity coefficient of the feature sample data comprises:
inputting the feature sample data into the initial self-encoder model to obtain the abnormal degree of the feature sample data and the abnormal distance between the feature sample data and the standard abnormal feature data in the initial self-encoder model;
and calculating a similarity coefficient of the feature sample data based on the abnormal degree and the abnormal distance.
5. The method according to claim 4, wherein the calculating a similarity coefficient of the feature sample data based on the degree of abnormality and the abnormal distance includes:
acquiring a weight value corresponding to the abnormal degree and a weight value corresponding to the abnormal distance;
performing a product operation on the abnormal degree and a weight value corresponding to the abnormal degree to obtain a first value, and performing a product operation on the abnormal distance and the weight value corresponding to the abnormal distance to obtain a second value;
and performing summation operation on the first value and the second value to obtain a similarity coefficient of the feature sample data.
6. The method according to any one of claims 1 to 5, wherein the determining the abnormal condition characterized by the feature data based on the relationship between the deviation rate and the preset deviation threshold comprises:
comparing the deviation rate to the preset deviation threshold;
if the deviation rate is greater than the preset deviation threshold, determining that the feature data characterizes a new abnormality;
and if the deviation rate is smaller than or equal to the preset deviation threshold, determining that the feature data does not characterize a new abnormality.
7. The method according to any one of claims 1 to 5, wherein the storing the feature data into the preset storage library comprises:
determining an exception handling scheme corresponding to the new abnormality characterized by the feature data; wherein the exception handling scheme is used for handling the exception to restore normal operation;
and storing the exception handling scheme and the feature data in the preset storage library in an associated manner.
8. A data processing apparatus, comprising:
an acquisition module configured to input feature data in measurement report data into a trained self-encoder model to obtain a reconstruction error value corresponding to the feature data;
a comparison module configured to perform deviation comparison on the feature data and abnormal feature data contained in a preset storage library to obtain a deviation rate of the feature data relative to the abnormal feature data if the reconstruction error value is greater than a preset error threshold;
a determination module configured to determine an abnormal condition characterized by the feature data based on a relationship between the deviation rate and a preset deviation threshold;
and an updating module configured to store the feature data into the preset storage library if the feature data characterizes a new abnormality.
9. An electronic device, comprising:
a controller;
a memory for storing one or more programs that, when executed by the controller, cause the controller to implement the method of any of claims 1-7.
10. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1 to 7.
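To make the detection flow of claims 1, 6 and 7 concrete, the sketch below follows their sequence: feature data whose reconstruction error exceeds the preset error threshold is compared against the abnormal feature data already held in the storage library, a deviation rate above the preset deviation threshold marks a new abnormality, and that feature data is then stored together with its handling scheme. The mean-squared error, the relative-L2 deviation formula, the threshold values and all function names are illustrative assumptions; the claims only require that some reconstruction error value and deviation rate be computed.

```python
from typing import List, Tuple

import numpy as np


def reconstruction_error(model, x: np.ndarray) -> float:
    # Assumed error definition: mean-squared error between the feature data and the
    # self-encoder's reconstruction of it (claim 1 only requires "a reconstruction error value").
    return float(np.mean((model(x) - x) ** 2))


def deviation_rate(x: np.ndarray, stored_abnormal: List[np.ndarray]) -> float:
    # Illustrative deviation comparison: smallest relative L2 deviation of x from any
    # abnormal feature vector already held in the preset storage library.
    return min(
        float(np.linalg.norm(x - a) / (np.linalg.norm(a) + 1e-12))
        for a in stored_abnormal
    )


def process_feature_data(model, x: np.ndarray,
                         repo: List[Tuple[np.ndarray, str]], handling_scheme: str,
                         error_threshold: float = 0.05,
                         deviation_threshold: float = 0.3) -> str:
    # Claim 1: only feature data whose reconstruction error exceeds the preset error
    # threshold is compared against the stored abnormal feature data.
    if reconstruction_error(model, x) <= error_threshold:
        return "normal"
    rate = deviation_rate(x, [a for a, _ in repo]) if repo else float("inf")
    # Claim 6: a deviation rate above the preset deviation threshold means a new abnormality.
    if rate > deviation_threshold:
        # Claim 7: store the handling scheme and the feature data in an associated manner.
        repo.append((x, handling_scheme))
        return "new abnormality stored"
    return "known abnormality"


# Usage sketch with a toy "model" that scales its input, so reconstruction is deliberately poor.
toy_model = lambda v: 0.8 * v
repository: List[Tuple[np.ndarray, str]] = [(np.array([1.0, 2.0, 3.0]), "reset counter")]
print(process_feature_data(toy_model, np.array([5.0, 0.1, 9.0]),
                           repository, "escalate to operations"))   # -> "new abnormality stored"
```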
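The training-side decision of claims 2, 4 and 5 can be sketched in the same spirit: the similarity coefficient is the weighted sum of an abnormal degree and an abnormal distance, and the initial self-encoder model is trained on a feature sample only when that coefficient is smaller than the preset similarity threshold. The 0.5/0.5 weights, the 0.6 threshold and the function names are assumptions made purely for illustration.

```python
def similarity_coefficient(abnormal_degree: float, abnormal_distance: float,
                           degree_weight: float = 0.5, distance_weight: float = 0.5) -> float:
    # Claim 5: multiply each quantity by its weight value and sum the two products.
    first_value = abnormal_degree * degree_weight
    second_value = abnormal_distance * distance_weight
    return first_value + second_value


def should_train(abnormal_degree: float, abnormal_distance: float,
                 similarity_threshold: float = 0.6) -> bool:
    # Claim 2: the initial model is trained with this feature sample data only when
    # its similarity coefficient is smaller than the preset similarity threshold.
    return similarity_coefficient(abnormal_degree, abnormal_distance) < similarity_threshold


# Usage sketch: a sample with low abnormal degree and moderate abnormal distance qualifies.
print(should_train(abnormal_degree=0.2, abnormal_distance=0.5))   # True (0.1 + 0.25 = 0.35 < 0.6)
```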
CN202210370986.6A 2022-04-08 2022-04-08 Data processing method and device, equipment and computer readable storage medium Active CN114722061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210370986.6A CN114722061B (en) 2022-04-08 2022-04-08 Data processing method and device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210370986.6A CN114722061B (en) 2022-04-08 2022-04-08 Data processing method and device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN114722061A true CN114722061A (en) 2022-07-08
CN114722061B CN114722061B (en) 2023-11-14

Family

ID=82241429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210370986.6A Active CN114722061B (en) 2022-04-08 2022-04-08 Data processing method and device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114722061B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324516A (en) * 2018-11-29 2020-06-23 北京京东尚科信息技术有限公司 Method and device for automatically recording abnormal event, storage medium and electronic equipment
CN111931868A (en) * 2020-09-24 2020-11-13 常州微亿智造科技有限公司 Time series data abnormity detection method and device
CN111967571A (en) * 2020-07-07 2020-11-20 华东交通大学 MHMA-based anomaly detection method and equipment
CN112149757A (en) * 2020-10-23 2020-12-29 新华三大数据技术有限公司 Abnormity detection method and device, electronic equipment and storage medium
CN112148577A (en) * 2020-10-09 2020-12-29 平安科技(深圳)有限公司 Data anomaly detection method and device, electronic equipment and storage medium
CN112231181A (en) * 2020-12-08 2021-01-15 平安科技(深圳)有限公司 Data abnormal update detection method and device, computer equipment and storage medium
WO2021139236A1 (en) * 2020-06-30 2021-07-15 平安科技(深圳)有限公司 Autoencoder-based anomaly detection method, apparatus and device, and storage medium
CN113157760A (en) * 2020-01-22 2021-07-23 阿里巴巴集团控股有限公司 Target data determination method and device
CN113657516A (en) * 2021-08-20 2021-11-16 泰康保险集团股份有限公司 Method and device for processing medical transaction data, electronic equipment and storage medium
CN113902230A (en) * 2021-12-10 2022-01-07 四川瑞康智慧能源有限公司 Electric quantity deviation control method, system, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114722061B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN109922032B (en) Method, device, equipment and storage medium for determining risk of logging in account
CN109583332B (en) Face recognition method, face recognition system, medium, and electronic device
CN110287316A (en) A kind of Alarm Classification method, apparatus, electronic equipment and storage medium
US20200334540A1 (en) Outlier discovery system selection
CN108228684B (en) Method and device for training clustering model, electronic equipment and computer storage medium
CN111796957B (en) Transaction abnormal root cause analysis method and system based on application log
CN110348516B (en) Data processing method, data processing device, storage medium and electronic equipment
CN113837596B (en) Fault determination method and device, electronic equipment and storage medium
CN113627566A (en) Early warning method and device for phishing and computer equipment
CN107729469A (en) Usage mining method, apparatus, electronic equipment and computer-readable recording medium
CN112990281A (en) Abnormal bid identification model training method, abnormal bid identification method and abnormal bid identification device
CN113704389A (en) Data evaluation method and device, computer equipment and storage medium
CN111767192A (en) Service data detection method, device, equipment and medium based on artificial intelligence
CN111245815B (en) Data processing method and device, storage medium and electronic equipment
US20230409422A1 (en) Systems and Methods for Anomaly Detection in Multi-Modal Data Streams
CN112950359A (en) User identification method and device
CN113869904B (en) Suspicious data identification method, device, electronic equipment, medium and computer program
CN115147195A (en) Bidding purchase risk monitoring method, apparatus, device and medium
CN114722061B (en) Data processing method and device, equipment and computer readable storage medium
CN113609018A (en) Test method, training method, device, apparatus, medium, and program product
CN113052509A (en) Model evaluation method, model evaluation apparatus, electronic device, and storage medium
CN114443738A (en) Abnormal data mining method, device, equipment and medium
CN114067149A (en) Internet service providing method and device and computer equipment
CN114219663A (en) Product recommendation method and device, computer equipment and storage medium
CN113590484A (en) Algorithm model service testing method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20220708

Assignee: Tianyiyun Technology Co.,Ltd.

Assignor: CHINA TELECOM Corp.,Ltd.

Contract record no.: X2024110000020

Denomination of invention: Data processing methods and devices, equipment, and computer-readable storage media

Granted publication date: 20231114

License type: Common License

Record date: 20240315
