CN116796075B - Method and device for analyzing problem data - Google Patents


Publication number: CN116796075B
Authority: CN (China)
Legal status: Active
Application number: CN202311068926.XA
Other languages: Chinese (zh)
Other versions: CN116796075A
Inventors: 李红亮, 杨磊, 张静普, 祝力, 蔡超
Current assignee: Siwei Shijing Technology Beijing Co ltd
Original assignee: Siwei Shijing Technology Beijing Co ltd
Events: application filed by Siwei Shijing Technology Beijing Co ltd; priority to CN202311068926.XA; publication of CN116796075A; application granted; publication of CN116796075B.


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification relate to the field of data analysis and provide a method and device for analyzing problem data, applicable to data sharing on mobile terminals. The method comprises: selecting at least one data dimension of a target user, from among all data dimensions, as an assumed problem data dimension, the target user's data in that dimension being the assumed problem data; determining similar users according to how the target user's data varies across the other data dimensions; calculating the target user's data in the assumed problem data dimension from the data of the similar users and of the target user in all data dimensions; and determining, from the calculated data and the assumed problem data of the target user in that dimension, whether the assumed problem data is actual problem data. With these embodiments, erroneous entries in user-filled data can be identified and subsequently removed or pruned, enabling targeted recommendation.

Description

Method and device for analyzing problem data
Technical Field
Embodiments of the present disclosure relate to the field of data analysis, and in particular, to a method and apparatus for analyzing problem data.
Background
With the development of the internet, more and more people register and log in to application software to receive services. Typically, when a user logs in, the application requires the user to fill in personal information so that users can be classified and served targeted content recommendations. However, data entry errors frequently occur for various reasons, so the user's real information cannot be determined and the recommended content ultimately fails to meet the user's needs.
Therefore, a method for analyzing problem data is needed that is applicable to data sharing on mobile terminals, can identify erroneous entries in user-filled data, and allows the corresponding data to be removed or pruned afterward, so as to achieve targeted recommendation.
Disclosure of Invention
The embodiments of this disclosure aim to provide a method, apparatus, device, and storage medium for analyzing problem data that are applicable to data sharing on mobile terminals, identify erroneous entries in user-filled data, and allow those entries to be removed or deleted, thereby enabling targeted recommendation.
In order to achieve the above object, in one aspect, an embodiment of the present disclosure provides a method for analyzing problem data, including:
selecting at least one data dimension of a target user, from among all data dimensions, as an assumed problem data dimension, the target user's data in the assumed problem data dimension being assumed problem data;
determining similar users according to how the target user's data varies across the data dimensions other than the assumed problem data dimension, wherein the data of the similar users in all data dimensions is their actual data;
calculating the target user's data in the assumed problem data dimension according to the data of the similar users in all data dimensions and the data of the target user in all data dimensions;
and determining, according to the calculated data of the target user in the assumed problem data dimension and the assumed problem data in that dimension, whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data.
Preferably, determining similar users according to how the target user's data varies across the data dimensions other than the assumed problem data dimension further comprises:
discretizing the target user's data in the other data dimensions to obtain the target user's discrete values in those dimensions;
and determining similar users according to how the target user's discrete values vary across the other data dimensions.
Preferably, determining similar users according to how the target user's discrete values vary across the other data dimensions further comprises:
linking the target user's discrete values in the other data dimensions in sequence to obtain a continuous variation trend of those discrete values;
and determining users whose discrete values follow the same continuous variation trend as the target user's to be similar users.
Preferably, calculating the target user's data in the assumed problem data dimension according to the data of the similar users and of the target user in all data dimensions further comprises:
discretizing each similar user's data in every data dimension to obtain the similar user's discrete value in each data dimension;
discretizing the target user's data in each of the other data dimensions to obtain the target user's discrete value in each of those dimensions;
calculating, from the discrete values of the similar user and of the target user in the other data dimensions, a first discrete mean for the similar user and a first discrete mean for the target user;
calculating a theoretical discrete value of the target user in the assumed problem data dimension according to the similar user's discrete value in the assumed problem data dimension and the respective first discrete means of the similar user and the target user;
and calculating the target user's theoretical data in the assumed problem data dimension from the theoretical discrete values obtained from the different similar users.
Preferably, calculating the theoretical discrete value of the target user in the assumed problem data dimension according to the similar user's discrete value in the assumed problem data dimension and the respective first discrete means of the similar user and the target user further comprises:
calculating the theoretical discrete value of the target user in the assumed problem data dimension according to the following formula:
where M1 is the theoretical discrete value of the target user in the assumed problem data dimension, A is the similar user's discrete value in the assumed problem data dimension, B1 is the similar user's first discrete mean, and N1 is the target user's first discrete mean.
Preferably, determining whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data, according to the target user's calculated data and assumed problem data in the assumed problem data dimension, further comprises:
judging whether the difference between the target user's theoretical data in the assumed problem data dimension and the assumed problem data is within a set difference range;
if so, determining that the assumed problem data dimension is not an actual problem data dimension and the assumed problem data is not actual problem data;
if not, determining that the assumed problem data dimension is an actual problem data dimension and the assumed problem data is actual problem data.
Preferably, calculating the target user's data in the assumed problem data dimension according to the data of the similar users and of the target user in all data dimensions further comprises:
discretizing each similar user's data in every data dimension to obtain the similar user's discrete value in each data dimension;
discretizing the target user's data in every data dimension to obtain the target user's discrete value in each data dimension;
calculating, from the discrete values of the similar user and of the target user in all data dimensions, a second discrete mean for the similar user and a second discrete mean for the target user;
calculating a predicted discrete value of the target user in the assumed problem data dimension according to the target user's discrete value in the assumed problem data dimension and the respective second discrete means of the similar user and the target user;
and calculating the target user's predicted data in the assumed problem data dimension from the predicted discrete values obtained from the different similar users.
Preferably, calculating the predicted discrete value of the target user in the assumed problem data dimension according to the target user's discrete value in that dimension and the respective second discrete means of the similar user and the target user further comprises:
calculating the predicted discrete value of the target user in the assumed problem data dimension by the following formula:
where M2 is the predicted discrete value of the target user in the assumed problem data dimension, B2 is the similar user's second discrete mean, N2 is the target user's second discrete mean, C = T - N2, and T is the target user's discrete value in the assumed problem data dimension.
Preferably, determining whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data, according to the target user's calculated data and assumed problem data in the assumed problem data dimension, further comprises:
judging whether the difference between the target user's predicted data in the assumed problem data dimension and the assumed problem data is within a set difference range;
if so, determining that the assumed problem data dimension is an actual problem data dimension and the assumed problem data is actual problem data;
if not, determining that the assumed problem data dimension is not an actual problem data dimension and the assumed problem data is not actual problem data.
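The quantities defined for the second embodiment can be sketched in Python. This is a minimal sketch under stated assumptions: the formula that combines them into M2 is not reproduced in this text, so the sketch only computes the second discrete mean (N2, or B2 for a similar user), the offset C = T - N2, and the decision rule, which is the reverse of the first embodiment's. The function names and the example discrete values are hypothetical, and treating the difference range as inclusive is an assumption.

```python
from statistics import fmean


def second_discrete_mean(discrete: dict[str, int]) -> float:
    """Mean of a user's discrete values over ALL data dimensions (N2 or B2)."""
    return fmean(discrete.values())


def offset_c(discrete: dict[str, int], problem_dim: str) -> float:
    """C = T - N2, where T is the user's discrete value in the assumed problem dimension."""
    return discrete[problem_dim] - second_discrete_mean(discrete)


def is_actual_problem_data_second(predicted: float, filled: float, max_difference: float) -> bool:
    """Second embodiment: a difference WITHIN the set range means actual problem data.

    Treating the range as inclusive is an assumption.
    """
    return abs(predicted - filled) <= max_difference


# Hypothetical discrete values for a target user whose assumed problem dimension is age.
target_a = {"age": 1, "education": 3, "native_place": 1, "gender": 2}
print(second_discrete_mean(target_a))  # 1.75
print(offset_c(target_a, "age"))       # -0.75
```

Keeping the decision as a separate function makes the contrast with the first embodiment explicit: the same difference check leads to the opposite conclusion here.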
In another aspect, an embodiment of this disclosure provides an apparatus for analyzing problem data, the apparatus comprising:
a selection module, configured to select at least one data dimension of a target user, from among all data dimensions, as an assumed problem data dimension, the target user's data in the assumed problem data dimension being assumed problem data;
a determining module, configured to determine similar users according to how the target user's data varies across the data dimensions other than the assumed problem data dimension, wherein the data of the similar users in all data dimensions is their actual data;
a computing module, configured to calculate the target user's data in the assumed problem data dimension according to the data of the similar users in all data dimensions and the data of the target user in all data dimensions;
and an analysis module, configured to determine, according to the calculated data of the target user in the assumed problem data dimension and the assumed problem data in that dimension, whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data.
The technical solutions provided by the embodiments of this specification can be used for data sharing on mobile terminals. Analysis determines which data dimension or dimensions of the target user are actual problem data dimensions, and the corresponding data is thereby identified as actual problem data. That data can then be removed or deleted, and content is recommended to the target user using the cleaned data, achieving targeted recommendation.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings.
Drawings
To illustrate the embodiments of this specification or the prior-art technical solutions more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of this specification; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for analyzing problem data according to an embodiment of this specification;
FIG. 2 is a schematic flow chart of determining similar users according to how the target user's data varies across the data dimensions other than the assumed problem data dimension, according to an embodiment of this specification;
FIG. 3 is a schematic flow chart of determining similar users according to how the target user's discrete values vary across the other data dimensions, according to an embodiment of this specification;
FIG. 4 is a schematic flow chart of calculating the target user's data in the assumed problem data dimension according to an embodiment of this specification;
FIG. 5 is a schematic flow chart of calculating the target user's theoretical data in the assumed problem data dimension according to an embodiment of this specification;
FIG. 6 is a schematic flow chart of calculating the target user's data in the assumed problem data dimension according to another embodiment of this specification;
FIG. 7 is a schematic flow chart of calculating the target user's predicted data in the assumed problem data dimension according to an embodiment of this specification;
FIG. 8 is a schematic block diagram of an apparatus for analyzing problem data according to an embodiment of this specification;
FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of this specification.
Description of reference numerals:
100. a selection module;
200. a determining module;
300. a computing module;
400. an analysis module;
902. a computer device;
904. a processor;
906. a memory;
908. a driving mechanism;
910. an input/output module;
912. an input device;
914. an output device;
916. a presentation device;
918. a graphical user interface;
920. a network interface;
922. a communication link;
924. a communication bus.
Detailed Description
The technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without inventive effort fall within the scope of protection of the embodiments herein.
To solve the above problems, the embodiments of this specification provide a method for analyzing problem data. FIG. 1 is a flow chart of this method. Although this specification provides the method's operational steps as described in the embodiments or flow charts, more or fewer steps may be included based on conventional or non-inventive effort. The order of steps listed in the embodiments is only one of many possible execution orders and is not the only one. When executed by an actual system or apparatus, the steps may be performed sequentially or in parallel according to the method shown in the embodiments or drawings.
It should be noted that the terms "first," "second," and the like in the description, claims, and drawings of the embodiments of this specification are used to distinguish similar objects and do not necessarily describe a particular sequence or chronological order. Data so designated may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprise," "include," and "have," and any variants thereof, are intended to cover non-exclusive inclusion: a process, method, apparatus, article, or device that comprises a list of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units not expressly listed or inherent to it.
Referring to fig. 1, an embodiment of the present specification discloses a method for analyzing problem data, including:
S101: selecting at least one data dimension of a target user, from among all data dimensions, as an assumed problem data dimension, the target user's data in the assumed problem data dimension being assumed problem data;
S102: determining similar users according to how the target user's data varies across the data dimensions other than the assumed problem data dimension, wherein the data of the similar users in all data dimensions is their actual data;
S103: calculating the target user's data in the assumed problem data dimension according to the data of the similar users in all data dimensions and the data of the target user in all data dimensions;
S104: determining, according to the calculated data of the target user in the assumed problem data dimension and the assumed problem data in that dimension, whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data.
The data dimensions depend on the actual application; for example, they may include age, education, native place, and gender. When users fill in data on a mobile phone, tablet, laptop, desktop computer, or other device, filling errors often occur: dimensions such as gender and native place may be selected by scrolling a scroll bar or tapping the screen, where wrong selections are common, while age may require manual input, where mistyped ages occur. The embodiments of this specification can therefore be used to analyze such problem data. In these embodiments, the target user is a user whose filled-in data contains errors, and the embodiments address the case in which a user has made filling errors, identifying which data dimension or dimensions the erroneous entries belong to.
The premise is that at least one data dimension must be selected from all data dimensions as an assumed problem data dimension, which includes judging whether the target user has made any filling errors at all. The assumed problem data dimension can be selected with an expert system or a trained neural network model: after all of a user's data is input into the expert system or model, if no assumed problem data is identified, the user is considered to have made no filling errors; if at least one item of assumed problem data is identified, the user is considered to have made filling errors, and the dimension of that data is the assumed problem data dimension.
It should be noted that both an expert system and a trained neural network model can make identification errors. Taking the neural network model as an example, factors such as parameter configuration and the design of each layer may lower the model's accuracy, so it cannot be guaranteed that the assumed problem data dimension is truly a problem data dimension or that the assumed problem data is truly problem data. The problem data therefore needs to be analyzed with the method of the embodiments of this specification.
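The text does not specify the expert system, so the following is only a minimal sketch, assuming a rule-based expert system, of how the selection step could flag assumed problem data dimensions. Every rule, threshold, and name below (`HYPOTHETICAL_RULES`, `assumed_problem_dimensions`) is hypothetical and not taken from the patent.

```python
from typing import Callable

# A rule pairs a data dimension with a predicate that returns True when the
# user's value in that dimension looks suspicious. All rules are hypothetical.
Rule = tuple[str, Callable[[dict], bool]]

HYPOTHETICAL_RULES: list[Rule] = [
    ("age", lambda user: not 12 <= user["age"] <= 100),  # implausible age for a registered user
    ("gender", lambda user: user["gender"] not in {"male", "female"}),
]


def assumed_problem_dimensions(user: dict, rules: list[Rule]) -> list[str]:
    """Return the dimensions flagged as assumed problem data dimensions.

    An empty list means the user is treated as having made no filling errors.
    """
    flagged = []
    for dimension, is_suspicious in rules:
        if is_suspicious(user) and dimension not in flagged:
            flagged.append(dimension)
    return flagged


user_a = {"age": 3, "education": "university", "native_place": "Beijing", "gender": "female"}
print(assumed_problem_dimensions(user_a, HYPOTHETICAL_RULES))  # ['age']
```

As the text notes, such a detector can be wrong, which is why the flagged dimension is only an assumption that the later steps confirm or reject.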
For example, target user A's data is: age 3, education university, native place Beijing, gender female. At least one data dimension is an assumed problem data dimension; if two or more are identified, each must be analyzed one by one through the steps of the embodiments of this specification. Suppose the age dimension is the assumed problem data dimension and education, native place, and gender are the other data dimensions. Similar users then need to be determined according to how the target user's data varies across those other dimensions, as shown in FIG. 2:
S201: discretizing the target user's data in the other data dimensions to obtain the target user's discrete values in those dimensions;
S202: determining similar users according to how the target user's discrete values vary across the other data dimensions.
To discretize the data, the ranges for each dimension are first determined, and the number assigned to each range is taken as the discrete value. For example, the education dimension is divided into four ranges, whose corresponding values are 1, 2, 3, and 4. Likewise, the gender dimension includes male and female, whose corresponding values are 1 and 2.
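The discretization step can be sketched as a lookup from each range to its number. This is a minimal sketch under stated assumptions: the text only fixes that education has four ranges numbered 1 to 4 and that gender maps male to 1 and female to 2, so the education labels, native-place codes, and 10-year age bins below are hypothetical, chosen so that user A's education, native place, and gender come out as 3, 1, and 2 as in the worked example for user A.

```python
import bisect

# Hypothetical range tables; only the 1-4 education numbering and the
# male -> 1, female -> 2 gender mapping come from the text.
EDUCATION = {"high school or below": 1, "junior college": 2, "university": 3, "postgraduate": 4}
GENDER = {"male": 1, "female": 2}
NATIVE_PLACE = {"Beijing": 1, "Shanghai": 2, "Guangdong": 3}

# Hypothetical 10-year age ranges: [0, 10) -> 1, [10, 20) -> 2, [20, 30) -> 3, [30, 40) -> 4, ...
AGE_EDGES = [10, 20, 30, 40, 50, 60, 70, 80, 90]


def discretize_age(age: int) -> int:
    """Return the 1-based index of the age range containing `age`."""
    return bisect.bisect_right(AGE_EDGES, age) + 1


def discretize_user(user: dict) -> dict[str, int]:
    """Replace each raw value with the number assigned to its range."""
    return {
        "age": discretize_age(user["age"]),
        "education": EDUCATION[user["education"]],
        "native_place": NATIVE_PLACE[user["native_place"]],
        "gender": GENDER[user["gender"]],
    }


user_a = {"age": 3, "education": "university", "native_place": "Beijing", "gender": "female"}
print(discretize_user(user_a))  # {'age': 1, 'education': 3, 'native_place': 1, 'gender': 2}
```

Numeric dimensions such as age use bin edges, while categorical dimensions use a dictionary; both produce the integer codes that the later trend and mean calculations operate on.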
Further, referring to FIG. 3, determining similar users according to how the target user's discrete values vary across the other data dimensions specifically comprises:
S301: linking the target user's discrete values in the other data dimensions in sequence to obtain a continuous variation trend of those discrete values;
S302: determining users whose discrete values follow the same continuous variation trend as the target user's to be similar users.
Assume target user A's discrete values in the other data dimensions are: education 3, native place 1, and gender 2. Linking these values in sequence gives the continuous variation trend: from education to native place the trend descends, and from native place to gender it ascends. Users whose trend matches this one are then selected as similar users. The data of similar users in all data dimensions is their actual data; that is, only users without data errors are selected as similar users.
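Steps S301 and S302 can be sketched as comparing sequences of up/down moves. This is a minimal sketch, assuming that "consistent trend" means the same direction of change between each pair of consecutive dimensions; the handling of equal neighbours as "flat", the function names, and the candidate users `u1` to `u3` are hypothetical.

```python
def trend(values: list[int]) -> list[str]:
    """Direction of change between each pair of consecutive discrete values."""
    directions = []
    for previous, current in zip(values, values[1:]):
        if current > previous:
            directions.append("up")
        elif current < previous:
            directions.append("down")
        else:
            directions.append("flat")  # assumption: equal neighbours count as "flat"
    return directions


def find_similar_users(target: dict[str, int],
                       candidates: dict[str, dict[str, int]],
                       other_dims: list[str]) -> list[str]:
    """Return ids of candidates whose trend over `other_dims` matches the target's.

    `candidates` is assumed to contain only users whose data is known to be correct.
    """
    target_trend = trend([target[d] for d in other_dims])
    return [
        user_id
        for user_id, values in candidates.items()
        if trend([values[d] for d in other_dims]) == target_trend
    ]


other_dims = ["education", "native_place", "gender"]
target_a = {"education": 3, "native_place": 1, "gender": 2}
candidates = {  # hypothetical, already discretized
    "u1": {"age": 4, "education": 4, "native_place": 2, "gender": 2},  # down, flat
    "u2": {"age": 4, "education": 2, "native_place": 1, "gender": 2},  # down, up
    "u3": {"age": 3, "education": 1, "native_place": 2, "gender": 1},  # up, down
}
print(trend([target_a[d] for d in other_dims]))              # ['down', 'up']
print(find_similar_users(target_a, candidates, other_dims))  # ['u2']
```

Note that the trend depends on the order of `other_dims`, so the same dimension order must be used for the target user and every candidate.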
In one embodiment of this specification, referring to FIG. 4, calculating the target user's data in the assumed problem data dimension according to the data of the similar users and of the target user in all data dimensions further comprises:
S401: discretizing each similar user's data in every data dimension to obtain the similar user's discrete value in each data dimension;
S402: discretizing the target user's data in each of the other data dimensions to obtain the target user's discrete value in each of those dimensions;
S403: calculating, from the discrete values of the similar user and of the target user in the other data dimensions, a first discrete mean for the similar user and a first discrete mean for the target user;
S404: calculating a theoretical discrete value of the target user in the assumed problem data dimension according to the similar user's discrete value in the assumed problem data dimension and the respective first discrete means of the similar user and the target user;
S405: calculating the target user's theoretical data in the assumed problem data dimension from the theoretical discrete values obtained from the different similar users.
The discretization in S401-S405 is the same as in S301-S302 and is not described again here. The first discrete mean in S403 is the mean of a similar user's or the target user's discrete values over the other data dimensions.
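The first discrete mean of S403 can be sketched directly from its definition: average a user's discrete values over every dimension except the assumed problem data dimension. The formula (1) that combines A, B1, and N1 is not reproduced in this text, so this sketch, under that limitation, stops at computing those three inputs; the function name and the similar user's values are hypothetical.

```python
from statistics import fmean


def first_discrete_mean(discrete: dict[str, int], problem_dim: str) -> float:
    """Mean of a user's discrete values over every dimension except `problem_dim`."""
    return fmean(value for dim, value in discrete.items() if dim != problem_dim)


target_a = {"age": 1, "education": 3, "native_place": 1, "gender": 2}
similar_u2 = {"age": 4, "education": 2, "native_place": 1, "gender": 2}  # hypothetical similar user

n1 = first_discrete_mean(target_a, "age")    # N1: (3 + 1 + 2) / 3 = 2.0
b1 = first_discrete_mean(similar_u2, "age")  # B1: (2 + 1 + 2) / 3
a = similar_u2["age"]                        # A: similar user's value in the assumed problem dimension
print(n1, b1, a)
```

Because the target user's own value in the assumed problem dimension is excluded from N1, a possibly erroneous entry cannot influence the baseline it is later compared against.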
Specifically, calculating the theoretical discrete value of the target user in the assumed problem data dimension according to the similar user's discrete value in that dimension and the respective first discrete means of the similar user and the target user further comprises:
calculating the theoretical discrete value of the target user in the assumed problem data dimension according to the following formula:
(1)
where M1 is the theoretical discrete value of the target user in the assumed problem data dimension, A is the similar user's discrete value in the assumed problem data dimension, B1 is the similar user's first discrete mean, and N1 is the target user's first discrete mean.
As equation (1) shows, the theoretical discrete value of the target user in the assumed problem data dimension is calculated from a particular similar user's data (its discrete value in the assumed problem data dimension and its first discrete mean), so each similar user yields its own theoretical discrete value.
Further, referring to FIG. 5, the target user's theoretical data in the assumed problem data dimension is calculated by combining the theoretical discrete values obtained from the different similar users, specifically:
S501: calculating the mean, mode, or median of the theoretical discrete values of the target user in the assumed problem data dimension obtained from the different similar users;
S502: determining the target user's theoretical data in the assumed problem data dimension according to that mean, mode, or median.
For example, 50 similar users yield 50 theoretical discrete values, whose mean, mode, or median is then calculated. Suppose the mean of the 50 theoretical discrete values in the age dimension is 4 and the discrete value 4 corresponds to the age range 30 to 40 years. The midpoint of that range can be taken as the theoretical data; other values in the range may also be used, and this specification imposes no limitation.
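Steps S501 and S502 can be sketched with Python's `statistics` module. This is a minimal sketch under stated assumptions: a non-integer aggregate is rounded half up to the nearest discrete value, and the age bins are hypothetical 10-year ranges chosen to agree with the example above (discrete value 4 corresponds to 30 to 40 years). The per-similar-user theoretical discrete values are also hypothetical.

```python
import math
import statistics


def aggregate(values: list[float], method: str = "mean") -> float:
    """S501: combine the per-similar-user theoretical discrete values."""
    if method == "mean":
        return statistics.fmean(values)
    if method == "mode":
        return statistics.mode(values)
    if method == "median":
        return statistics.median(values)
    raise ValueError(f"unknown method: {method}")


def age_range(discrete_value: int) -> tuple[int, int]:
    """Hypothetical 10-year bins: 1 -> [0, 10), ..., 4 -> [30, 40)."""
    low = 10 * (discrete_value - 1)
    return low, low + 10


def theoretical_age(values: list[float], method: str = "mean") -> float:
    """S502: round the aggregate to a discrete value and take its range midpoint."""
    discrete_value = math.floor(aggregate(values, method) + 0.5)  # assumption: round half up
    low, high = age_range(discrete_value)
    return (low + high) / 2


per_similar_user = [3, 4, 4, 5]  # hypothetical theoretical discrete values
for method in ("mean", "mode", "median"):
    print(method, theoretical_age(per_similar_user, method))  # 35.0 for each method
```

The median and mode are less sensitive than the mean to a few unusual similar users, which is one reason to keep all three options open as the text does.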
In this embodiment of the present disclosure, analyzing and determining, according to the data of the target user in the assumed problem data dimension and the assumed problem data of the target user in that dimension, whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data further includes:
Step 1.1: judging whether the difference between the theoretical data of the target user in the assumed problem data dimension and the assumed problem data is within a set difference range;
Step 1.2: if yes, determining that the assumed problem data dimension is not an actual problem data dimension and the assumed problem data is not actual problem data;
Step 1.3: if not, determining that the assumed problem data dimension is an actual problem data dimension and the assumed problem data is actual problem data.
The assumed problem data is the data filled in by the user. For example, suppose the filled-in age is 3 years and the theoretical data is 35 years. The set difference range can be chosen according to the actual situation, and this specification is not limited in this respect. If the set difference range is 0-10 years, the difference is 35 - 3 = 32 years, which is outside the set range, so the assumed problem data is determined to be actual problem data.
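Steps 1.1-1.3 amount to a single threshold test. A minimal Python sketch, assuming the difference is taken as an absolute value and the upper bound of the set range is inclusive (neither is specified above); the function name is hypothetical:

```python
def is_problem_by_theoretical(filled: float, theoretical: float, max_diff: float = 10.0) -> bool:
    """Steps 1.1-1.3: return True if the assumed problem data is actual problem data.

    A small difference means the filled value agrees with what similar users
    imply, so it is NOT problem data; a large difference means it is.
    The absolute difference and inclusive bound are assumptions.
    """
    within_range = abs(theoretical - filled) <= max_diff
    return not within_range


print(is_problem_by_theoretical(3, 35))   # True: |35 - 3| = 32 > 10
print(is_problem_by_theoretical(33, 35))  # False: |35 - 33| = 2 <= 10
```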
In another embodiment of the present disclosure, referring to FIG. 6, calculating the data of the target user in the assumed problem data dimension according to the data of the similar users in all data dimensions and the data of the target user in all data dimensions further includes:
S601: performing discrete processing on the data of the similar user in each data dimension to obtain the similar user's discrete value in each data dimension;
S602: performing discrete processing on the data of the target user in each data dimension to obtain the target user's discrete value in each data dimension;
S603: calculating, from the discrete values of the similar user and the target user in all data dimensions, a second discrete mean for the similar user and a second discrete mean for the target user;
S604: calculating the predicted discrete value of the target user in the assumed problem data dimension according to the target user's discrete value in the assumed problem data dimension and the respective second discrete means of the similar user and the target user;
S605: calculating the predicted data of the target user in the assumed problem data dimension according to the predicted discrete values obtained from the different similar users.
The discrete processing in S601-S605 is the same as in S301-S302 and is not described again here. The second discrete mean in S603 is the mean of the discrete values of a similar user or of the target user over all data dimensions, whereas the first discrete mean is taken over the other data dimensions only.
Specifically, calculating the predicted discrete value of the target user in the assumed problem data dimension according to the target user's discrete value in that dimension and the respective second discrete means of the similar user and the target user further includes:
calculating the predicted discrete value of the target user in the assumed problem data dimension by the following formula:
(2)
where M2 is the predicted discrete value of the target user in the assumed problem data dimension, B2 is the second discrete mean of the similar user, N2 is the second discrete mean of the target user, C = T - N2, and T is the discrete value of the target user in the assumed problem data dimension.
Note that formulas (1) and (2) compute different quantities. The theoretical discrete value from formula (1) is the discrete value the assumed problem data dimension would have if the data had been filled in correctly; in the example above, it corresponds to the age range 30-40 years. The predicted discrete value from formula (2) is the discrete value the assumed problem data would be expected to have if it had been filled in incorrectly.
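The inputs to formula (2), namely the discrete values from S601-S602, the second discrete means from S603, and the deviation C = T - N2, can be illustrated with the Python sketch below. It assumes fixed-width bins per dimension; the dimension names and widths are hypothetical, since the discretization of S301-S302 is not reproduced in this section. The sketch stops at the inputs because formula (2) itself is not shown here. It also contrasts the first discrete mean (other dimensions only, per claim 1) with the second (all dimensions).

```python
from statistics import mean
from typing import Dict

# Hypothetical bin widths per data dimension; the real discretization
# (S301-S302) is defined elsewhere in the specification.
BIN_WIDTHS: Dict[str, float] = {"age": 10, "income_k": 5, "usage_hours": 2}


def discretize(raw: Dict[str, float]) -> Dict[str, int]:
    """S601/S602: map each raw value to a 1-based fixed-width bin index."""
    return {dim: int(value // BIN_WIDTHS[dim]) + 1 for dim, value in raw.items()}


def first_discrete_mean(discrete: Dict[str, int], problem_dim: str) -> float:
    """Mean over the other data dimensions only (input to formula (1))."""
    return mean([v for dim, v in discrete.items() if dim != problem_dim])


def second_discrete_mean(discrete: Dict[str, int]) -> float:
    """S603: mean over all data dimensions, including the assumed problem one."""
    return mean(list(discrete.values()))


target = discretize({"age": 3, "income_k": 12, "usage_hours": 5})
print(target)  # {'age': 1, 'income_k': 3, 'usage_hours': 3}
n1 = first_discrete_mean(target, "age")  # N1 for the target user
n2 = second_discrete_mean(target)        # N2 for the target user
c = target["age"] - n2                   # C = T - N2
print(n1, round(n2, 3), round(c, 3))     # 3 2.333 -1.333
```

Because N2 includes the possibly wrong entry T while N1 excludes it, the two means diverge exactly when T is out of line with the target user's other dimensions.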
Further, referring to FIG. 7, the predicted discrete values obtained from the different similar users are combined to calculate the predicted data of the target user in the assumed problem data dimension, specifically:
S701: calculating the mean, mode, or median of the predicted discrete values of the target user in the assumed problem data dimension obtained from the different similar users;
S702: determining the predicted data of the target user in the assumed problem data dimension according to that mean, mode, or median.
For example, 50 similar users yield 50 predicted discrete values, and their mean, mode, or median is calculated. Suppose the mean of the 50 predicted discrete values in the age dimension is 1, and discrete value 1 corresponds to the age range 0-10 years. The midpoint of that range, 5 years, can be taken as the predicted data; other values within the range may also be used, and this specification is not limited in this respect.
In this embodiment of the present disclosure, analyzing and determining, according to the data of the target user in the assumed problem data dimension and the assumed problem data of the target user in that dimension, whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data further includes:
Step 2.1: judging whether the difference between the predicted data of the target user in the assumed problem data dimension and the assumed problem data is within a set difference range;
Step 2.2: if yes, determining that the assumed problem data dimension is an actual problem data dimension and the assumed problem data is actual problem data;
Step 2.3: if not, determining that the assumed problem data dimension is not an actual problem data dimension and the assumed problem data is not actual problem data.
The assumed problem data is the data filled in by the user. For example, suppose the filled-in age is 3 years and the predicted data is 5 years. The set difference range can be chosen according to the actual situation, and this specification is not limited in this respect. If the set difference range is 0-10 years, the difference is 5 - 3 = 2 years, which is within the set range, so the assumed problem data is determined to be actual problem data.
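Steps 2.1-2.3 invert the logic of Steps 1.1-1.3: closeness to the predicted data, rather than distance from the theoretical data, marks the entry as problem data. A minimal Python sketch, assuming the same absolute-difference test with an inclusive upper bound as in the example (the function name is hypothetical):

```python
def is_problem_by_predicted(filled: float, predicted: float, max_diff: float = 10.0) -> bool:
    """Steps 2.1-2.3: return True if the assumed problem data is actual problem data.

    The predicted data models what an incorrectly filled value would look like,
    so a filled value close to it IS judged to be problem data. The absolute
    difference and inclusive bound are assumptions.
    """
    return abs(predicted - filled) <= max_diff


print(is_problem_by_predicted(3, 5))   # True: |5 - 3| = 2 <= 10
print(is_problem_by_predicted(35, 5))  # False: |5 - 35| = 30 > 10
```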
The method of the embodiments of this specification can be applied to data sharing on mobile terminals. The analysis identifies which one or more data dimensions of the target user are actual problem data dimensions, and the corresponding data is determined to be actual problem data. That data can then be removed or deleted, and content recommendations for the target user are made using the data remaining after the removal or deletion, thereby achieving targeted recommendation.
The user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in this application are information and data authorized by the user or fully authorized by all parties. In addition, the technical solutions described in the embodiments of this application comply with the relevant national laws and regulations on the acquisition, storage, use, and processing of data.
Based on the above method for analyzing problem data, an embodiment of this specification further provides a corresponding apparatus for analyzing problem data. The apparatus may include systems (including distributed systems), software (applications), modules, components, servers, clients, and the like that use the method described in the embodiments of this specification, combined with the hardware necessary for implementation. Based on the same inventive concept, the apparatus in one or more embodiments is as described in the following embodiments. Because the apparatus solves the problem in a manner similar to the method, its implementation may refer to the implementation of the foregoing method, and repeated details are omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Specifically, FIG. 8 is a schematic block diagram of an apparatus for analyzing problem data according to an embodiment of this specification. Referring to FIG. 8, the apparatus includes a selection module 100, a determination module 200, a calculation module 300, and an analysis module 400.
The selection module 100 is configured to select at least one of all data dimensions of a target user as an assumed problem data dimension, where the data of the target user in the assumed problem data dimension is assumed problem data;
the determination module 200 is configured to determine similar users according to how the target user's data changes across the data dimensions other than the assumed problem data dimension, where the data of the similar users in all data dimensions is their actual data;
the calculation module 300 is configured to calculate the data of the target user in the assumed problem data dimension according to the data of the similar users in all data dimensions and the data of the target user in all data dimensions;
the analysis module 400 is configured to analyze and determine, according to the calculated data of the target user in the assumed problem data dimension and the assumed problem data of the target user in that dimension, whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data.
Referring to FIG. 9, based on the above method for analyzing problem data, an embodiment of this specification further provides a computer device 902 on which the above method runs. The computer device 902 may include one or more processors 904, such as one or more central processing units (CPUs) or graphics processing units (GPUs), each of which may implement one or more hardware threads. The computer device 902 may further include any memory 906 for storing any kind of information, such as code, settings, and data; in a specific embodiment, a computer program is stored in the memory 906 and is executable on the processor 904, and when executed by the processor 904, the computer program performs instructions according to the method described above. For example, and without limitation, the memory 906 may include any one or a combination of the following: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, and the like. More generally, any memory may use any technology to store information. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may be a fixed or removable component of the computer device 902. In one case, when the processor 904 executes associated instructions stored in any memory or combination of memories, the computer device 902 can perform any operation of those instructions. The computer device 902 also includes one or more drive mechanisms 908 for interacting with any memory, such as a hard disk drive mechanism or an optical disk drive mechanism.
The computer device 902 may also include an input/output module 910 (I/O) for receiving various inputs (via an input device 912) and providing various outputs (via an output device 914). One particular output mechanism may include a presentation device 916 and an associated graphical user interface 918 (GUI). In other embodiments, the input/output module 910 (I/O), the input device 912, and the output device 914 may be omitted, with the computer device acting merely as a device in a network. The computer device 902 may also include one or more network interfaces 920 for exchanging data with other devices via one or more communication links 922. One or more communication buses 924 couple the above components together.
The communication link 922 may be implemented in any manner, for example, through a local area network, a wide area network (e.g., the Internet), a point-to-point connection, or any combination thereof. The communication link 922 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, and the like, governed by any protocol or combination of protocols.
Corresponding to the method in FIGS. 1-7, this specification also provides a computer-readable storage medium storing a computer program that, when executed by a processor, performs the steps of the above method.
This specification also provides computer-readable instructions which, when executed by a processor, cause the processor to perform the method shown in FIGS. 1 to 7.
It should be understood that, in the various embodiments of this specification, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of this specification.
It should also be understood that, in the embodiments of this specification, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the character "/" generally indicates an "or" relationship between the preceding and following associated objects.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the illustrative units and steps have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the embodiments of this specification.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this specification, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is merely a division by logical function, and other divisions are possible in actual implementations. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of this specification.
In addition, the functional units in the embodiments of this specification may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this specification in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of this specification. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Specific examples are used in this specification to explain the principles and implementations of its embodiments; the description of the above embodiments is intended only to help understand the methods and core ideas of the embodiments of this specification. Meanwhile, those of ordinary skill in the art may make changes to the specific implementations and scope of application based on the ideas of the embodiments of this specification. In summary, the content of this specification should not be construed as limiting the embodiments of this specification.

Claims (9)

1. A method of analyzing problem data, comprising:
selecting at least one of all data dimensions of a target user as an assumed problem data dimension, wherein the data of the target user in the assumed problem data dimension is assumed problem data, and a problem data dimension is a dimension whose data has been filled in incorrectly;
determining similar users according to how the target user's data changes across the data dimensions other than the assumed problem data dimension, wherein the data of the similar users in all data dimensions is actual data of the similar users;
calculating the data of the target user in the assumed problem data dimension according to the data of the similar users in all data dimensions and the data of the target user in all data dimensions;
analyzing and determining, according to the calculated data of the target user in the assumed problem data dimension and the assumed problem data of the target user in that dimension, whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data;
wherein calculating the data of the target user in the assumed problem data dimension according to the data of the similar users in all data dimensions and the data of the target user in all data dimensions comprises:
performing discrete processing on the data of the similar user in each data dimension to obtain the similar user's discrete value in each data dimension;
performing discrete processing on the data of the target user in each of the other data dimensions to obtain the target user's discrete value in each of the other data dimensions;
calculating, from the discrete values of the similar user and the target user in the other data dimensions, a first discrete mean for the similar user and a first discrete mean for the target user;
calculating a theoretical discrete value of the target user in the assumed problem data dimension according to the similar user's discrete value in the assumed problem data dimension and the respective first discrete means of the similar user and the target user;
and calculating theoretical data of the target user in the assumed problem data dimension according to the theoretical discrete values obtained from the different similar users.
2. The method according to claim 1, wherein determining similar users according to how the target user's data changes across the data dimensions other than the assumed problem data dimension further comprises:
performing discrete processing on the data of the target user in the other data dimensions to obtain the target user's discrete values in the other data dimensions;
and determining similar users according to how the target user's discrete values change across the other data dimensions.
3. The method according to claim 1, wherein determining similar users according to how the target user's discrete values change across the other data dimensions further comprises:
linking the target user's discrete values in the other data dimensions to obtain a continuous variation trend of the target user's discrete values;
and determining, as similar users, users whose discrete values follow the same continuous variation trend as those of the target user.
4. The method according to claim 1, wherein calculating the theoretical discrete value of the target user in the assumed problem data dimension according to the similar user's discrete value in the assumed problem data dimension and the respective first discrete means of the similar user and the target user further comprises:
calculating the theoretical discrete value of the target user in the assumed problem data dimension according to the following formula:
where M1 is the theoretical discrete value of the target user in the assumed problem data dimension, A is the discrete value of a similar user in the assumed problem data dimension, B1 is the first discrete mean of the similar user, and N1 is the first discrete mean of the target user.
5. The method according to claim 4, wherein analyzing and determining, according to the data of the target user in the assumed problem data dimension and the assumed problem data of the target user in that dimension, whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data further comprises:
judging whether the difference between the theoretical data of the target user in the assumed problem data dimension and the assumed problem data is within a set difference range;
if yes, determining that the assumed problem data dimension is not an actual problem data dimension and the assumed problem data is not actual problem data;
if not, determining that the assumed problem data dimension is an actual problem data dimension and the assumed problem data is actual problem data.
6. The method according to claim 1, wherein calculating the data of the target user in the assumed problem data dimension according to the data of the similar users in all data dimensions and the data of the target user in all data dimensions further comprises:
performing discrete processing on the data of the similar user in each data dimension to obtain the similar user's discrete value in each data dimension;
performing discrete processing on the data of the target user in each data dimension to obtain the target user's discrete value in each data dimension;
calculating, from the discrete values of the similar user and the target user in all data dimensions, a second discrete mean for the similar user and a second discrete mean for the target user;
calculating a predicted discrete value of the target user in the assumed problem data dimension according to the target user's discrete value in the assumed problem data dimension and the respective second discrete means of the similar user and the target user;
and calculating predicted data of the target user in the assumed problem data dimension according to the predicted discrete values obtained from the different similar users.
7. The method according to claim 6, wherein calculating the predicted discrete value of the target user in the assumed problem data dimension according to the target user's discrete value in the assumed problem data dimension and the respective second discrete means of the similar user and the target user further comprises:
calculating the predicted discrete value of the target user in the assumed problem data dimension by the following formula:
where M2 is the predicted discrete value of the target user in the assumed problem data dimension, B2 is the second discrete mean of the similar user, N2 is the second discrete mean of the target user, C = T - N2, and T is the discrete value of the target user in the assumed problem data dimension.
8. The method according to claim 7, wherein analyzing and determining, according to the data of the target user in the assumed problem data dimension and the assumed problem data of the target user in that dimension, whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data further comprises:
judging whether the difference between the predicted data of the target user in the assumed problem data dimension and the assumed problem data is within a set difference range;
if yes, determining that the assumed problem data dimension is an actual problem data dimension and the assumed problem data is actual problem data;
if not, determining that the assumed problem data dimension is not an actual problem data dimension and the assumed problem data is not actual problem data.
9. An apparatus for analyzing problem data, the apparatus comprising:
a selection module, configured to select at least one of all data dimensions of a target user as an assumed problem data dimension, wherein the data of the target user in the assumed problem data dimension is assumed problem data, and a problem data dimension is a dimension whose data has been filled in incorrectly;
a determination module, configured to determine similar users according to how the target user's data changes across the data dimensions other than the assumed problem data dimension, wherein the data of the similar users in all data dimensions is actual data of the similar users;
a calculation module, configured to calculate the data of the target user in the assumed problem data dimension according to the data of the similar users in all data dimensions and the data of the target user in all data dimensions;
an analysis module, configured to analyze and determine, according to the calculated data of the target user in the assumed problem data dimension and the assumed problem data of the target user in that dimension, whether the assumed problem data dimension is an actual problem data dimension and whether the assumed problem data is actual problem data;
wherein calculating the data of the target user in the assumed problem data dimension according to the data of the similar users in all data dimensions and the data of the target user in all data dimensions comprises:
performing discrete processing on the data of the similar user in each data dimension to obtain the similar user's discrete value in each data dimension;
performing discrete processing on the data of the target user in each of the other data dimensions to obtain the target user's discrete value in each of the other data dimensions;
calculating, from the discrete values of the similar user and the target user in the other data dimensions, a first discrete mean for the similar user and a first discrete mean for the target user;
calculating a theoretical discrete value of the target user in the assumed problem data dimension according to the similar user's discrete value in the assumed problem data dimension and the respective first discrete means of the similar user and the target user;
and calculating theoretical data of the target user in the assumed problem data dimension according to the theoretical discrete values obtained from the different similar users.
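Claims 2 and 3 select similar users by comparing how discrete values change across the non-problem dimensions. The specification does not define "continuous variation trend" precisely; the Python sketch below assumes it means the sequence of up/down/flat steps between consecutive dimensions taken in a fixed order, and treats two users as similar when those sequences match exactly. All names and values are hypothetical.

```python
from typing import Dict, List, Sequence


def trend_signature(discrete: Sequence[int]) -> List[int]:
    """Sign of each step between consecutive dimensions: +1 up, -1 down, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(discrete, discrete[1:])]


def find_similar_users(target: Sequence[int], candidates: Dict[str, Sequence[int]]) -> List[str]:
    """Return candidates whose trend over the same ordered dimensions matches the target's."""
    wanted = trend_signature(target)
    return [uid for uid, values in candidates.items() if trend_signature(values) == wanted]


# Discrete values on the other (non-problem) dimensions, in the same fixed order.
target = [3, 5, 2]
candidates = {"u1": [1, 4, 1], "u2": [3, 3, 2], "u3": [2, 6, 5]}
print(find_similar_users(target, candidates))  # ['u1', 'u3']
```

An exact match on the trend is the strictest reading; a real system might instead accept candidates that agree on most steps.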
CN202311068926.XA 2023-08-24 2023-08-24 Method and device for analyzing problem data Active CN116796075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311068926.XA CN116796075B (en) 2023-08-24 2023-08-24 Method and device for analyzing problem data

Publications (2)

Publication Number Publication Date
CN116796075A CN116796075A (en) 2023-09-22
CN116796075B true CN116796075B (en) 2023-10-31

Family

ID=88048376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311068926.XA Active CN116796075B (en) 2023-08-24 2023-08-24 Method and device for analyzing problem data

Country Status (1)

Country Link
CN (1) CN116796075B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480187A (en) * 2017-07-10 2017-12-15 北京京东尚科信息技术有限公司 User's value category method and apparatus based on cluster analysis
CN108509626A (en) * 2018-04-08 2018-09-07 百度在线网络技术(北京)有限公司 Method and apparatus for verify data
CN112069833A (en) * 2020-09-01 2020-12-11 北京声智科技有限公司 Log analysis method, log analysis device and electronic equipment
CN112506897A (en) * 2020-11-17 2021-03-16 贵州电网有限责任公司 Method and system for analyzing and positioning data quality problem
CN115269677A (en) * 2022-06-24 2022-11-01 天翼数字生活科技有限公司 Multi-dimensional data analysis method, device, equipment and computer program product

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11151327B2 (en) * 2019-02-28 2021-10-19 Atlassian Pty Ltd. Autonomous detection of compound issue requests in an issue tracking system

Also Published As

Publication number Publication date
CN116796075A (en) 2023-09-22

Similar Documents

Publication Publication Date Title
US11935142B2 (en) Systems and methods for correlating experimental biological datasets
Montserrat-Adell et al. Consensus, dissension and precision in group decision making by means of an algebraic extension of hesitant fuzzy linguistic term sets
WO2022179138A1 (en) Image processing method and apparatus, and computer device and storage medium
CN105260782A (en) Method and device for processing reserved registration information
WO2013003961A2 (en) System and method for determining interpersonal relationship influence information using textual content from interpersonal interactions
CN111815169B (en) Service approval parameter configuration method and device
EP3142050A1 (en) Predicting attribute values for user segmentation
CN112860997A (en) Medical resource recommendation method, device, equipment and storage medium
WO2015077316A1 (en) System and method for facilitating communication between a web application and a local peripheral device through a native service
CN109905257B (en) Method and device for determining bandwidth transmission capacity
CN104657406B (en) The method and system of interactive segmentation for the entry in social Collaborative environment
CN115273170A (en) Image clustering method, device, equipment and computer readable storage medium
CN116796075B (en) Method and device for analyzing problem data
Aldughayfiq et al. A framework to lower the risk of medication prescribing and dispensing errors: A usability study of an NFC-based mobile application
CN117171757A (en) Model construction method for software vulnerability discovery and software vulnerability discovery method
CN113762421B (en) Classification model training method, flow analysis method, device and equipment
CN110309691B (en) Face recognition method, face recognition device, server and storage medium
CN115774707A (en) Object attribute based data processing method and device, electronic equipment and storage medium
CN110910108A (en) Data association method and device, electronic equipment and storage medium
CN113780666B (en) Missing value prediction method and device and readable storage medium
CN115114032A (en) Message checking method, device, equipment, storage medium and product
CN112286703B (en) User classification method and device, client device and readable storage medium
KR20190077093A (en) CNA-induced care to improve clinical outcomes and reduce total care costs
CN110456976B (en) Method and device for processing inspection sheet, storage medium and electronic device
CN111506826A (en) User recommendation method, device, equipment and storage medium based on intimacy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant