CN111797148A - Data processing method, data processing device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN111797148A
Authority
CN
China
Prior art keywords
data
preset
preset scene
type
types
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910282170.6A
Other languages
Chinese (zh)
Inventor
何明
陈仲铭
杨统
刘耀勇
陈岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses a data processing method, a data processing device, a storage medium and electronic equipment, wherein the application embodiment can acquire terminal use data corresponding to a preset scene type, the terminal use data comprises a plurality of data types, the association degree between the data types and the preset scene type is calculated according to a preset regression algorithm and the terminal use data, and then sampling parameters corresponding to the preset scene type are adjusted according to the association degree. By the scheme, the targeted sampling parameters can be obtained for different scene categories, and then when data of specific scene categories are analyzed, the data can be collected according to the sampling parameters corresponding to the data, so that the collected data scale can be reduced, and targeted data collection can be realized.

Description

Data processing method, data processing device, storage medium and electronic equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, a storage medium, and an electronic device.
Background
A terminal manufacturer often needs to analyze the behavior habits and states of terminal users in specific scenes, which requires collecting the users' data for analysis. For example, to analyze the regularity and trend of user data in a conference scene, it is necessary to acquire the usage data of the user terminal in the conference scene and extract features from the usage data for analysis.
However, in related feature extraction schemes, data is often acquired with the same sampling parameters for all scene categories; for example, all types of terminal usage data are acquired at the same sampling frequency in every scene category. In practical applications, the types of data a user generates differ from one scene category to another. Moreover, different scene categories require different acquisition frequencies even for the same type of data. For example, the terminal uses GPS (Global Positioning System) data in both a running scene and an office scene, but the sampling frequency and sampling precision required for the GPS data differ when the two scenes are analyzed. With the existing data acquisition scheme, the acquired data is large in scale and is not targeted, so the accuracy of the analysis conclusion is low.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, a storage medium and electronic equipment, which can reduce the scale of acquired data and realize targeted data acquisition.
In a first aspect, an embodiment of the present application provides a data processing method, including:
acquiring terminal use data corresponding to a preset scene type, wherein the terminal use data comprises a plurality of data types;
calculating the association degrees between the multiple data types and the preset scene categories according to a preset regression algorithm and the terminal use data;
and adjusting the sampling parameters of the preset scene categories corresponding to the multiple data types according to the association degrees between the multiple data types and the preset scene categories.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
the data acquisition module is used for acquiring terminal use data corresponding to a preset scene type, wherein the terminal use data comprises a plurality of data types;
the association degree calculation module is used for calculating the association degrees between the multiple data types and the preset scene categories according to a preset regression algorithm and the terminal use data;
and the parameter adjusting module is used for adjusting the sampling parameters corresponding to the preset scene categories according to the association degrees between the multiple data types and the preset scene categories.
In a third aspect, a storage medium is provided in this application, and a computer program is stored thereon, and when the computer program runs on a computer, the computer is caused to execute the data processing method provided in any embodiment of this application.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory has a computer program, and the processor is configured to execute the data processing method provided in any embodiment of the present application by calling the computer program.
According to the technical scheme, the terminal use data corresponding to the preset scene type can be obtained, the terminal use data comprise a plurality of data types, the association degree between the data types and the preset scene type is calculated according to the preset regression algorithm and the terminal use data, then the sampling parameters corresponding to the preset scene type are adjusted according to the association degree, and the sampling parameters of the data types under the preset scene are determined according to the association degree by analyzing the association degree between the data types and the scene type. By the scheme, targeted sampling parameters can be obtained for different scene types, and further, when data of specific scene types are analyzed, the data can be collected according to the corresponding sampling parameters, so that the collected data scale can be reduced, and targeted data collection can be realized.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic view of a panoramic sensing architecture of a data processing method according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of a first data processing method according to an embodiment of the present disclosure.
Fig. 3 is a schematic flowchart of a second data processing method according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a first electronic device according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present application.
Referring to fig. 1, fig. 1 is a schematic view of a panoramic sensing architecture of a data processing method according to an embodiment of the present application. The data processing method is applied to the electronic equipment. A panoramic perception framework is arranged in the electronic equipment. The panoramic sensing architecture is an integration of hardware and software for implementing the data processing method in an electronic device.
The panoramic perception architecture comprises an information perception layer, a data processing layer, a feature extraction layer, a scene modeling layer and an intelligent service layer.
The information perception layer is used for acquiring information of the electronic equipment or information in an external environment. The information-perceiving layer may include a plurality of sensors. For example, the information sensing layer includes a plurality of sensors such as a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a hall sensor, a position sensor, a gyroscope, an inertial sensor, an attitude sensor, a barometer, and a heart rate sensor.
Among these, the distance sensor may be used to detect the distance between the electronic device and an external object. The magnetic field sensor may be used to detect magnetic field information of the environment in which the electronic device is located. The light sensor may be used to detect light information of the environment in which the electronic device is located. The acceleration sensor may be used to detect acceleration data of the electronic device. The fingerprint sensor may be used to collect fingerprint information of a user. The Hall sensor is a magnetic field sensor based on the Hall effect and can be used to realize automatic control of the electronic device. The position sensor may be used to detect the geographic location where the electronic device is currently located. The gyroscope may be used to detect the angular velocity of the electronic device in various directions. The inertial sensor may be used to detect motion data of the electronic device. The attitude sensor may be used to sense attitude information of the electronic device. The barometer may be used to detect the barometric pressure of the environment in which the electronic device is located. The heart rate sensor may be used to detect heart rate information of the user.
And the data processing layer is used for processing the data acquired by the information perception layer. For example, the data processing layer may perform data cleaning, data integration, data transformation, data reduction, and the like on the data acquired by the information sensing layer.
The data cleaning refers to cleaning a large amount of data acquired by the information sensing layer to remove invalid data and repeated data. The data integration refers to integrating a plurality of single-dimensional data acquired by the information perception layer into a higher or more abstract dimension so as to comprehensively process the data of the plurality of single dimensions. The data transformation refers to performing data type conversion or format conversion on the data acquired by the information sensing layer so that the transformed data can meet the processing requirement. The data reduction means that the data volume is reduced to the maximum extent on the premise of keeping the original appearance of the data as much as possible.
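As a rough illustration of two of the operations described above, the following sketch shows data cleaning (dropping invalid and repeated records) and a very simple form of data reduction (keeping every k-th record). The record layout `(timestamp, value)` and the function names `clean` and `reduce_by_stride` are assumptions made for this example; the application does not specify them.

```python
import math
from typing import List, Optional, Tuple

Record = Tuple[int, Optional[float]]  # (timestamp, value); hypothetical layout


def clean(records: List[Record]) -> List[Tuple[int, float]]:
    """Drop invalid values (None or NaN) and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for ts, value in records:
        if value is None or math.isnan(value):
            continue  # invalid data
        if (ts, value) in seen:
            continue  # repeated data
        seen.add((ts, value))
        cleaned.append((ts, value))
    return cleaned


def reduce_by_stride(records: List[Tuple[int, float]], stride: int) -> List[Tuple[int, float]]:
    """Keep every `stride`-th record as a simple stand-in for data reduction."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return records[::stride]
```

Striding is only one possible reduction strategy; it is used here because it is easy to reason about, not because the application prescribes it.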
The characteristic extraction layer is used for extracting characteristics of the data processed by the data processing layer so as to extract the characteristics included in the data. The extracted features may reflect the state of the electronic device itself or the state of the user or the environmental state of the environment in which the electronic device is located, etc.
The feature extraction layer may extract features, or process the extracted features, by a method such as a filter method, a wrapper method, or an integration (ensemble) method.
The filter method filters the extracted features to remove redundant feature data. The wrapper method is used to screen the extracted features. The integration method combines a plurality of feature extraction methods to construct a more efficient and more accurate feature extraction method.
The scene modeling layer is used for building a model according to the features extracted by the feature extraction layer, and the obtained model can be used for representing the state of the electronic equipment, the state of a user, the environment state and the like. For example, the scenario modeling layer may construct a key value model, a pattern identification model, a graph model, an entity relation model, an object-oriented model, and the like according to the features extracted by the feature extraction layer.
The intelligent service layer is used for providing intelligent services for the user according to the model constructed by the scene modeling layer. For example, the intelligent service layer can provide basic application services for users, perform system intelligent optimization for electronic equipment, and provide personalized intelligent services for users.
In addition, the panoramic perception architecture can further comprise a plurality of algorithms, each of which can be used to analyze and process data, and the plurality of algorithms can form an algorithm library. For example, the algorithm library may include a Markov algorithm, a latent Dirichlet allocation algorithm, a Bayesian classification algorithm, a support vector machine, a K-means clustering algorithm, a K-nearest neighbor algorithm, a conditional random field, a residual network, a long short-term memory network, a convolutional neural network, a recurrent neural network, and the like.
Based on the panoramic sensing framework, the electronic equipment acquires terminal use data of a target user through an information sensing layer and/or other modes. The data processing layer processes the terminal use data, for example, performs data cleaning, data integration, and the like on the acquired terminal use data. Next, the feature extraction layer or the data processing layer adjusts sampling parameters of the terminal usage data according to the feature extraction scheme provided in the embodiment of the present application. For example, terminal use data corresponding to a preset scene category is obtained, the terminal use data comprises a plurality of data types, association degrees between the data types and the preset scene category are calculated according to a preset regression algorithm and the terminal use data, and then sampling parameters corresponding to the preset scene category are adjusted according to the association degrees. By the scheme, targeted sampling parameters can be obtained for different scene types, and further, when data of specific scene types are analyzed, the data can be collected according to the corresponding sampling parameters, so that the collected data scale can be reduced, and targeted data collection can be realized.
An execution main body of the data processing method may be the data processing apparatus provided in the embodiment of the present application, or an electronic device integrated with the data processing apparatus, where the data processing apparatus may be implemented in a hardware or software manner. The electronic device may be a smart phone, a tablet computer, a palm computer, a notebook computer, or a desktop computer.
Referring to fig. 2, fig. 2 is a first flowchart illustrating a data processing method according to an embodiment of the present disclosure. The specific flow of the data processing method provided by the embodiment of the application can be as follows:
Step 101, obtaining terminal use data corresponding to a preset scene type, wherein the terminal use data comprises a plurality of data types.
In the embodiment of the application, the terminal use data is generated according to the use condition of a user on the intelligent terminal, such as electronic equipment.
For example, the terminal usage data mainly includes the following three major categories: environmental data, user behavior data, and terminal operational data. Wherein each large category of data comprises a plurality of data types, for example, the environmental data comprises data types of temperature, illumination, and the like; the user behavior data may include: the time, place, and frequency of opening the application; the terminal operation data may include: the operation state of the terminal, such as the on-off state of the mobile data network, the connection state of the wireless hotspot, the identity information of the connected wireless hotspot, the currently running application program, the previous foreground application program, the time for the current application program to stay in the background, the time for the current application program to be switched to the background last time, the plugging and unplugging state of the earphone jack, the charging state, the battery power information, the screen display time and other data types.
The terminal usage data may further include data collected by sensors integrated in the terminal, such as a combination of one or more of a motion sensor, a light sensor, a temperature sensor, and a humidity sensor, among others.
And storing the acquired terminal use data into a preset database, for example, constructing the terminal use database by adopting MySQL in advance.
The obtained terminal usage data is stored in the form of data pairs, for example, as $\langle c_i, d(c_i) \rangle$, where $c_i$ represents the $i$-th scene among all scenes and $d(c_i)$ represents the set of terminal usage data used in the $i$-th scene, with $1 \le i \le n$. The preset scene categories need to be divided in advance; for example, they can be divided manually, so that the scenes in which the user uses the terminal are divided into conference, meal ordering, gaming, running, shopping, and the like. Assume that a total of $n$ scene categories are set.
$d(c_i)$ can be expressed in the form of a tensor. For example, $d(c_i)$ is expressed as a second-order tensor, i.e., a matrix, in which each row is a row vector corresponding to one data type. For example, the terminal acquires and records each type of terminal usage data at a preset frequency, and the data acquired in a scene of a specific category may be arranged into a data sequence, which may be represented as a long vector. For example, in a running scene the terminal records GPS data; assuming that the GPS data is acquired once per second, 600 pieces of GPS data are acquired after the user runs for ten minutes, and these pieces of GPS data may form a long vector. Alternatively, $d(c_i)$ is expressed as a first-order tensor, i.e., a vector, in which each element corresponds to one type of terminal usage data.
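The data-pair representation described above can be sketched as follows. This is only an illustrative layout under stated assumptions: the scene name, the data type names in `DATA_TYPES`, the sample values, and the helper functions are hypothetical and are not taken from the application.

```python
from typing import Dict, List

# Hypothetical data types, chosen only for illustration.
DATA_TYPES = ["gps_speed", "screen_on", "light"]


def samples_collected(frequency_hz: float, duration_s: float) -> int:
    """Number of samples recorded at a fixed frequency over a duration."""
    return int(frequency_hz * duration_s)


def build_matrix(samples: Dict[str, List[float]]) -> List[List[float]]:
    """Arrange per-type sequences as a matrix with one row per data type."""
    return [list(samples[name]) for name in DATA_TYPES]


# The pair <c_i, d(c_i)>: scene category mapped to its usage-data matrix.
usage_data: Dict[str, List[List[float]]] = {
    "running": build_matrix({
        "gps_speed": [2.9, 3.1, 3.0, 3.2],
        "screen_on": [0.0, 0.0, 1.0, 0.0],
        "light": [800.0, 820.0, 790.0, 810.0],
    }),
}
```

With a sampling frequency of once per second, `samples_collected(1.0, 600)` gives the 600 GPS samples mentioned in the ten-minute running example.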
Next, in order to determine the degree of association between each data type and a specific type of scene in the terminal usage data of all types in a specific type of scene, terminal usage data of all data types collected in the specific scene is first acquired and stored in the MySQL database.
Optionally, in some embodiments, after the terminal usage data is acquired, the terminal usage data of each type may be standardized. Because the different types of terminal usage data have different dimensions (units and scales), standardizing the data eliminates the influence of dimension and can improve the accuracy of the subsequent regression analysis.
Specifically, in an optional implementation manner, before the step of calculating association degrees between the plurality of data types and the preset scene categories according to a preset regression algorithm and the terminal usage data in step 102, the method further includes:
carrying out standardization processing on the terminal use data;
the step of calculating the association degrees between the plurality of data types and the preset scene categories according to a preset regression algorithm and the terminal usage data comprises: and calculating the association degrees between the multiple data types and the preset scene categories according to a preset regression algorithm and the terminal use data after the standardization processing.
For example, a standardization method such as dispersion normalization (min-max normalization) or Z-score standardization can be used. Dispersion normalization is a linear transformation of the original data that maps the resulting values into the interval [0, 1]. Z-score standardization rescales the data using the mean and standard deviation of the original data, so that the processed data has a mean of 0 and a standard deviation of 1; unlike dispersion normalization, the processed values are not confined to a fixed interval. The standardized data is dimensionless and all indexes are on the same order of magnitude, so using the standardized data can improve the accuracy of the subsequent regression analysis.
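The two standardization methods mentioned above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names `min_max` and `z_score` are chosen for this example, and the handling of constant sequences (returning zeros) is an assumption, since the application does not address that case.

```python
from statistics import mean, pstdev
from typing import List


def min_max(values: List[float]) -> List[float]:
    """Dispersion (min-max) normalization: linear map onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: constant input maps to 0
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: List[float]) -> List[float]:
    """Z-score standardization: subtract the mean, divide by the std. dev."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # assumption: constant input maps to 0
    return [(v - mu) / sigma for v in values]
```

Z-scores have mean 0 and standard deviation 1, but individual values can lie outside any fixed interval, so a bounded range should not be assumed downstream.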
Step 102, calculating the association degrees between the multiple data types and the preset scene categories according to a preset regression algorithm and the terminal use data.
In the embodiment of the present application, the preset regression algorithm may be a linear regression algorithm, a multiple regression algorithm, a ridge regression algorithm, a LASSO (least absolute shrinkage and selection operator) regression algorithm, and the like.
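For reference, the regression variants listed above differ mainly in the penalty added to the squared-error objective. The standard textbook forms are given below. The symbols $X$ (the matrix of terminal usage data, one column per data type), $y$ (the scene values), and $\lambda \ge 0$ (the regularization strength) are notation introduced here for illustration and do not appear in the application.

```latex
\begin{align}
\hat{w}_{\mathrm{linear}} &= \arg\min_{w} \; \lVert y - Xw \rVert_2^2 \\
\hat{w}_{\mathrm{ridge}}  &= \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2 \\
\hat{w}_{\mathrm{lasso}}  &= \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1
\end{align}
```

The $\ell_1$ penalty of LASSO tends to set some weights exactly to zero, which may be convenient when data types with a low association degree are to be screened out.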
Referring to fig. 3, fig. 3 is a schematic flowchart of a second data processing method according to an embodiment of the present application. In some embodiments, the preset regression algorithm is a linear regression algorithm, and the step 102 of calculating the association degrees between the plurality of data types and the preset scene categories according to the preset regression algorithm and the terminal usage data includes:
step 1021, performing regression analysis on the terminal use data of the preset scene type and the multiple data types according to linear regression to obtain a weight vector;
step 1022, determining the association degrees between the multiple data types and the preset scene categories according to the weight vector.
Specifically, a linear regression equation is constructed by taking the terminal use data of the multiple data types as the independent variables and the scene category as the dependent variable, and the linear regression equation is solved to obtain a weight vector. For example, the linear regression equation is solved with a stochastic gradient ascent algorithm to obtain the weight vector.
The linear regression equation can be expressed as $y = wx$.
Here $c_i$ is taken as the value of $y$, i.e., the dependent variable, and $d(c_i)$ is taken as the value of $x$, i.e., the independent variable. $w$ is a vector that can be expressed as $w = (w_1, w_2, \dots, w_j, \dots, w_m)$.
Assuming a total of $m$ data types, $w_j$ is the weight of the $j$-th data type among the $m$ data types. The regression equation is solved with a stochastic gradient ascent algorithm to obtain the weight vector $w$, and the association degree between each data type and the preset scene category is obtained from the weight vector.
The preset scene category can be expressed by numbers in a manner of establishing an index number, for example, the index number of a conference is 1, the index number of a meal ordering is 2, the index number of a game is 3, the index number of a running is 4, and the like.
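Steps 1021 and 1022 can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the model has no intercept (matching $y = wx$), the update ascends the negative squared error (equivalent to stochastic gradient descent on the squared error), and the mapping from weights to association degrees by normalized absolute value is an assumption, since the application only states that the degrees are obtained from the weight vector. The function names and default hyperparameters are hypothetical.

```python
import random
from typing import List, Sequence


def fit_linear_sgd(xs: Sequence[Sequence[float]], ys: Sequence[float],
                   lr: float = 0.1, epochs: int = 200, seed: int = 0) -> List[float]:
    """Fit y = w . x with per-sample (stochastic) gradient updates."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a random order each epoch
        for k in order:
            pred = sum(wj * xj for wj, xj in zip(w, xs[k]))
            err = ys[k] - pred
            # Ascent step on -(err^2)/2, whose gradient w.r.t. w is err * x.
            for j, xj in enumerate(xs[k]):
                w[j] += lr * err * xj
    return w


def association_degrees(w: Sequence[float]) -> List[float]:
    """Assumed mapping: normalized absolute weights, summing to 1."""
    total = sum(abs(v) for v in w)
    if total == 0:
        return [0.0 for _ in w]
    return [abs(v) / total for v in w]
```

The inputs are expected to be standardized beforehand so that the weights of different data types are comparable; with unscaled inputs the learning rate `lr` may need to be reduced for the updates to remain stable.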
Step 103, adjusting sampling parameters of the preset scene categories corresponding to the plurality of data types according to the association degrees between the plurality of data types and the preset scene categories.
After the association degree between each data type and the preset scene category is obtained, the sampling parameters are set. The sampling parameters may include sampling frequency and sampling precision. Sampling precision refers to the number of digits to which a value is recorded: data of low importance may only need to be recorded to the integer digit, whereas data of high importance requires high precision and may need to be accurate to, for example, 5 decimal places.
For example, the greater the association degree between a data type and the preset scene category, the more important the data type is judged to be, so a higher sampling frequency and sampling precision can be set for it. Conversely, a lower sampling frequency and sampling precision can be set for the data type. For example, if GPS data is highly important in a travel scene, the sampling frequency and sampling precision of the GPS data should be increased.
The data type can be further screened according to the degree of association, for example, if the degree of association between the data type and the preset scene type is smaller than the preset threshold, the data type is deleted from the data type to be acquired corresponding to the preset scene type; if the association degree between the data type and the preset scene type is not smaller than the preset threshold, adjusting the sampling frequency and the sampling precision corresponding to the data type according to the association degree between the data type and the preset scene type, wherein the sampling frequency and the sampling precision are in direct proportion to the association degree.
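The screening and adjustment rule above can be sketched as follows. The data type names, the threshold, the maximum frequency and precision, and the choice to scale relative to the highest association degree are all assumptions made for this example; the application only states that frequency and precision are in direct proportion to the association degree.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SamplingParams:
    frequency_hz: float  # samples per second
    decimals: int        # digits kept after the decimal point


def adjust_sampling(degrees: Dict[str, float], threshold: float,
                    max_frequency_hz: float = 1.0,
                    max_decimals: int = 5) -> Dict[str, SamplingParams]:
    """Drop data types below the threshold, scale the rest proportionally."""
    kept = {name: d for name, d in degrees.items() if d >= threshold}
    top = max(kept.values(), default=0.0)
    if top <= 0.0:
        return {}  # nothing left to sample for this scene category
    params = {}
    for name, d in kept.items():
        scale = d / top  # the most relevant data type gets the maximum settings
        params[name] = SamplingParams(
            frequency_hz=max_frequency_hz * scale,
            decimals=round(max_decimals * scale),
        )
    return params
```

For instance, with degrees of 0.6 for GPS, 0.24 for an accelerometer, and 0.05 for a light sensor and a threshold of 0.1, the light sensor is dropped and the accelerometer is sampled at 40% of the GPS frequency.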
Because the behavior habits of each user are different, the terminal usage data generated by the terminal differs even in the same scene. When the terminal usage data of a target user in a target scene is acquired for the first time, all types of terminal usage data can be acquired according to default sampling parameters. All the acquired terminal usage data is then analyzed according to the scheme of the embodiment of the application to determine sampling parameters matched to the target user; these sampling parameters determine the sampling frequency and sampling precision of each data type of the target user in the target scene. When the behavior of the target user in the scene is analyzed again later, the terminal usage data can be acquired in a targeted manner according to the updated sampling parameters and features can be extracted for analysis, without pulling all the terminal usage data. For example, suppose the target scene is a conference scene, the target user is user A, and the regularity and trend of user A's data in the conference scene are to be analyzed. After the sampling parameters of user A in the conference scene are determined through the scheme of the application, the terminal usage data of user A in the conference scene can be acquired at the precision and frequency given by those sampling parameters, and it is no longer necessary to pull back all data, relevant and irrelevant alike. On the one hand, this makes the acquired data more targeted; on the other hand, it reduces the scale of the pulled-back data to a certain extent without affecting the final analysis conclusion.
After the sampling parameters corresponding to each preset scene category are obtained, they can be stored in the MySQL database as data pairs, for example, as $\langle c_i, Z(c_i) \rangle$, where $Z(c_i)$ denotes the sampling parameters of each data type corresponding to scene $c_i$. In this way, after the target scene to be analyzed is determined, the sampling parameters corresponding to the target scene are looked up in the database, the required terminal usage data is collected according to those sampling parameters, and data features are extracted from the obtained data according to a preset feature extraction mode, so that more targeted and differentiated panoramic frequency domain features can be obtained for each scene.
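The store-and-lookup flow described above can be sketched as follows. To keep the example self-contained, it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the MySQL database named in the application; the table name, column names, JSON encoding, and function names are all assumptions made for illustration.

```python
import json
import sqlite3
from typing import Dict, Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory table holding one row per scene category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sampling_params (scene TEXT PRIMARY KEY, params TEXT NOT NULL)"
    )
    return conn


def save_params(conn: sqlite3.Connection, scene: str,
                params: Dict[str, Dict[str, float]]) -> None:
    """Store the pair <c_i, Z(c_i)>, replacing any earlier entry for the scene."""
    conn.execute(
        "INSERT OR REPLACE INTO sampling_params (scene, params) VALUES (?, ?)",
        (scene, json.dumps(params)),
    )
    conn.commit()


def load_params(conn: sqlite3.Connection,
                scene: str) -> Optional[Dict[str, Dict[str, float]]]:
    """Look up Z(c_i) for a target scene; None if the scene is unknown."""
    row = conn.execute(
        "SELECT params FROM sampling_params WHERE scene = ?", (scene,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

A caller would first determine the target scene, call `load_params` for it, and fall back to default sampling parameters when the lookup returns `None`.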
In particular implementation, the present application is not limited by the execution sequence of the described steps, and some steps may be performed in other sequences or simultaneously without conflict.
According to the scheme, the sampling parameters of the data types under the preset scene are determined according to the association degree analysis between the data types and the scene types. By the scheme, targeted sampling parameters can be obtained for different scene types, and further, when data of specific scene types are analyzed, the data can be collected according to the corresponding sampling parameters, so that the collected data scale can be reduced, and targeted data collection can be realized.
In one embodiment, a data processing apparatus is also provided. Referring to fig. 4, fig. 4 is a schematic structural diagram of a data processing apparatus 400 according to an embodiment of the present disclosure. The data processing apparatus 400 is applied to an electronic device, and the data processing apparatus 400 includes a data obtaining module 401, an association degree calculating module 402, and a parameter adjusting module 403, as follows:
the data obtaining module 401 is configured to obtain terminal usage data corresponding to a preset scene category, where the terminal usage data includes multiple data types.
In the embodiment of the application, the terminal usage data is generated by a user's use of an intelligent terminal, such as an electronic device.
For example, the terminal usage data mainly includes the following three major categories: environmental data, user behavior data, and terminal operational data. Wherein each large category of data comprises a plurality of data types, for example, the environmental data comprises data types of temperature, illumination, and the like; the user behavior data may include: the time, place, and frequency of opening the application; the terminal operation data may include: the operation state of the terminal, such as the on-off state of the mobile data network, the connection state of the wireless hotspot, the identity information of the connected wireless hotspot, the currently running application program, the previous foreground application program, the time for the current application program to stay in the background, the time for the current application program to be switched to the background last time, the plugging and unplugging state of the earphone jack, the charging state, the battery power information, the screen display time and other data types.
The terminal usage data may further include data collected by sensors integrated in the terminal, such as a combination of one or more of a motion sensor, a light sensor, a temperature sensor, and a humidity sensor, among others.
The data obtaining module 401 stores the obtained terminal usage data in a preset database, for example, a terminal usage database is constructed in advance by using MySQL.
The data obtaining module 401 stores the obtained terminal usage data in the form of data pairs, for example as <c_i, d(c_i)>, where c_i represents the i-th scene among all scenes, d(c_i) represents the set of terminal usage data used in the i-th scene, and i ∈ [1, n]. The preset scene categories need to be divided in advance. For example, they can be divided manually, so that the scenes in which the user uses the terminal are divided into conference, meal ordering, game, running, shopping, and the like. Suppose there are n scene categories in total.
Here d(c_i) can be expressed in the form of a tensor. For example, d(c_i) is expressed as a second-order tensor, i.e., a matrix, in which each row is a row vector corresponding to one data type. For example, the terminal acquires and records each type of terminal usage data at a preset frequency, and the data acquired in a scene of a specific category can be arranged into a data sequence, which can be represented as a long vector. For example, in a running scene the terminal records GPS data; assuming the GPS data is acquired once per second, 600 pieces of GPS data are acquired after the user has run for ten minutes, and these can form a long vector. Alternatively, d(c_i) is expressed as a first-order tensor, i.e., a vector, in which each element corresponds to one type of terminal usage data.
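The two representations of d(c_i) can be illustrated with a short sketch. The function names and the data type names are hypothetical, and summarizing each data type by its mean in the vector form is an assumption of this example; the application does not specify how a single element per data type is obtained.

```python
def build_scene_matrix(series_by_type, type_order):
    """Second-order form: one row per data type, one column per sample."""
    lengths = {len(series_by_type[t]) for t in type_order}
    if len(lengths) != 1:
        raise ValueError("all data types must have the same number of samples")
    return [list(series_by_type[t]) for t in type_order]


def build_scene_vector(series_by_type, type_order):
    """First-order form: one element per data type (here, the mean; assumed)."""
    return [sum(series_by_type[t]) / len(series_by_type[t]) for t in type_order]


# Hypothetical readings collected in a running scene.
running = {
    "gps_speed": [2.0, 3.0, 4.0],
    "light": [100.0, 120.0, 110.0],
}
matrix = build_scene_matrix(running, ["gps_speed", "light"])
vector = build_scene_vector(running, ["gps_speed", "light"])
```

Under the once-per-second GPS example, a ten-minute run would give a row of 10 × 60 = 600 elements in the matrix form.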
Next, in order to determine the degree of association between each data type and a scene of a specific category, terminal usage data of all data types collected in that scene is first acquired and stored in the MySQL database.
Optionally, in some embodiments, the data obtaining module 401 may standardize each type of terminal usage data after obtaining the terminal usage data. Because the different types of terminal usage data have different dimensions (units and scales), standardizing the data eliminates the influence of dimension on the data and can improve the accuracy of the subsequent regression analysis.
Specifically, in an optional implementation manner, the apparatus further includes a data processing module, configured to perform a normalization process on the terminal usage data; the step of calculating the association degrees between the plurality of data types and the preset scene categories according to a preset regression algorithm and the terminal usage data comprises: and calculating the association degrees between the multiple data types and the preset scene categories according to a preset regression algorithm and the terminal use data after the standardization processing.
For example, the data processing module may employ a standardization method such as dispersion normalization (min-max normalization) or Z-score standardization. Dispersion normalization is a linear transformation of the original data that maps the resulting values into [0, 1]. Z-score standardization standardizes the data according to the mean and standard deviation of the original data, so that the processed data has a mean of 0 and a standard deviation of 1; unlike dispersion normalization, it does not confine the values to a fixed interval, and the result follows the standard normal distribution only if the original data is normally distributed. The standardized data is dimensionless and all indexes are on the same order of magnitude, which can improve the accuracy of the subsequent regression analysis.
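The two standardization methods can be sketched as follows. This is a minimal sketch using only the Python standard library; the function names are illustrative, and returning zeros for a constant series is a choice made for this example rather than something the application specifies.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Dispersion (min-max) normalization: linear map of the data onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map it to zeros (assumed choice).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Z-score standardization: subtract the mean, divide by the std deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Applied per data type (that is, per row of d(c_i)), either function puts readings such as temperature and battery level on a comparable scale before the regression is run. Z-scores are centered on 0 with unit spread, and individual values may fall outside [-1, 1].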
The association degree calculation module 402 is configured to calculate, according to a preset regression algorithm and the terminal usage data, the association degrees between the multiple data types and the preset scene category.
In the embodiment of the present application, the preset regression algorithm may be a linear regression algorithm, a multiple regression algorithm, a ridge regression algorithm, a lasso (least absolute shrinkage and selection operator) regression algorithm, or the like.
In some embodiments, the preset regression algorithm is a linear regression algorithm, and the association degree calculation module 402 is further configured to: perform regression analysis on the preset scene category and the terminal usage data of the multiple data types according to linear regression to obtain a weight vector; and determine the association degrees between the multiple data types and the preset scene category according to the weight vector.
Specifically, the association degree calculation module 402 constructs a linear regression equation by taking the terminal usage data of the multiple data types as independent variables and taking the scene category as the dependent variable, and solves the linear regression equation to obtain a weight vector. For example, the linear regression equation is solved according to a stochastic gradient ascent algorithm to obtain the weight vector.
The linear regression equation can be expressed as y = wx.
Here c_i is taken as the value of y, i.e., the dependent variable, and d(c_i) is taken as the value of x, i.e., the independent variable; w is a vector that can be expressed as w = (w_1, w_2, …, w_j, …, w_m).
Assuming there are m data types in total, w_j is the weight of the j-th of the m data types. The association degree calculation module 402 solves the regression equation through a stochastic gradient ascent algorithm to obtain the weight vector w, and obtains the association degree between each data type and the preset scene category from the weight vector w.
The preset scene categories can be represented numerically by assigning index numbers; for example, the index number of conference is 1, of meal ordering is 2, of game is 3, of running is 4, and so on.
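The regression step can be sketched in a few lines. The sketch below fits y = wx by stochastic gradient ascent on the negative half squared error, which is equivalent to stochastic gradient descent on the squared error. The learning rate, the epoch count, the absence of an intercept term, and the use of the normalized absolute weight |w_j| / Σ|w_k| as the association degree are all assumptions of this example; the application only states that the association degrees are obtained from the weight vector.

```python
import random


def fit_weights_sgd(X, y, lr=0.01, epochs=200, seed=0):
    """Fit y = w.x one sample at a time.

    X is a list of samples, each a list of m standardized feature values
    (one per data type); y holds the scene index number for each sample.
    """
    rng = random.Random(seed)
    m = len(X[0])
    w = [0.0] * m
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in random order each epoch
        for i in order:
            pred = sum(wj * xj for wj, xj in zip(w, X[i]))
            err = y[i] - pred
            # Ascent on -(err ** 2) / 2: its gradient w.r.t. w_j is err * x_j.
            for j in range(m):
                w[j] += lr * err * X[i][j]
    return w


def association_degrees(w):
    """Map the weight vector to degrees in [0, 1] that sum to 1 (assumed rule)."""
    total = sum(abs(wj) for wj in w)
    if total == 0:
        return [0.0] * len(w)
    return [abs(wj) / total for wj in w]
```

Because the features are assumed to be standardized first, the magnitudes of the fitted weights are comparable across data types, which is what makes a weight-based ranking meaningful in this sketch.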
A parameter adjusting module 403, configured to adjust sampling parameters of the preset scene categories corresponding to the multiple data types according to the association degrees between the multiple data types and the preset scene categories.
After the association degree between each data type and the preset scene category is obtained, the sampling parameters are set. The sampling parameters may include sampling frequency and sampling precision, where sampling precision refers to the number of digits to which a value is recorded. For example, some data is of low importance and only the integer part needs to be acquired, while other types of data are of high importance and need to be acquired with high precision, for example to 5 digits after the decimal point.
For example, the greater the association degree between a data type and the preset scene category, the higher the importance of the data type is determined to be, so a higher sampling frequency and sampling precision can be set for that data type; otherwise, a lower sampling frequency and sampling precision can be set. For instance, if GPS data is of high importance in a travel scene, the sampling frequency and sampling precision of the GPS data need to be increased.
Further, the parameter adjusting module 403 may filter the data types according to the magnitude of the association degree. For example, if the association degree between a data type and the preset scene category is smaller than a preset threshold, the data type is deleted from the data types to be acquired corresponding to the preset scene category; if the association degree between the data type and the preset scene category is not smaller than the preset threshold, the sampling frequency and sampling precision corresponding to the data type are adjusted according to the association degree, where the sampling frequency and the sampling precision are in direct proportion to the association degree.
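The filtering and adjustment rule can be sketched as follows. The function name, the parameter names, the assumption that association degrees lie in [0, 1], and the use of a maximum frequency and a maximum number of decimal places as the proportionality constants are all illustrative choices, not taken from the application.

```python
def adjust_sampling(assoc, threshold, max_frequency_hz, max_decimals):
    """Derive per-data-type sampling parameters from association degrees.

    assoc maps a data type name to its association degree (assumed in [0, 1]).
    Types below the threshold are dropped; for the rest, frequency and
    precision scale in direct proportion to the association degree.
    """
    params = {}
    for data_type, degree in assoc.items():
        if degree < threshold:
            continue  # removed from the data types to be acquired
        params[data_type] = {
            "frequency_hz": max_frequency_hz * degree,
            "decimals": round(max_decimals * degree),
        }
    return params
```

For instance, with a threshold of 0.2, a data type with degree 0.1 would no longer be collected in that scene, while a data type with degree 1.0 would be sampled at the full assumed maximum frequency and precision.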
According to the scheme of the embodiment of the application, because each user's behavior habits differ, the terminal usage data generated by the terminal differs even in the same scene. When the terminal usage data of a target user in a target scene is acquired for the first time, all types of terminal usage data can be acquired according to default sampling parameters. All the acquired terminal usage data is then analyzed according to the scheme of the embodiment of the application to determine sampling parameters matched with the target user; these sampling parameters determine the sampling frequency and sampling precision of each data type for the target user in the target scene. When the behavior of the target user in that scene is subsequently analyzed again, the terminal usage data can be acquired in a targeted manner according to the updated sampling parameters and features can be extracted for analysis, so it is not necessary to pull all the terminal usage data. For example, suppose the target scene is a conference scene, the target user is user A, and the rules and trends of user A's data in the conference scene are to be analyzed. After the sampling parameters of user A in the conference scene are determined through the scheme of the application, the terminal usage data of user A in the conference scene can be acquired at the precision and frequency given by those sampling parameters, without pulling back relevant and irrelevant data alike. On the one hand, this makes the acquired data more targeted; on the other hand, it reduces the scale of the pulled-back data to a certain extent without affecting the final analysis conclusion.
After the sampling parameters corresponding to each preset scene category are obtained, they can be stored in a MySQL database as data pairs, for example as <c_i, Z(c_i)>, where Z(c_i) denotes the sampling parameters of each data type corresponding to scene c_i. In this way, once the target scene to be analyzed is determined, the sampling parameters corresponding to the target scene are looked up in the database, the required terminal usage data is collected according to the retrieved sampling parameters, and data features are extracted from the collected data according to a preset feature extraction mode, so that more targeted and differentiated panoramic frequency domain features can be obtained for each scene.
Therefore, the data processing device provided by the embodiment of the application can acquire the terminal use data corresponding to the preset scene type, the terminal use data comprises a plurality of data types, the association degree between the plurality of data types and the preset scene type is calculated according to the preset regression algorithm and the terminal use data, and then the sampling parameter corresponding to the preset scene type is adjusted according to the association degree. By the scheme, targeted sampling parameters can be obtained for different scene types, and further, when data of specific scene types are analyzed, the data can be collected according to the corresponding sampling parameters, so that the collected data scale can be reduced, and targeted data collection can be realized.
An embodiment of the application also provides an electronic device. The electronic device can be a smart phone, a tablet computer, or the like. Fig. 5 is a first schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device 300 comprises a processor 301 and a memory 302. The processor 301 is electrically connected to the memory 302.
The processor 301 is a control center of the electronic device 300, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or calling a computer program stored in the memory 302 and calling data stored in the memory 302, thereby performing overall monitoring of the electronic device.
In this embodiment, the processor 301 in the electronic device 300 loads instructions corresponding to one or more processes of the computer program into the memory 302 according to the following steps, and the processor 301 runs the computer program stored in the memory 302, so as to implement various functions:
acquiring terminal use data corresponding to a preset scene type, wherein the terminal use data comprises a plurality of data types;
calculating the association degrees between the multiple data types and the preset scene categories according to a preset regression algorithm and the terminal use data;
and adjusting the sampling parameters of the preset scene categories corresponding to the multiple data types according to the association degrees between the multiple data types and the preset scene categories.
In some embodiments, when calculating the association between the plurality of data types and the preset scene category according to a preset regression algorithm and the terminal usage data, the processor 301 performs the following steps:
performing regression analysis on the preset scene type and the terminal use data of the multiple data types according to linear regression to obtain a weight vector;
and determining the association degree between the plurality of data types and the preset scene category according to the weight vector.
In some embodiments, when performing regression analysis on the terminal usage data of the preset scene type and the plurality of data types according to linear regression to obtain a weight vector, the processor 301 performs the following steps:
constructing a linear regression equation by taking the terminal use data of the multiple data types as independent variables and taking the scene type as a dependent variable;
and solving the linear regression equation to obtain a weight vector.
In some embodiments, when solving the linear regression equation to obtain the weight vector, processor 301 performs the following steps:
solving the linear regression equation by using a stochastic gradient ascent algorithm to obtain a weight vector.
In some embodiments, before the step of calculating the association between the plurality of data types and the preset scene category according to a preset regression algorithm and the terminal usage data, the processor 301 performs the following steps:
carrying out standardization processing on the terminal use data;
the step of calculating the association degrees between the plurality of data types and the preset scene categories according to a preset regression algorithm and the terminal usage data comprises: and calculating the association degrees between the multiple data types and the preset scene categories according to a preset regression algorithm and the terminal use data after the standardization processing.
In some embodiments, when the sampling parameters of the preset scene categories corresponding to the plurality of data types are adjusted according to the association degrees between the plurality of data types and the preset scene categories, the processor 301 performs the following steps:
if the association degree between the data type and the preset scene type is smaller than the preset threshold value, deleting the data type from the data type to be acquired corresponding to the preset scene type;
if the association degree between the data type and the preset scene type is not smaller than the preset threshold, adjusting the sampling frequency and the sampling precision corresponding to the data type according to the association degree between the data type and the preset scene type, wherein the sampling frequency and the sampling precision are in direct proportion to the association degree.
Memory 302 may be used to store computer programs and data. The memory 302 stores computer programs containing instructions executable in the processor. The computer program may constitute various functional modules. The processor 301 executes various functional applications and data processing by calling a computer program stored in the memory 302.
In some embodiments, as shown in fig. 6, fig. 6 is a second schematic structural diagram of an electronic device provided in the embodiments of the present application. The electronic device 300 further includes: radio frequency circuit 303, display screen 304, control circuit 305, input unit 306, audio circuit 307, sensor 308, and power supply 309. The processor 301 is electrically connected to the rf circuit 303, the display 304, the control circuit 305, the input unit 306, the audio circuit 307, the sensor 308, and the power source 309, respectively.
The radio frequency circuit 303 is used for transceiving radio frequency signals to communicate with a network device or other electronic devices through wireless communication.
The display screen 304 may be used to display information entered by or provided to the user as well as various graphical user interfaces of the electronic device, which may be comprised of images, text, icons, video, and any combination thereof.
The control circuit 305 is electrically connected to the display screen 304, and is used for controlling the display screen 304 to display information.
The input unit 306 may be used to receive input numbers, character information, or user characteristic information (e.g., fingerprint), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input unit 306 may include a fingerprint recognition module.
The audio circuit 307 may provide an audio interface between the user and the electronic device through a speaker and a microphone. The audio circuit 307 includes a microphone, which is electrically connected to the processor 301 and is used for receiving voice information input by the user.
The sensor 308 is used to collect external environmental information. The sensor 308 may include one or more of an ambient light sensor, an acceleration sensor, a gyroscope, and the like.
The power supply 309 is used to power the various components of the electronic device 300. In some embodiments, the power supply 309 may be logically coupled to the processor 301 through a power management system, so that charging, discharging, and power consumption management functions are implemented through the power management system.
Although not shown in fig. 6, the electronic device 300 may further include a camera, a bluetooth module, and the like, which are not described in detail herein.
Therefore, the electronic device provided by the embodiment of the application can acquire terminal usage data corresponding to a preset scene category, where the terminal usage data includes multiple data types, calculate the association degrees between the multiple data types and the preset scene category according to a preset regression algorithm and the terminal usage data, and then adjust the sampling parameters corresponding to the preset scene category according to the association degrees. In other words, the sampling parameters of each data type in the preset scene are determined by analyzing the association degree between that data type and the scene category. By this scheme, targeted sampling parameters can be obtained for different scene categories, and when data of a specific scene category is analyzed, the data can be collected according to the corresponding sampling parameters, so that the scale of the collected data can be reduced and targeted data collection can be realized.
An embodiment of the present application further provides a storage medium, where a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer executes the data processing method according to any of the above embodiments.
It should be noted that, all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, which may include, but is not limited to: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The term "module" as used herein may be considered a software object executing on the computing system. The different components, modules, engines, and services described herein may be considered as implementation objects on the computing system. The apparatus and method described herein may be implemented in software, but may also be implemented in hardware, and are within the scope of the present application.
The terms "first", "second", and "third", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules listed, but rather, some embodiments may include other steps or modules not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The data processing method, the data processing apparatus, the storage medium, and the electronic device provided in the embodiments of the present application are described in detail above. The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A data processing method, comprising:
acquiring terminal use data corresponding to a preset scene type, wherein the terminal use data comprises a plurality of data types;
calculating the association degrees between the multiple data types and the preset scene categories according to a preset regression algorithm and the terminal use data;
and adjusting the sampling parameters of the preset scene categories corresponding to the multiple data types according to the association degrees between the multiple data types and the preset scene categories.
2. The data processing method of claim 1, wherein the step of calculating the association degrees between the plurality of data types and the preset scene categories according to a preset regression algorithm and the terminal usage data comprises:
performing regression analysis on the preset scene type and the terminal use data of the multiple data types according to linear regression to obtain a weight vector;
and determining the association degree between the plurality of data types and the preset scene category according to the weight vector.
3. The data processing method of claim 2, wherein the step of performing regression analysis on the terminal usage data of the preset scene type and the plurality of data types according to linear regression to obtain the weight vector comprises:
constructing a linear regression equation by taking the terminal use data of the multiple data types as independent variables and taking the scene type as a dependent variable;
and solving the linear regression equation to obtain a weight vector.
4. The data processing method of claim 3, wherein the step of solving the linear regression equation to obtain a weight vector comprises:
solving the linear regression equation by using a stochastic gradient ascent algorithm to obtain a weight vector.
5. The data processing method of any one of claims 1 to 4, wherein before the step of calculating the association between the plurality of data types and the preset scene category according to a preset regression algorithm and the terminal usage data, the method further comprises:
carrying out standardization processing on the terminal use data;
the step of calculating the association degrees between the plurality of data types and the preset scene categories according to a preset regression algorithm and the terminal usage data comprises: and calculating the association degrees between the multiple data types and the preset scene categories according to a preset regression algorithm and the terminal use data after the standardization processing.
6. The data processing method of claim 5, wherein the step of adjusting the sampling parameters of the preset scene categories corresponding to the plurality of data types according to the association between the plurality of data types and the preset scene categories comprises:
if the association degree between the data type and the preset scene type is smaller than the preset threshold value, deleting the data type from the data type to be acquired corresponding to the preset scene type;
if the association degree between the data type and the preset scene type is not smaller than the preset threshold, adjusting the sampling frequency and the sampling precision corresponding to the data type according to the association degree between the data type and the preset scene type, wherein the sampling frequency and the sampling precision are in direct proportion to the association degree.
7. A data processing apparatus, comprising:
the data acquisition module is used for acquiring terminal use data corresponding to a preset scene type, wherein the terminal use data comprises a plurality of data types;
the association degree calculation module is used for calculating the association degrees between the multiple data types and the preset scene categories according to a preset regression algorithm and the terminal use data;
and the parameter adjusting module is used for adjusting the sampling parameters corresponding to the preset scene categories according to the association degrees between the multiple data types and the preset scene categories.
8. The data processing apparatus of claim 7, wherein the association degree calculation module is further configured to: perform regression analysis on the preset scene category and the terminal usage data of the multiple data types according to linear regression to obtain a weight vector; and determine the association degrees between the multiple data types and the preset scene category according to the weight vector.
9. A storage medium having stored thereon a computer program, characterized in that, when the computer program runs on a computer, it causes the computer to execute a data processing method according to any one of claims 1 to 6.
10. An electronic device comprising a processor and a memory, said memory storing a computer program, characterized in that said processor is adapted to execute the data processing method of any of claims 1 to 6 by invoking said computer program.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910282170.6A CN111797148A (en) 2019-04-09 2019-04-09 Data processing method, data processing device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111797148A true CN111797148A (en) 2020-10-20

Family

ID=72805303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910282170.6A Pending CN111797148A (en) 2019-04-09 2019-04-09 Data processing method, data processing device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111797148A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9412361B1 (en) * 2014-09-30 2016-08-09 Amazon Technologies, Inc. Configuring system operation using image data
CN105939421A (en) * 2016-06-14 2016-09-14 努比亚技术有限公司 Terminal parameter adjusting device and method
CN106486124A (en) * 2015-08-28 2017-03-08 中兴通讯股份有限公司 A kind of method of speech processes and terminal
CN106486127A (en) * 2015-08-25 2017-03-08 中兴通讯股份有限公司 A kind of method of speech recognition parameter adjust automatically, device and mobile terminal
CN107180245A (en) * 2016-03-10 2017-09-19 滴滴(中国)科技有限公司 A kind of indoor and outdoor scene recognition method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257015A (en) * 2020-10-28 2021-01-22 华润电力技术研究院有限公司 Thermal power generating unit data acquisition method and system and data processing method
CN112257015B (en) * 2020-10-28 2023-08-15 华润电力技术研究院有限公司 Thermal power generating unit data acquisition method, system and data processing method
CN113486596A (en) * 2021-07-27 2021-10-08 中国银行股份有限公司 Data preprocessing method, device, equipment and storage medium
CN113940643A (en) * 2021-09-10 2022-01-18 芯海科技(深圳)股份有限公司 Sampling control method, device, electronic equipment and storage medium
CN113940643B (en) * 2021-09-10 2024-06-11 芯海科技(深圳)股份有限公司 Sampling control method, device, electronic equipment and storage medium
CN117592871A (en) * 2024-01-19 2024-02-23 中铁四局集团有限公司 Concrete quality safety tracing and tracking management system based on big data
CN117592871B (en) * 2024-01-19 2024-04-12 中铁四局集团有限公司 Concrete quality safety tracing and tracking management system based on big data

Similar Documents

Publication Publication Date Title
CN111797148A (en) Data processing method, data processing device, storage medium and electronic equipment
CN111797288B (en) Data screening method and device, storage medium and electronic equipment
CN111798811B (en) Screen backlight brightness adjusting method and device, storage medium and electronic equipment
CN113505256B (en) Feature extraction network training method, image processing method and device
CN111797854B (en) Scene model building method and device, storage medium and electronic equipment
CN111797861A (en) Information processing method, information processing apparatus, storage medium, and electronic device
CN111797851A (en) Feature extraction method and device, storage medium and electronic equipment
CN111797302A (en) Model processing method and device, storage medium and electronic equipment
CN113032587B (en) Multimedia information recommendation method, system, device, terminal and server
CN111800445B (en) Message pushing method and device, storage medium and electronic equipment
CN111797849A (en) User activity identification method and device, storage medium and electronic equipment
CN114298123A (en) Clustering method and device, electronic equipment and readable storage medium
CN111796926A (en) Instruction execution method and device, storage medium and electronic equipment
CN111797867A (en) System resource optimization method and device, storage medium and electronic equipment
CN111798019B (en) Intention prediction method, intention prediction device, storage medium and electronic equipment
CN111797874B (en) Behavior prediction method and device, storage medium and electronic equipment
CN111797873A (en) Scene recognition method and device, storage medium and electronic equipment
CN112560612B (en) System, method, computer device and storage medium for determining business algorithm
CN111797860B (en) Feature extraction method and device, storage medium and electronic equipment
CN111796663B (en) Scene recognition model updating method and device, storage medium and electronic equipment
CN108829600B (en) Method and device for testing algorithm library, storage medium and electronic equipment
CN111797880A (en) Data processing method, data processing device, storage medium and electronic equipment
CN111796916A (en) Data distribution method, device, storage medium and server
CN111800537B (en) Terminal use state evaluation method and device, storage medium and electronic equipment
CN111797875B (en) Scene modeling method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination