CN111814475A - User portrait construction method and device, storage medium and electronic equipment - Google Patents


Publication number
CN111814475A
Authority
CN
China
Prior art keywords
semantic
data
tag
semantic tag
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910282116.1A
Other languages
Chinese (zh)
Inventor
何明
陈仲铭
黄粟
刘耀勇
陈岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910282116.1A priority Critical patent/CN111814475A/en
Publication of CN111814475A publication Critical patent/CN111814475A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Abstract

The embodiment of the application discloses a user portrait construction method and device, a storage medium, and an electronic device. The method comprises the following steps: acquiring device information and preprocessing it to obtain initial text data; performing entity recognition on the initial text data and generating a first semantic tag based on the recognition result; processing the initial text data and the first semantic tag to obtain a second semantic tag, where the semantic level of the second semantic tag is higher than that of the first semantic tag; and constructing a user portrait based at least on the first semantic tag and the second semantic tag. In this scheme, the user portrait is constructed from semantic tags of different levels, so that tags of the appropriate level can be provided according to the requirements of each intelligent service; this both ensures the quality of the intelligent services and protects user privacy to a certain extent.

Description

User portrait construction method and device, storage medium and electronic equipment
Technical Field
The application relates to the field of electronic equipment, in particular to a user portrait construction method and device, a storage medium and electronic equipment.
Background
With the development of electronic technology, electronic devices such as smart phones have become more and more intelligent. The electronic device may perform data processing through various algorithmic models to provide various functions to the user. For example, the electronic device may learn behavior characteristics of the user according to the algorithm model, thereby providing personalized services to the user.
Disclosure of Invention
The embodiment of the application provides a user portrait construction method and device, a storage medium and electronic equipment, and the quality of intelligent service can be improved.
In a first aspect, an embodiment of the present application provides a user portrait construction method, including:
acquiring device information, where the device information at least comprises: environment information of the device over a historical time period, application usage information, and text content information stored in the device;
preprocessing the equipment information to obtain initial text data;
performing entity recognition on the initial text data, and generating a first semantic label based on a recognition result;
processing the initial text data and the first semantic label to obtain a second semantic label, wherein the semantic level of the second semantic label is higher than that of the first semantic label;
and constructing the user portrait at least according to the first semantic label and the second semantic label.
In a second aspect, an embodiment of the present application further provides a user portrait construction apparatus, including:
an obtaining module, configured to obtain device information, where the device information at least comprises: environment information of the device over a historical time period, application usage information, and text content information stored in the device;
the first processing module is used for preprocessing the equipment information to obtain initial text data;
the identification module is used for carrying out entity identification on the initial text data and generating a first semantic label based on an identification result;
the second processing module is used for processing the initial text data and the first semantic tag to obtain a second semantic tag, wherein the semantic level of the second semantic tag is higher than that of the first semantic tag;
and the construction module is used for constructing the user portrait at least according to the first semantic label and the second semantic label.
In a third aspect, an embodiment of the present application further provides a storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the user portrait construction method.
In a fourth aspect, an embodiment of the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the user representation construction method when executing the program.
According to the user portrait construction method, device information is obtained and preprocessed to obtain initial text data; entity recognition is performed on the initial text data, and a first semantic tag is generated based on the recognition result; the initial text data and the first semantic tag are processed to obtain a second semantic tag, where the semantic level of the second semantic tag is higher than that of the first semantic tag; and a user portrait is constructed based at least on the first semantic tag and the second semantic tag. In this scheme, the user portrait is constructed from semantic tags of different levels, so that tags of the appropriate level can be provided according to the requirements of each intelligent service; this both ensures the quality of the intelligent services and protects user privacy to a certain extent.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of a panoramic sensing architecture provided in an embodiment of the present application.
Fig. 2 is a first flowchart of a user portrait construction method according to an embodiment of the present disclosure.
Fig. 3 is a second flowchart of a user portrait construction method according to an embodiment of the present disclosure.
Fig. 4 is a third flowchart illustrating a user portrait building method according to an embodiment of the present application.
Fig. 5 is a schematic view of a scene architecture of a user portrait construction method according to an embodiment of the present application.
FIG. 6 is a schematic structural diagram of a user representation constructing apparatus according to an embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of a first electronic device according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments. It is to be understood that the described embodiments are only some, not all, embodiments of the present application. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without inventive effort fall within the scope of the present application.
Referring to fig. 1, fig. 1 is a schematic view of a panoramic sensing architecture provided in an embodiment of the present application. The user portrait construction method is applied to electronic equipment. A panoramic perception framework is arranged in the electronic equipment. The panoramic perception architecture is an integration of hardware and software used to implement the user portrait construction method in an electronic device.
The panoramic perception architecture comprises an information perception layer, a data processing layer, a feature extraction layer, a scene modeling layer and an intelligent service layer.
The information perception layer is used for acquiring information about the electronic device itself or its external environment. The information perception layer may include a plurality of sensors, such as a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a Hall sensor, a position sensor, a gyroscope, an inertial sensor, an attitude sensor, a barometer, and a heart rate sensor.
Among other things, a distance sensor may be used to detect the distance between the electronic device and an external object. The magnetic field sensor may be used to detect magnetic field information of the environment in which the electronic device is located. The light sensor may be used to detect light information of the environment in which the electronic device is located. The acceleration sensor may be used to detect acceleration data of the electronic device. The fingerprint sensor may be used to collect fingerprint information of a user. The Hall sensor is a magnetic field sensor based on the Hall effect and can be used to realize automatic control of the electronic device. The position sensor may be used to detect the current geographic location of the electronic device. The gyroscope may be used to detect the angular velocity of the electronic device in various directions. The inertial sensor may be used to detect motion data of the electronic device. The attitude sensor may be used to sense attitude information of the electronic device. The barometer may be used to detect the air pressure of the environment in which the electronic device is located. The heart rate sensor may be used to detect heart rate information of the user.
And the data processing layer is used for processing the data acquired by the information perception layer. For example, the data processing layer may perform data cleaning, data integration, data transformation, data reduction, and the like on the data acquired by the information sensing layer.
Data cleaning refers to cleaning the large amount of data acquired by the information perception layer to remove invalid and repeated data. Data integration refers to merging multiple single-dimensional data acquired by the information perception layer into a higher or more abstract dimension, so that the single-dimensional data can be processed comprehensively. Data transformation refers to converting the type or format of the data acquired by the information perception layer so that the transformed data meet the processing requirements. Data reduction refers to reducing the data volume as much as possible while preserving the original character of the data.
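The cleaning and integration steps above can be sketched in a few lines. The `Record` type, its fields, and the sample data below are invented for illustration and are not part of the patent text:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    source: str   # hypothetical origin tag, e.g. "gps", "sms", "app_usage"
    value: str

def clean(records):
    """Drop empty (invalid) entries and exact duplicates, preserving order."""
    seen = set()
    out = []
    for r in records:
        if not r.value.strip():   # invalid data: empty payload
            continue
        if r in seen:             # repeated data
            continue
        seen.add(r)
        out.append(r)
    return out

raw = [Record("sms", "meeting at 3pm"),
       Record("sms", "meeting at 3pm"),   # duplicate to be removed
       Record("gps", "")]                 # invalid entry to be removed
print([r.value for r in clean(raw)])      # -> ['meeting at 3pm']
```

A real data processing layer would of course also handle type conversion and dimensional integration; this sketch covers only the cleaning step.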
The characteristic extraction layer is used for extracting characteristics of the data processed by the data processing layer so as to extract the characteristics included in the data. The extracted features may reflect the state of the electronic device itself or the state of the user or the environmental state of the environment in which the electronic device is located, etc.
The feature extraction layer may extract features or process the extracted features using methods such as filter methods, wrapper methods, or ensemble methods.
A filter method filters the extracted features to remove redundant feature data. A wrapper method screens the extracted features. An ensemble method combines multiple feature extraction methods to construct a more efficient and more accurate feature extraction method.
The scene modeling layer is used for building a model according to the features extracted by the feature extraction layer, and the obtained model can be used for representing the state of the electronic equipment, the state of a user, the environment state and the like. For example, the scenario modeling layer may construct a key value model, a pattern identification model, a graph model, an entity relation model, an object-oriented model, and the like according to the features extracted by the feature extraction layer.
The intelligent service layer is used for providing intelligent services for the user according to the model constructed by the scene modeling layer. For example, the intelligent service layer can provide basic application services for users, perform system intelligent optimization for electronic equipment, and provide personalized intelligent services for users.
In addition, the panoramic perception architecture may further comprise a plurality of algorithms, each of which can be used to analyze and process data; together they form an algorithm library. For example, the algorithm library may include algorithms such as a Markov algorithm, Latent Dirichlet Allocation, a Bayesian classification algorithm, a support vector machine, K-means clustering, K-nearest neighbors, a conditional random field, a residual network, a long short-term memory network, a convolutional neural network, and a recurrent neural network.
The embodiment of the application provides a user portrait construction method, which can be applied to electronic equipment. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playback device, a video playback device, a notebook, a desktop computing device, a wearable device such as a watch, glasses, a helmet, an electronic bracelet, an electronic necklace, an electronic garment, or the like.
Referring to fig. 2, fig. 2 is a first flowchart illustrating a user portrait building method according to an embodiment of the present application. The user portrait construction method comprises the following steps:
And 110, acquiring device information, where the device information at least comprises: environment information of the device over a historical time period, application usage information, and text content information stored in the device.
Specifically, all information inside and outside the electronic device that can be used to depict the user portrait can be obtained, so that a personal database for the user can be constructed subsequently. The information mainly comprises: environment migration information, such as changes in geographic location, temperature, and ambient sound; text content information, such as short messages and schedules stored locally on the device, which can well reflect dimensions such as the user's travel and occupation; and application usage information, such as application introduction information and the user's usage behavior, which can reflect the user's occupation, hobbies, and personal habits.
Of course, other behavior information of the user can also be included, such as web browsing records and telephone communication records, as well as any other information describing the user's habits, occupation, and the like.
And 120, preprocessing the equipment information to obtain initial text data.
Specifically, after a large amount of device information is acquired, the acquired information needs to be preprocessed to screen out the information useful for constructing the user's personal database, thereby reducing the data volume in the database and keeping it lightweight. Accordingly, in some embodiments, the step of "preprocessing the device information to obtain initial text data" may include the following steps:
processing the environment information to obtain geographic location data, temperature data, noise-level data, and illumination data;
processing the application use information to obtain application use preference data and application use time data;
identifying text content information, and screening travel data from the text content information;
and generating initial text data according to the geographic location data, the temperature data, the noise-level data, the illumination data, the application usage preference data, and the travel data.
The geographic location data, temperature data, noise-level data, illumination data, and the like can reflect the user's living environment, working environment, and certain environmental preferences. For example, if the geographic location changes frequently and the noise level is relatively high, this may indicate that the user's occupation requires frequent travel and communication with people.
The application usage preference data and application usage time data can well reflect the user's interests, hobbies, and daily routine. In addition, the user's physical health can be indirectly inferred from the daily routine.
The travel data can reflect the user's occupation, living habits, and so on. For example, a busy schedule on working days may indicate that the user frequently travels on business; a busy schedule on holidays may indicate that the user is accustomed to traveling or outdoor activities, suggesting a relatively good economic situation, an outgoing personality, and the like.
Then, the preprocessed data are integrated, and repeated or invalid data are screened out, thereby obtaining initial text data with higher reliability and stronger descriptive power.
In some embodiments, a user personal database for collecting and collating comprehensive personal information of a user may be constructed and the resulting initial text data stored in the database for subsequent data retrieval and storage. For example, the database may be a Structured Query Language (SQL) based database.
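As a minimal sketch of the personal database mentioned above, the following uses Python's built-in SQLite driver; the table name, columns, and sample rows are illustrative assumptions, since the patent does not specify a schema:

```python
import sqlite3

# In-memory database standing in for the user's personal database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE initial_text_data (
        id       INTEGER PRIMARY KEY,
        category TEXT,    -- hypothetical categories: 'location', 'app_usage', 'schedule'
        content  TEXT
    )""")
rows = [("location", "office district, weekdays 9-18"),
        ("app_usage", "music player, evenings"),
        ("schedule", "flight on Friday")]
conn.executemany(
    "INSERT INTO initial_text_data (category, content) VALUES (?, ?)", rows)
n = conn.execute("SELECT COUNT(*) FROM initial_text_data").fetchone()[0]
print(n)  # -> 3
```

Later steps can then retrieve the stored text by category for tag generation.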
And 130, performing entity recognition on the initial text data, and generating a first semantic label based on the recognition result.
In the embodiment of the application, algorithms or techniques with different clustering capabilities are applied to the obtained initial text data, so that semantic tags of different levels are obtained for depicting the user portrait.
In some embodiments, entities may be extracted directly from the initial text data as semantic tags at a first level. That is, the step "performing entity recognition on the initial text data and generating the first semantic tag based on the recognition result" may include the following processes:
extracting a plurality of second keywords from the initial text data;
performing entity identification on the second keywords based on an entity identification technology to obtain an identification result, wherein the identification result comprises the second keywords identified as the entity;
a first semantic tag is generated based on the second keyword identified as the entity.
An entity can be understood as something that exists independently and serves as the bearer of all its attributes, i.e., the foundation on which its properties rest. Examples include age, occupation, listening to music, playing games, favorite catchphrases, and the like.
Specifically, the second keywords extracted from the initial text data should be meaningful words; function words (e.g., the Chinese particles "的" and "呢") may be removed. Then, based on the entity recognition technology, the meanings of the screened second keywords are identified, so that the second keywords denoting entities are recognized, and semantic tags are generated from them.
In some embodiments, the second keyword identified as an entity may be directly used as the first semantic tag.
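The patent does not name a specific entity recognizer; as a toy stand-in for the recognition technology in this step, the sketch below uses a hand-built entity vocabulary and stopword list (both invented), keeping each keyword recognized as an entity as a first-level tag:

```python
# Hypothetical vocabularies; a real system would use a trained NER model.
ENTITY_VOCAB = {"basketball", "billiards", "music", "teacher"}
STOPWORDS = {"the", "of", "a", "at", "and", "to"}

def first_semantic_tags(initial_text):
    # extract candidate second keywords: meaningful words only
    keywords = [w for w in initial_text.lower().split() if w not in STOPWORDS]
    # keep only keywords recognized as entities; these become first-level tags
    return [w for w in keywords if w in ENTITY_VOCAB]

print(first_semantic_tags("plays basketball and listens to music at the club"))
# -> ['basketball', 'music']
```

In production, dictionary lookup would be replaced by a statistical entity recognizer such as a conditional random field, which the architecture's algorithm library already mentions.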
And 140, processing the initial text data and the first semantic tag to obtain a second semantic tag, wherein the semantic level of the second semantic tag is higher than that of the first semantic tag.
Specifically, the output of step 130 may be processed together with the preprocessed initial text data using a suitable algorithm model to obtain the second semantic tag. It should be noted that the semantic level of the second semantic tag being higher than that of the first semantic tag means that first semantic tags can be clustered under a second semantic tag. For example, if the first semantic tags are table tennis, football, basketball, and billiards, the second semantic tags can be "big ball" and "small ball".
150. A user representation is constructed based at least on the first semantic tag and the second semantic tag.
Specifically, semantic tags of different levels are formed from the obtained first and second semantic tags and can be stored in different areas of the database corresponding to the user portrait, so that when a task is executed, semantic tags of the appropriate level are retrieved according to actual requirements.
In some embodiments, referring to fig. 3, fig. 3 is a second flowchart illustrating a user representation constructing method according to an embodiment of the present disclosure.
In the embodiment of the present application, the second semantic tag may be generated in various ways. In some embodiments, the input data (i.e., the initial text data and the first semantic tags) may be processed based on a topic model. That is, the step of "processing the initial text data and the first semantic tag to obtain a second semantic tag" may include the following steps:
141. clustering the initial text data and the first semantic tags by adopting a preset topic model to obtain a plurality of text sets;
142. and respectively matching corresponding text labels for the plurality of text sets to serve as second semantic labels.
In some embodiments, the preset topic model may be a Latent Dirichlet Allocation (LDA) model, a topic generation model. LDA is an unsupervised machine learning technique that can be used to identify latent topic information in large-scale document collections or corpora. It adopts the bag-of-words approach, which treats each document as a word-frequency vector, thereby converting text information into numerical information that is easy to model. The bag-of-words approach does not consider word order, which simplifies the problem and also leaves room for model improvement. Each document represents a probability distribution over topics, and each topic represents a probability distribution over words.
In practical application, the LDA topic model can group words with similar semantics: it may group table tennis with billiards, football with basketball, and teacups with teapots and saucers. However, it only groups them together and does not assign a label; that is, teacup, teapot, and saucer are clustered together, but what they jointly represent still needs to be labeled manually. For example, "tea set" may be set as the second semantic tag for teacup, teapot, and saucer.
Alternatively, labels may be assigned based on the statistical regularities of the word (i.e., Chinese character) distribution under each semantic category; this mainly refers to the probability distribution of individual words within the category. The LDA model can learn from a collection of text corpora, group words with similar semantics into classes, and assign a probability to each word within each class; a suitable semantic label can then be chosen based on the word probability distribution of each class.
In some embodiments, a Unigram model, a Mixture of Unigrams model, or a similar model may also be employed to extract the second semantic tags.
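The topic-model step can be illustrated with a tiny collapsed Gibbs sampler for LDA in pure Python. This is a rough sketch only: the corpus, hyperparameters, and iteration count are invented, and a production system would use a mature library implementation instead:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-word topic ids and topic-word counts."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})                    # vocabulary size
    z = [[rng.randrange(n_topics) for _ in d] for d in docs] # topic of each word
    ndk = [[0] * n_topics for _ in docs]                     # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]        # topic-word counts
    nk = [0] * n_topics                                      # words per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove current assignment, resample, restore
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z, nkw

docs = [["pingpong", "billiards", "pingpong"],
        ["teacup", "teapot", "saucer"],
        ["billiards", "pingpong"],
        ["teapot", "teacup"]]
z, nkw = lda_gibbs(docs, n_topics=2)
# co-occurring words (sports terms vs. tea terms) tend to share a topic id
print(z)
```

As the text notes, LDA only clusters the words; a human (or a follow-up matching step) still has to attach a label such as "tea set" to each resulting topic.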
In some embodiments, the step "matching corresponding text labels for the plurality of text sets respectively as second semantic labels" includes:
constructing a first text vector for the text information in each text set;
respectively constructing a second text vector for a plurality of text labels in a preset label set;
calculating a vector distance between the first text vector and the second text vector;
and selecting a corresponding text label from a preset label set according to the vector distance to serve as a second semantic label.
Specifically, when matching text labels to a text set, the distance between the word vectors of the text information in the text set and the word vectors of the preset text labels can be computed to determine their semantic similarity: the smaller the vector distance, the higher the semantic similarity. Therefore, the text label with the minimum vector distance can be selected from the preset label set as the second semantic tag of the text set.
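The distance-based matching described above can be sketched as follows. The two-dimensional word vectors are hand-set toys purely for illustration; a real system would use trained embeddings:

```python
import math

# Toy, hand-set word vectors (invented); real embeddings would be learned.
VEC = {
    "teacup":   [0.9, 0.1], "teapot": [0.8, 0.2], "saucer": [0.85, 0.15],
    "tea set":  [0.9, 0.1], "ballgame": [0.1, 0.9],
}

def centroid(words):
    """First text vector: mean of the word vectors in the text set."""
    dims = zip(*(VEC[w] for w in words))
    return [sum(d) / len(words) for d in dims]

def match_label(text_set, label_set):
    first = centroid(text_set)
    # pick the preset label whose second text vector is closest
    return min(label_set, key=lambda lab: math.dist(first, VEC[lab]))

print(match_label(["teacup", "teapot", "saucer"], ["tea set", "ballgame"]))
# -> tea set
```

Here the minimum Euclidean distance selects "tea set", matching the example given earlier for the teacup/teapot/saucer cluster.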
In some embodiments, referring to fig. 4 and fig. 5, fig. 4 is a third flowchart illustrating a user representation constructing method according to an embodiment of the present application. Fig. 5 is a scene architecture diagram of a user portrait construction method according to an embodiment of the present application.
Different intelligent services require tags of different levels. For example, some applications require low-level user portrait semantic tags, while others only need more abstract ones. Therefore, in order to meet the demands of more intelligent services, semantic tags of more levels can be constructed. That is, after entity recognition is performed on the initial text data and the first semantic tag is generated based on the recognition result, and before the user portrait is constructed from the first and second semantic tags, the method may further include the following process:
and 160, processing the initial text data, the first semantic tag and the second semantic tag to obtain a third semantic tag, wherein the semantic level of the third semantic tag is higher than that of the second semantic tag.
Specifically, the outputs of steps 130 and 140 may be processed together with the preprocessed initial text data using a suitable algorithm model to obtain the third semantic tag. It should be noted that the semantic level of the third semantic tag being higher than that of the second semantic tag means that second semantic tags can be clustered under a third semantic tag. For example, if the first semantic tags are table tennis, football, basketball, and billiards, the second semantic tags may be "big ball" and "small ball", and the third semantic tag may be "ball sports".
The step of "constructing a user representation based on at least the first semantic tag and the second semantic tag" comprises the following steps:
151. and constructing the user portrait according to the first semantic label, the second semantic label and the third semantic label.
Specifically, semantic tags of different levels are formed from the obtained first, second, and third semantic tags and can be stored in different areas of the database corresponding to the user portrait, so that when a task is executed, semantic tags of the appropriate level are retrieved according to the actual situation to meet the requirements of different intelligent services.
In practical applications, the third semantic tag can be generated in various ways. In some embodiments, a neural network model with stronger clustering capability may be used to perform semantic analysis on the data to obtain the third semantic tag. That is, the step of "processing the initial text data, the first semantic tag, and the second semantic tag to obtain a third semantic tag" may include the following steps:
extracting a first keyword from the initial text data, the first semantic tag and the second semantic tag to obtain a keyword set;
constructing a corresponding word vector for each first keyword in the keyword set;
and clustering the word vectors based on a preset neural network model to obtain a third semantic label.
Specifically, the first keywords extracted from the initial text data, the first semantic tags, and the second semantic tags should be meaningful words; function words (e.g., the Chinese particles "的" and "呢") may be removed. Then, a corresponding word vector is constructed for each first keyword, and the word vectors are input into a preset neural network model for clustering, which outputs classified data and yields high-level user portrait semantic information, namely the third semantic tags. The preset neural network model may be one or more of a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and a Deep Neural Network (DNN).
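The patent names CNN/RNN/DNN models for this step; since training such a model is beyond a short sketch, the example below substitutes plain k-means over toy word vectors purely to illustrate how clustering word vectors yields candidate third-level tag groups (the vectors, words, and parameters are all invented):

```python
import math
import random

def kmeans(vectors, k, n_iter=50, seed=0):
    """Naive k-means: returns a cluster id for each input vector."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    assign = [0] * len(vectors)
    for _ in range(n_iter):
        # assign each vector to the nearest center
        assign = [min(range(k), key=lambda c: math.dist(v, centers[c]))
                  for v in vectors]
        # move each center to the mean of its members
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = [sum(d) / len(members) for d in zip(*members)]
    return assign

words = ["pingpong", "billiards", "hiking", "camping"]
vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = kmeans(vecs, 2)
# pingpong groups with billiards, hiking with camping (cluster ids may swap)
print(labels)
```

Each resulting cluster would then be assigned an abstract label (e.g., "ball sports") to serve as a third semantic tag, analogous to the labeling step after LDA.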
A single-level user portrait tag system can hardly depict the user's basic profile, interests, and behavior preferences accurately, and its precision, limited by insufficient data volume, can hardly meet practical application requirements. In this application, richer personal information, such as environment migration information and short-message and schedule information, is first comprehensively acquired from the intelligent electronic device, which ensures the comprehensiveness and accuracy of the user portrait tag system. Meanwhile, a multi-level user portrait tag system is constructed through entity recognition, a topic model, and a neural network, which can comprehensively depict the user's basic profile, interests, and behavior preferences. Because different intelligent services require tags of different levels, an application that needs low-level user portrait semantic tags can be given them, while an application that does not can be given only the more abstract tags (i.e., the second- or third-level user portrait semantic tags); the tags can thus be distributed to intelligent services as needed.
In view of the above, in the user portrait construction method provided in the embodiments of the present application, device information is obtained and preprocessed to obtain initial text data; entity recognition is performed on the initial text data, and a first semantic tag is generated based on the recognition result; the initial text data and the first semantic tag are processed to obtain a second semantic tag, where the semantic level of the second semantic tag is higher than that of the first semantic tag; and a user portrait is constructed based at least on the first semantic tag and the second semantic tag. In this scheme, the user portrait is constructed through semantic tags of different levels, and user portrait semantic tags of different levels can be provided according to the requirements of intelligent services, which on one hand ensures the quality of the intelligent services and on the other hand protects user privacy to a certain extent.
In some embodiments, the user portrait construction method provided by the embodiments of the present application proceeds as follows. Device information of a user's electronic device (such as environment information over a historical time period, application use information, and text content information in the device) is first obtained through an information perception layer, and the device information is then processed by a data processing layer (for example, invalid data is deleted) to obtain initial text data. Next, a feature extraction layer extracts entities from the initial text data output by the data processing layer and generates a first semantic tag based on the extracted entities. The initial text data and the first semantic tag are then processed to obtain a second semantic tag, where the semantic level of the second semantic tag is higher than that of the first semantic tag. The first semantic tag and the second semantic tag are input into a scene modeling layer, which contains a pre-stored prediction model; the prediction model of the scene modeling layer is trained according to the input information to obtain a trained prediction model. Finally, when the intelligent service layer constructs the user portrait, the trained prediction model can provide user portrait semantic labels of different levels according to the requirements of the intelligent service, which both ensures the quality of the intelligent service and protects user privacy to a certain extent.
The embodiment of the application also provides a user portrait construction device. The user representation construction means may be integrated in the electronic device. The electronic device may be a smartphone, a tablet, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playback device, a video playback device, a notebook, a desktop computing device, a wearable device such as a watch, glasses, a helmet, an electronic bracelet, an electronic necklace, an electronic garment, or the like.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a user representation constructing apparatus according to an embodiment of the present disclosure. User representation construction apparatus 200 may include: an obtaining module 201, a first processing module 202, an identifying module 203, a second processing module 204, and a constructing module 205, wherein:
an obtaining module 201, configured to obtain device information, where the device information at least includes: the method comprises the steps that environmental information, application use information and text content information in the equipment in a historical time period are obtained;
the first processing module 202 is configured to pre-process the device information to obtain initial text data;
the identification module 203 is used for performing entity identification on the initial text data and generating a first semantic tag based on an identification result;
the second processing module 204 is configured to process the initial text data and the first semantic tag to obtain a second semantic tag, where a semantic level of the second semantic tag is higher than a semantic level of the first semantic tag;
a construction module 205 for constructing a user representation based at least on the first semantic tag and the second semantic tag.
In some embodiments, user representation construction apparatus 200 may further include:
the third processing module is used for processing the initial text data, the first semantic tag and the second semantic tag to obtain a third semantic tag after the initial text data is subjected to entity recognition and the first semantic tag is generated based on the recognition result and before the user portrait is constructed according to the first semantic tag and the second semantic tag, wherein the semantic hierarchy of the third semantic tag is higher than that of the second semantic tag;
and the building module is used for building the user portrait according to the first semantic label, the second semantic label and the third semantic label.
In some embodiments, the third processing module may be configured to:
extracting a first keyword from the initial text data, the first semantic tag and the second semantic tag to obtain a keyword set;
constructing a corresponding word vector for each first keyword in the keyword set;
clustering the word vectors based on a preset neural network model to obtain a third semantic label.
In some embodiments, the second processing module may include:
the processing submodule is used for clustering the initial text data and the first semantic tags by adopting a preset theme model to obtain a plurality of text sets;
and the matching sub-module is used for respectively matching the corresponding text labels for the plurality of text sets to serve as second semantic labels.
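The grouping performed by the processing sub-module can be sketched as follows. The patent specifies a preset topic model (such as LDA) for this step; this toy stand-in approximates the "cluster texts into sets" outcome with greedy Jaccard word-overlap grouping, purely for illustration. The threshold and tokenization are assumptions.

```python
def tokenize(text):
    return set(text.lower().split())

def cluster_texts(texts, threshold=0.2):
    """Greedily group texts by Jaccard word overlap -- a toy stand-in for
    the patent's 'preset topic model' (e.g. LDA), which would instead
    infer latent topic distributions per document."""
    clusters = []  # each cluster: (vocabulary set, list of member texts)
    for text in texts:
        words = tokenize(text)
        for vocab, members in clusters:
            inter = len(words & vocab)
            union = len(words | vocab) or 1
            if inter / union >= threshold:
                members.append(text)
                vocab |= words
                break
        else:
            clusters.append((set(words), [text]))
    return [members for _, members in clusters]

text_sets = cluster_texts([
    "meeting at the office tomorrow",
    "office meeting moved to friday",
    "flight ticket to shanghai",
])
```

Here the two office-meeting texts share enough vocabulary to land in one set, while the travel text forms its own set; each set would then receive a second semantic label by the matching sub-module below.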
In some embodiments, the matching sub-module may be configured to:
constructing a first text vector for the text information in each text set;
respectively constructing a second text vector for a plurality of text labels in a preset label set;
calculating a vector distance between the first text vector and the second text vector;
and selecting a corresponding text label from a preset label set according to the vector distance to serve as a second semantic label.
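The matching sub-module's steps (build a text vector per set, build a vector per preset label, compute a vector distance, pick the closest label) can be sketched as follows, assuming bag-of-words vectors and cosine distance; the patent does not fix a particular vector construction or distance metric, so both are illustrative choices.

```python
from collections import Counter
import math

def text_vector(text):
    """Bag-of-words counts as a sparse vector -- a simple stand-in for
    whatever first/second text vector construction is actually used."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def match_label(text_set, label_set):
    """Pick the preset text label whose vector is closest to the set's."""
    v_text = text_vector(" ".join(text_set))
    return min(label_set, key=lambda lbl: cosine_distance(v_text, text_vector(lbl)))

label = match_label(["weekend basketball game", "basketball shoes order"],
                    ["sports basketball", "travel booking"])
```

The selected label ("sports basketball" here) becomes the second semantic label for that text set.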
As can be seen from the above, the user portrait construction apparatus 200 provided in this embodiment of the present application obtains the device information and preprocesses it to obtain the initial text data; performs entity recognition on the initial text data and generates a first semantic tag based on the recognition result; processes the initial text data and the first semantic tag to obtain a second semantic tag, where the semantic level of the second semantic tag is higher than that of the first semantic tag; and constructs a user portrait based at least on the first semantic tag and the second semantic tag. In this scheme, the user portrait is constructed through semantic tags of different levels, and user portrait semantic tags of different levels can be provided according to the requirements of intelligent services, which on one hand ensures the quality of the intelligent services and on the other hand protects user privacy to a certain extent.
The embodiment of the application also provides the electronic equipment. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playback device, a video playback device, a notebook, a desktop computing device, a wearable device such as a watch, glasses, a helmet, an electronic bracelet, an electronic necklace, an electronic garment, or the like.
Referring to fig. 7, fig. 7 is a schematic view of a first structure of an electronic device 300 according to an embodiment of the present disclosure. Electronic device 300 includes, among other things, a processor 301 and a memory 302. The processor 301 is electrically connected to the memory 302.
The processor 301 is a control center of the electronic device 300, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or calling a computer program stored in the memory 302 and calling data stored in the memory 302, thereby performing overall monitoring of the electronic device.
In this embodiment, the processor 301 in the electronic device 300 loads instructions corresponding to the processes of one or more computer programs into the memory 302, and the processor 301 runs the computer programs stored in the memory 302 so as to implement the following functions:
acquiring equipment information, wherein the equipment information at least comprises: the method comprises the steps that environmental information, application use information and text content information in the equipment in a historical time period are obtained;
preprocessing equipment information to obtain initial text data;
carrying out entity recognition on the initial text data, and generating a first semantic label based on a recognition result;
processing the initial text data and the first semantic tag to obtain a second semantic tag, wherein the semantic level of the second semantic tag is higher than that of the first semantic tag;
and constructing the user portrait at least according to the first semantic label and the second semantic label.
In some embodiments, after performing entity recognition on the initial text data and generating a first semantic tag based on the recognition result, before constructing a user representation from the first semantic tag and a second semantic tag, processor 301 performs the following steps:
processing the initial text data, the first semantic tag and the second semantic tag to obtain a third semantic tag, wherein the semantic level of the third semantic tag is higher than that of the second semantic tag;
the constructing a user representation based at least on the first semantic tag and the second semantic tag comprises:
and constructing the user portrait according to the first semantic label, the second semantic label and the third semantic label.
In some embodiments, when the device information, the first semantic tag, and the second semantic tag are processed to obtain a third semantic tag, the processor 301 further performs the following steps:
extracting a first keyword from the initial text data, the first semantic tag and the second semantic tag to obtain a keyword set;
constructing a corresponding word vector for each first keyword in the keyword set;
and clustering the word vectors based on a preset neural network model to obtain a third semantic label.
In some embodiments, when the device information and the first semantic tag are processed to obtain a second semantic tag, processor 301 performs the following steps:
clustering the initial text data and the first semantic tags by adopting a preset topic model to obtain a plurality of text sets;
and respectively matching corresponding text labels for the plurality of text sets to serve as second semantic labels.
In some embodiments, when matching corresponding text labels for the plurality of text sets respectively as the second semantic label, the processor 301 performs the following steps:
constructing a first text vector for the text information in each text set;
respectively constructing a second text vector for a plurality of text labels in a preset label set;
calculating a vector distance between the first text vector and a second text vector;
and selecting a corresponding text label from the preset label set according to the vector distance to serve as the second semantic label.
In some embodiments, when performing entity recognition on the initial text data and generating a first semantic tag based on the recognition result, processor 301 performs the following steps:
extracting a plurality of second keywords from the initial text data;
performing entity identification on the second keywords based on an entity identification technology to obtain an identification result, wherein the identification result comprises the second keywords identified as the entity;
a first semantic tag is generated based on the second keyword identified as the entity.
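A minimal sketch of the entity-recognition steps above, assuming a dictionary (gazetteer) lookup in place of the unspecified "entity identification technology"; a production system would more likely use a trained NER model (e.g. CRF or BiLSTM-CRF). The entity names, types, and tokenization below are illustrative assumptions.

```python
# Toy gazetteer; a real system would use a trained NER model instead.
ENTITY_DICT = {
    "beijing": "LOCATION",
    "shanghai": "LOCATION",
    "monday": "TIME",
    "wechat": "APPLICATION",
}

def extract_keywords(text):
    """Split the initial text into candidate second keywords."""
    return [w.strip(".,") for w in text.lower().split()]

def recognize_entities(keywords):
    """Return (keyword, entity type) pairs for keywords recognized as
    entities -- these identified keywords yield the first semantic tags."""
    return [(w, ENTITY_DICT[w]) for w in keywords if w in ENTITY_DICT]

tags = recognize_entities(extract_keywords("Flight to Shanghai on Monday."))
```

For the sample sentence this recognizes "shanghai" as a LOCATION and "monday" as a TIME, each of which could serve as a first semantic tag.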
In some embodiments, when the device information is preprocessed to obtain the initial text data, the processor 301 performs the following steps:
processing the environment information to obtain geographic position data, temperature data, noise level data and illumination data;
processing the application use information to obtain application use preference data and application use time data;
identifying the text content information, and screening out travel data from the text content information;
and generating initial text data according to the geographic position data, the temperature data, the noise level data, the illumination data, the application use preference data and the travel data.
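The preprocessing steps above can be sketched as follows. The field names, data layout, and the naive trip-filtering heuristic are all assumptions for illustration, since the patent does not define the concrete structure of the sensed device information.

```python
def preprocess(device_info):
    """Flatten sensed device information into initial text lines,
    dropping entries treated as invalid (here: None values).
    Field names are illustrative, not taken from the patent."""
    lines = []
    env = device_info.get("environment", {})
    for field in ("location", "temperature", "noise", "light"):
        if env.get(field) is not None:
            lines.append(f"{field}: {env[field]}")
    for app, minutes in device_info.get("app_usage", {}).items():
        lines.append(f"app {app} used {minutes} min")
    for msg in device_info.get("messages", []):
        if "flight" in msg or "train" in msg:  # naive trip filter
            lines.append(f"trip: {msg}")
    return lines

initial_text = preprocess({
    "environment": {"location": "office", "temperature": 22,
                    "noise": None, "light": "indoor"},
    "app_usage": {"maps": 15},
    "messages": ["your flight CA123 departs 08:00", "lunch?"],
})
```

The invalid `noise` reading and the non-trip message are dropped, and the remaining lines form the initial text data passed to entity recognition.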
Memory 302 may be used to store computer programs and data. The memory 302 stores computer programs containing instructions executable in the processor. The computer program may constitute various functional modules. The processor 301 executes various functional applications and data processing by calling a computer program stored in the memory 302.
In some embodiments, referring to fig. 8, fig. 8 is a schematic diagram of a second structure of an electronic device 300 according to an embodiment of the present disclosure.
Wherein, the electronic device 300 further comprises: a display 303, a control circuit 304, an input unit 305, a sensor 306, and a power supply 307. The processor 301 is electrically connected to the display 303, the control circuit 304, the input unit 305, the sensor 306, and the power source 307.
The display screen 303 may be used to display information entered by or provided to the user, as well as various graphical user interfaces of the electronic device, which may be made up of images, text, icons, video, and any combination thereof.
The control circuit 304 is electrically connected to the display 303, and is configured to control the display 303 to display information.
The input unit 305 may be used to receive input numbers, character information, or user characteristic information (e.g., a fingerprint), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input unit 305 may include a fingerprint recognition module.
The sensor 306 is used to collect information of the electronic device itself or information of the user or external environment information. For example, the sensor 306 may include a plurality of sensors such as a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a hall sensor, a position sensor, a gyroscope, an inertial sensor, an attitude sensor, a barometer, a heart rate sensor, and the like.
The power supply 307 is used to power the various components of the electronic device 300. In some embodiments, the power supply 307 may be logically coupled to the processor 301 through a power management system, such that functions of managing charging, discharging, and power consumption are performed through the power management system.
Although not shown in fig. 8, the electronic device 300 may further include a camera, a bluetooth module, and the like, which are not described in detail herein.
As can be seen from the above, an embodiment of the present application provides an electronic device that performs the following steps: acquiring device information and preprocessing it to obtain initial text data; performing entity recognition on the initial text data and generating a first semantic tag based on the recognition result; processing the initial text data and the first semantic tag to obtain a second semantic tag, where the semantic level of the second semantic tag is higher than that of the first semantic tag; and constructing a user portrait based at least on the first semantic tag and the second semantic tag. In this scheme, the user portrait is constructed through semantic tags of different levels, and user portrait semantic tags of different levels can be provided according to the requirements of intelligent services, which on one hand ensures the quality of the intelligent services and on the other hand protects user privacy to a certain extent.
The embodiment of the present application further provides a storage medium, where a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer executes the user representation construction method according to any of the above embodiments.
For example, in some embodiments, when the computer program is run on a computer, the computer performs the steps of:
acquiring equipment information, wherein the equipment information at least comprises: the method comprises the steps that environmental information, application use information and text content information in the equipment in a historical time period are obtained;
preprocessing equipment information to obtain initial text data;
carrying out entity recognition on the initial text data, and generating a first semantic label based on a recognition result;
processing the initial text data and the first semantic tag to obtain a second semantic tag, wherein the semantic level of the second semantic tag is higher than that of the first semantic tag;
a user representation is constructed based at least on the first semantic tag and the second semantic tag.
It should be noted that all or part of the steps in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a computer-readable storage medium, which may include, but is not limited to: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical disks, and the like.
The user portrait construction method, device, storage medium and electronic device provided by the embodiments of the present application are described in detail above. The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (12)

1. A user portrait construction method, characterized in that the user portrait construction method comprises:
acquiring equipment information, wherein the equipment information at least comprises: the method comprises the steps that environmental information, application use information and text content information in the equipment in a historical time period are obtained;
preprocessing the equipment information to obtain initial text data;
performing entity recognition on the initial text data, and generating a first semantic label based on a recognition result;
processing the initial text data and the first semantic label to obtain a second semantic label, wherein the semantic level of the second semantic label is higher than that of the first semantic label;
and constructing the user portrait at least according to the first semantic label and the second semantic label.
2. The method of constructing a user representation according to claim 1, further comprising, after the entity identifying the initial text data and generating a first semantic tag based on the identification result, before constructing a user representation from the first semantic tag and a second semantic tag:
processing the initial text data, the first semantic tag and the second semantic tag to obtain a third semantic tag, wherein the semantic level of the third semantic tag is higher than that of the second semantic tag;
the constructing a user representation based at least on the first semantic tag and the second semantic tag comprises:
and constructing the user portrait according to the first semantic label, the second semantic label and the third semantic label.
3. The user representation construction method of claim 2, wherein the processing the device information, the first semantic tag, and the second semantic tag to obtain a third semantic tag comprises:
extracting a first keyword from the initial text data, the first semantic tag and the second semantic tag to obtain a keyword set;
constructing a corresponding word vector for each first keyword in the keyword set;
and clustering the word vectors based on a preset neural network model to obtain a third semantic label.
4. The user representation construction method of claim 1, wherein the processing the device information and the first semantic tag to obtain a second semantic tag comprises:
clustering the initial text data and the first semantic tags by adopting a preset topic model to obtain a plurality of text sets;
and respectively matching corresponding text labels for the plurality of text sets to serve as second semantic labels.
5. The method for constructing a user representation according to claim 4, wherein said matching corresponding text labels for said plurality of text sets respectively as a second semantic label comprises:
constructing a first text vector for the text information in each text set;
respectively constructing a second text vector for a plurality of text labels in a preset label set;
calculating a vector distance between the first text vector and a second text vector;
and selecting a corresponding text label from the preset label set according to the vector distance to serve as the second semantic label.
6. The user representation construction method of claim 1, wherein the performing entity recognition on the initial text data and generating a first semantic tag based on the recognition result comprises:
extracting a plurality of second keywords from the initial text data;
performing entity identification on the second keywords based on an entity identification technology to obtain an identification result, wherein the identification result comprises the second keywords identified as the entity;
a first semantic tag is generated based on the second keyword identified as the entity.
7. A user representation construction method according to claim 1, wherein said preprocessing said device information to obtain initial text data comprises:
processing the environment information to obtain geographic position data, temperature data, noise level data and illumination data;
processing the application use information to obtain application use preference data and application use time data;
identifying the text content information, and screening out travel data from the text content information;
and generating initial text data according to the geographic position data, the temperature data, the noise level data, the illumination data, the application use preference data and the travel data.
8. A user representation construction apparatus, said user representation construction apparatus comprising:
an obtaining module, configured to obtain device information, where the device information at least includes: the method comprises the steps that environmental information, application use information and text content information in the equipment in a historical time period are obtained;
the first processing module is used for preprocessing the equipment information to obtain initial text data;
the identification module is used for carrying out entity identification on the initial text data and generating a first semantic label based on an identification result;
the second processing module is used for processing the initial text data and the first semantic tag to obtain a second semantic tag, wherein the semantic level of the second semantic tag is higher than that of the first semantic tag;
and the construction module is used for constructing the user portrait at least according to the first semantic label and the second semantic label.
9. The user representation construction device of claim 8, further comprising:
the third processing module is used for processing the initial text data, the first semantic tag and the second semantic tag to obtain a third semantic tag after the initial text data is subjected to entity recognition and the first semantic tag is generated based on a recognition result and before a user portrait is constructed according to the first semantic tag and the second semantic tag, wherein the semantic level of the third semantic tag is higher than that of the second semantic tag;
and the construction module is used for constructing the user portrait according to the first semantic label, the second semantic label and the third semantic label.
10. The user representation construction apparatus of claim 9, wherein said third processing module is configured to:
extracting a first keyword from the initial text data, the first semantic tag and the second semantic tag to obtain a keyword set;
constructing a corresponding word vector for each first keyword in the keyword set;
and clustering the word vectors based on a preset neural network model to obtain a third semantic label.
11. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method according to any one of claims 1-7.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1-7 are implemented when the processor executes the program.
CN201910282116.1A 2019-04-09 2019-04-09 User portrait construction method and device, storage medium and electronic equipment Withdrawn CN111814475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910282116.1A CN111814475A (en) 2019-04-09 2019-04-09 User portrait construction method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910282116.1A CN111814475A (en) 2019-04-09 2019-04-09 User portrait construction method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111814475A true CN111814475A (en) 2020-10-23

Family

ID=72843594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910282116.1A Withdrawn CN111814475A (en) 2019-04-09 2019-04-09 User portrait construction method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111814475A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862289A (en) * 2021-01-29 2021-05-28 上海妙一生物科技有限公司 Information matching method and device for clinical research practitioner
CN113609379A (en) * 2021-07-12 2021-11-05 北京达佳互联信息技术有限公司 Label system construction method and device, electronic equipment and storage medium
CN113688607A (en) * 2021-08-02 2021-11-23 北京明略软件***有限公司 Portrait updating method and apparatus for on-line document author
CN113836905A (en) * 2021-09-24 2021-12-24 网易(杭州)网络有限公司 Theme extraction method and device, terminal and storage medium
CN114153716A (en) * 2022-02-08 2022-03-08 中国电子科技集团公司第五十四研究所 Real-time portrait generation method for people and nobody objects under semantic information exchange network
CN116779109A (en) * 2023-05-24 2023-09-19 纬英(广州)教育科技有限公司 Self-feature discovery method and device based on exploration scene guidance

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126582A (en) * 2016-06-20 2016-11-16 乐视控股(北京)有限公司 Recommend method and device
CN108062375A (en) * 2017-12-12 2018-05-22 百度在线网络技术(北京)有限公司 A kind of processing method, device, terminal and the storage medium of user's portrait
CN108288229A (en) * 2018-03-02 2018-07-17 北京邮电大学 A kind of user's portrait construction method
WO2019041524A1 (en) * 2017-08-31 2019-03-07 平安科技(深圳)有限公司 Method, electronic apparatus, and computer readable storage medium for generating cluster tag

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126582A (en) * 2016-06-20 2016-11-16 乐视控股(北京)有限公司 Recommend method and device
WO2019041524A1 (en) * 2017-08-31 2019-03-07 平安科技(深圳)有限公司 Method, electronic apparatus, and computer readable storage medium for generating cluster tag
CN108062375A (en) * 2017-12-12 2018-05-22 百度在线网络技术(北京)有限公司 A kind of processing method, device, terminal and the storage medium of user's portrait
CN108288229A (en) * 2018-03-02 2018-07-17 北京邮电大学 A kind of user's portrait construction method

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862289A (en) * 2021-01-29 2021-05-28 上海妙一生物科技有限公司 Information matching method and device for clinical research practitioner
CN112862289B (en) * 2021-01-29 2022-11-25 上海妙一生物科技有限公司 Information matching method and device for clinical research practitioner
CN113609379A (en) * 2021-07-12 2021-11-05 北京达佳互联信息技术有限公司 Label system construction method and device, electronic equipment and storage medium
CN113609379B (en) * 2021-07-12 2022-07-22 北京达佳互联信息技术有限公司 Label system construction method and device, electronic equipment and storage medium
CN113688607A (en) * 2021-08-02 2021-11-23 北京明略软件***有限公司 Portrait updating method and apparatus for on-line document author
CN113836905A (en) * 2021-09-24 2021-12-24 网易(杭州)网络有限公司 Theme extraction method and device, terminal and storage medium
CN113836905B (en) * 2021-09-24 2023-08-08 网易(杭州)网络有限公司 Theme extraction method, device, terminal and storage medium
CN114153716A (en) * 2022-02-08 2022-03-08 中国电子科技集团公司第五十四研究所 Real-time portrait generation method for people and nobody objects under semantic information exchange network
CN116779109A (en) * 2023-05-24 2023-09-19 纬英(广州)教育科技有限公司 Self-feature discovery method and device based on exploration scene guidance
CN116779109B (en) * 2023-05-24 2024-04-02 纬英数字科技(广州)有限公司 Self-feature discovery method and device based on exploration scene guidance

Similar Documents

Publication Publication Date Title
CN111814475A (en) User portrait construction method and device, storage medium and electronic equipment
US11194842B2 (en) Methods and systems for interacting with mobile device
CN106462598A (en) Information processing device, information processing method, and program
CN111339246A (en) Query statement template generation method, device, equipment and medium
CN111797858A (en) Model training method, behavior prediction method, device, storage medium and equipment
CN111800331A (en) Notification message pushing method and device, storage medium and electronic equipment
KR20210156283A (en) Prompt information processing apparatus and method
KR102628042B1 (en) Device and method for recommeding contact information
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN111798259A (en) Application recommendation method and device, storage medium and electronic equipment
CN115443459A (en) Messaging system with trend analysis of content
CN111797854A (en) Scene model establishing method and device, storage medium and electronic equipment
KR20210145214A (en) Context Media Filter Search
CN111026967A (en) Method, device, equipment and medium for obtaining user interest tag
CN111797302A (en) Model processing method and device, storage medium and electronic equipment
CN111796925A (en) Method and device for screening algorithm model, storage medium and electronic equipment
CN111798367A (en) Image processing method, image processing device, storage medium and electronic equipment
CN111796926A (en) Instruction execution method and device, storage medium and electronic equipment
CN111797856B (en) Modeling method and device, storage medium and electronic equipment
CN111814812A (en) Modeling method, modeling device, storage medium, electronic device and scene recognition method
CN113486260B (en) Method and device for generating interactive information, computer equipment and storage medium
CN111797867A (en) System resource optimization method and device, storage medium and electronic equipment
CN111797261A (en) Feature extraction method and device, storage medium and electronic equipment
CN111797874A (en) Behavior prediction method, behavior prediction device, storage medium and electronic equipment
CN115526602A (en) Memo reminding method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201023
