CN106445988A - Intelligent big data processing method and system - Google Patents
Intelligent big data processing method and system
- Publication number
- CN106445988A (application CN201610382955.7A)
- Authority
- CN
- China
- Prior art keywords
- network
- node
- data
- big data
- application service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3322—Query formulation using system suggestions
- G06F16/3323—Query formulation using system suggestions using document space presentation or visualization, e.g. category, hierarchy or range presentation and selection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06F16/337—Profile generation, learning or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the invention provide an intelligent big data processing method and system. The system comprises a data structuring module, a representation learning module, and an application algorithm module. The data structuring module preprocesses original big data and networks the preprocessed data to obtain a relationship network comprising nodes and edges. The representation learning module applies a representation learning algorithm based on embedded mapping to the relationship network to obtain high-dimensional vectors for its nodes. The application algorithm module obtains a user's application service request, determines the processing algorithm corresponding to that request, and determines the result of the request using that processing algorithm together with the high-dimensional vectors of the nodes obtained by the representation learning module. The system can effectively extract the feature information in big data and express it uniformly as high-dimensional vectors; it offers high computational efficiency, high accuracy, and fast response to user requests, and it provides a unified and effective processing method for multiple application services.
Description
Technical field
The embodiments of the present invention relate to the field of computer technology, and more particularly to an intelligent big data processing method and system.
Background
The insurance industry is undergoing enormous change driven by technological progress, and the widespread application of big data has changed the way insurance companies deliver services. Existing insurance websites and software have typically collected massive amounts of data containing a great deal of useful information, including users' personal information, consumption habits, and so on. Only by making full use of insurance big data can insurers meet the demands of the big data era in areas such as risk pricing, product design, marketing strategy, customer service, and risk management and control.
At present, the insurance industry generally uses database systems to store and manage insurance data. In a database system, data is usually stored in tables; the tables may contain large amounts of relational data and text information, and the stored data may take a variety of formats. For example, a user's personal profile and a product's description are usually stored in the database as text strings, while a user's age and a product's price are usually stored as non-negative numbers. Although current data processing techniques can extract and match formatted values such as numbers and categories, they cannot extract useful feature information from unstructured data such as text.
Common insurance services include precise product recommendation based on insurance data, classification of insurance buyers, and insurance fraud detection. In insurance marketing, either users find insurance products through search and then purchase them, or products are actively recommended to users by methods such as popularity-based recommendation, association-rule recommendation, and collaborative filtering. Popularity-based recommendation shows users the currently most popular insurance products; its drawback is a lack of personalization, so its accuracy is low. Association-rule recommendation uses data analysis to learn rules that link users' purchase interests to their own characteristics and to product characteristics, for example that women over 40 are more likely to buy health insurance; its recommendation accuracy is also not high. Collaborative filtering rests on a basic assumption: users who were interested in similar insurance products will go on to buy similar products, and products bought by similar users will go on to be bought by similar users. When an individual user has few recorded behaviors, the data is highly sparse and recommendations cannot be computed effectively.
When classifying insurance buyers, user categories can describe a user's lifestyle, social habits, consumption habits, and so on, and different categories require different user features to be extracted. The usual approach is to extract features from users' consumption records, such as monthly income, monthly spending, the standard deviation of annual income, and the standard deviation of annual spending, then label a large number of users with categories, train a supervised learning model, and classify test users. This approach requires experience-driven extraction of many features and, even more, the collection of large amounts of labeled data, which leads to high cost and poor accuracy.
Insurance fraud detection determines whether a user's insurance application is fraudulent; its core task is collecting features of the user's application behavior. Existing fraud detection systems mainly extract a large number of numerical statistics from the user's personal information, information on the insurance product applied for, information on the application process, and so on. A subset of users is labeled manually as fraudulent or not, and a supervised learning model is then trained to classify application behavior. However, such a system depends on experience-driven feature extraction and on collecting labeled data, so it cannot be implemented effectively.
It can be seen that existing intelligent processing systems for insurance big data have at least the following drawbacks: 1) existing insurance data techniques lack analysis of unstructured data, so a large amount of useful information is lost, which affects the analysis results of insurance services; 2) existing insurance recommendation systems, insurance buyer classification systems, and insurance fraud detection systems rely too heavily on manual feature extraction, so accuracy is low, computational efficiency is poor, and responses to user requests are slow, which harms the user experience; 3) different insurance services generally use different data processing and feature extraction methods, which causes a large amount of redundant data processing, and the features of data units are not compatible across services.
Summary of the invention
The purpose of the embodiments of the present invention is to provide an intelligent big data processing method and system that can efficiently extract feature information from multiple big data sources without manual involvement, with high computational efficiency, high accuracy, and fast response to user requests, and that can provide a unified and effective processing method for multiple application services.
The technical solution adopted by the embodiments of the present invention is as follows:
An embodiment of the present invention provides an intelligent big data processing system comprising a data structuring module, a representation learning module, and an application algorithm module.
The data structuring module is configured to preprocess original big data and to network the preprocessed original big data, obtaining a relationship network comprising nodes and edges.
The representation learning module is configured to apply a representation learning algorithm based on embedded mapping to the relationship network, obtaining high-dimensional vectors for the nodes of the relationship network.
The application algorithm module is configured to obtain a user's application service request, determine the processing algorithm corresponding to the request, and determine the result of the request using that processing algorithm and the high-dimensional vectors of the nodes of the relationship network obtained by the representation learning module.
Optionally, the relationship network comprises a multi-dimensional relationship network, in which case the representation learning module is specifically configured to perform embedded mapping on the multi-dimensional relationship network, obtaining high-dimensional vectors for the nodes of the multi-dimensional relationship network.
Optionally, the relationship network comprises a semantic network, in which case the representation learning module is specifically configured to perform embedded mapping on the semantic network, obtaining high-dimensional vectors for the nodes of the semantic network.
Optionally, the data structuring module is specifically configured to:
network the behavior data in the preprocessed original big data, obtaining a behavior network comprising nodes and edges;
network the attribute data in the preprocessed original big data, obtaining an attribute network comprising nodes and edges; and
network the text data in the preprocessed original big data, obtaining a semantic network comprising nodes and edges;
wherein the behavior network, the attribute network, and the semantic network together make up the relationship network.
An embodiment of the present invention further provides an intelligent big data processing method, the method comprising:
preprocessing original big data;
networking the preprocessed original big data, obtaining a relationship network comprising nodes and edges;
applying a representation learning algorithm based on embedded mapping to the relationship network, obtaining high-dimensional vectors for the nodes of the relationship network;
obtaining a user's application service request;
determining the processing algorithm corresponding to the application service request; and
determining the result of the application service request using that processing algorithm and the high-dimensional vectors of the nodes of the relationship network.
Optionally, the relationship network comprises a multi-dimensional relationship network, in which case applying the representation learning algorithm based on embedded mapping to the relationship network to obtain the high-dimensional vectors of its nodes comprises: performing embedded mapping on the multi-dimensional relationship network, obtaining high-dimensional vectors for the nodes of the multi-dimensional relationship network.
Optionally, the relationship network comprises a semantic network, in which case applying the representation learning algorithm based on embedded mapping to the relationship network to obtain the high-dimensional vectors of its nodes comprises: performing embedded mapping on the semantic network, obtaining high-dimensional vectors for the nodes of the semantic network.
Optionally, networking the preprocessed original big data to obtain a relationship network comprising nodes and edges comprises:
networking the behavior data in the preprocessed original big data, obtaining a behavior network comprising nodes and edges;
networking the attribute data in the preprocessed original big data, obtaining an attribute network comprising nodes and edges; and
networking the text data in the preprocessed original big data, obtaining a semantic network comprising nodes and edges;
wherein the behavior network, the attribute network, and the semantic network together make up the relationship network.
An embodiment of the present invention further provides an intelligent big data processing method, comprising:
obtaining a user's application service request and the high-dimensional vectors of the nodes of a relationship network converted from original big data;
determining the processing algorithm corresponding to the application service request; and
determining the result of the application service request using that processing algorithm and the high-dimensional vectors of the nodes of the relationship network.
Optionally, the relationship network converted from original big data is the relationship network obtained by networking the original big data after preprocessing.
The technical solution of the embodiments of the present invention has the following advantages. The data structuring module preprocesses and networks the original big data, converting it into network data (structured data), so the representation learning module can apply representation learning algorithms for network data to extract features from the data quickly and uniformly and express them as high-dimensional vectors. The application algorithm module determines the processing algorithm corresponding to the user's application service request and computes the result using the vector-form features extracted by the representation learning module. Unlike the prior art, the whole feature extraction process in the embodiments requires no human involvement: it is completed automatically by the representation learning algorithm based on embedded mapping, with high computational efficiency. Feature extraction also largely preserves the structural information (that is, the useful information) in the original big data, which improves the accuracy of tasks such as classification and prediction. Moreover, because the representation learning algorithm based on embedded mapping is used, the data features mined from the original big data can all be expressed uniformly as high-dimensional vectors, so the system of the embodiments is not limited to one specific application service and can provide a unified and effective processing method for multiple application services.
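By way of non-limiting illustration, the way the application algorithm module selects a processing algorithm per request and reuses the same node vectors might be sketched in Python as follows. The request fields (`type`, `user`, `payload`), the `ALGORITHMS` table, and the two toy algorithms are hypothetical names chosen for this sketch and are not specified by the embodiment; dot-product similarity is likewise an assumption.

```python
def dot(u, v):
    """Dot product of two equal-length vectors, used here as a similarity score."""
    return sum(a * b for a, b in zip(u, v))

def recommend(vectors, user, candidates):
    """Rank candidate products by similarity to the user's vector."""
    return sorted(candidates, key=lambda p: dot(vectors[user], vectors[p]), reverse=True)

def classify(vectors, user, labelled):
    """Give the user the label of the most similar labelled user."""
    nearest = max(labelled, key=lambda u: dot(vectors[user], vectors[u]))
    return labelled[nearest]

# Hypothetical mapping from request type to processing algorithm.
ALGORITHMS = {"recommend": recommend, "classify": classify}

def handle_request(request, vectors):
    """Pick the algorithm for the request and run it on the shared node vectors."""
    algorithm = ALGORITHMS[request["type"]]
    return algorithm(vectors, request["user"], request["payload"])
```

Both toy algorithms read the same `vectors` dictionary, which mirrors the point that one vector representation serves several application services.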
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of an intelligent big data processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a behavior network;
Fig. 3 is a flow chart of another intelligent big data processing method provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an intelligent big data processing system provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another intelligent big data processing system provided by an embodiment of the present invention.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
To better explain the embodiments of the present invention, the related concepts are explained first.
A data unit is the indivisible basic unit used to represent relational data, such as a particular "customer or user", an "age bracket", a "product", or a "product category". These basic units correspond to real-world entities. The counterpart of a data unit is a non-data unit, which refers to a structure composed of data units, such as a relationship between customers, a customer's behavior toward a product, or a group of products belonging to the same series.
Behavior data is data produced when a user acts on a product, for example when a user purchases, unsubscribes from, or reviews an insurance product. Behavior data describes the relation between two or more data units, usually the relation between a "user" and a "product".
Attribute data describes the relation between a data unit, such as a user or a product, and its attributes, for example a user's age or a product's type. It usually describes the relation between a "user" and its attributes or between a "product" and its attributes.
Text data is text containing words or phrases. The words or phrases can serve as data units.
Structured data is data that can be represented by numbers or a uniform structure, such as numbers or symbols; stored in a database, it can be expressed logically with a two-dimensional table structure.
Unstructured data, in contrast to structured data, is data that cannot be represented by numbers or a uniform structure and is inconvenient to express with a two-dimensional logical table in a database, such as text, images, sound, web pages, and forms of all kinds.
A multi-dimensional relation is a relation that involves multiple data units (that is, multiple nodes in the network); it is an interaction among many data units. A two-dimensional relation is an interaction between only two data units. When information is plentiful, a purchase is a multi-dimensional relation that may involve the user, the product, the place of purchase, the method of purchase, and so on; when information is incomplete, it may be only a two-dimensional relation, for example involving only the user and the product. Traditional data processing systems consider only two-dimensional relations and cannot handle multi-dimensional ones, yet the multi-dimensional relation data produced by such behavior is now ubiquitous in every field.
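As a small illustrative sketch, a relation can be held as a tuple of the data units it involves, so that the same purchase is multi-dimensional when fully recorded and two-dimensional when only the user and product are known. The tuple layout and the names `arity` and `is_multidimensional` are assumptions made for this example only.

```python
# Each relation is a tuple of (unit_type, unit_id) pairs; all values are illustrative.
full_purchase = (("user", "u1"), ("product", "p1"), ("place", "online"), ("method", "card"))
sparse_purchase = (("user", "u2"), ("product", "p1"))

def arity(relation):
    """Number of data units taking part in the relation."""
    return len(relation)

def is_multidimensional(relation):
    """A relation over more than two data units is treated as multi-dimensional."""
    return arity(relation) > 2
```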
In addition, as network technology develops, the volume of unstructured data grows by the day, and the limitations of data processing systems that can manage and analyze only structured data are becoming more and more apparent. Moreover, in many industries, not only insurance, feature extraction from big data still requires experts and cannot be completed by computer alone. Big data processing systems also commonly suffer from a series of problems such as low accuracy, poor computational efficiency, and slow response to user requests.
To solve the above problems, an embodiment of the present invention provides an intelligent big data processing method. As shown in Fig. 1, the method includes:
S101: Preprocess the original big data.
The original big data may be collected from various websites or application programs (APPs), so it may include structured data such as behavior data and attribute data as well as unstructured data such as text data, and the data formats may also vary. Therefore, before extracting features from the data or using the data to provide services, the original big data may first be preprocessed. Data preprocessing methods include data cleaning, data integration, data transformation, data analysis, and data reduction.
Optionally, in the embodiments of the present invention, preprocessing the original big data may consist of data analysis and cleaning: the original big data is statistically analyzed and data content that is irregular or erroneous is removed. This may involve filtering out invalid data formats, for example removing values such as a price that should be a floating-point number but was entered as a string; it may also involve unifying time formats or units, filling in missing values, smoothing noisy data, and so on. In this way the format of the big data is standardized, abnormal data is removed, errors are corrected, and duplicate data is removed.
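The cleaning steps described above might be sketched as follows. This is a minimal example under stated assumptions: the record fields (`user`, `product`, `price`, `age`), the choice to fill a missing age with the mean, and the function name `preprocess` are all hypothetical, and a price stored as a numeric string is converted rather than discarded.

```python
def preprocess(records):
    """Clean raw records: drop unparseable prices, fill missing ages, remove duplicates."""
    cleaned, seen = [], set()
    ages = [r["age"] for r in records if isinstance(r.get("age"), (int, float))]
    default_age = sum(ages) / len(ages) if ages else None
    for r in records:
        try:
            price = float(r["price"])      # a price must parse as a floating-point number
        except (KeyError, TypeError, ValueError):
            continue                       # discard the malformed record
        age = r.get("age")
        if not isinstance(age, (int, float)):
            age = default_age              # fill the missing value with the mean age
        row = (r["user"], r["product"], price, age)
        if row not in seen:                # keep only the first copy of a duplicate
            seen.add(row)
            cleaned.append({"user": row[0], "product": row[1], "price": price, "age": age})
    return cleaned
```

Unifying time formats or units and smoothing noisy values would be further passes of the same kind and are omitted here.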
S102: Network the preprocessed original big data to obtain a relationship network comprising nodes and edges.
The nodes of the relationship network are converted from the data units in the preprocessed original big data, and the edges of the relationship network represent the relations between nodes.
Big data is usually stored in tables, but this traditional storage method cannot store and manage data uniformly at large scale, and it loses the semantic information contained in large amounts of text data (this semantic information is useful and is essential for providing users with accurate application services). Most importantly, fragmented table storage cannot be accessed and used conveniently and efficiently by subsequent application services, so it cannot meet the needs of application services that are invoked frequently and must respond quickly.
In the embodiments of the present invention, networking the original big data converts the tabular or massive data into a relationship network, which effectively solves the above problems. First, once the preprocessed original big data has been networked, the data can be handled uniformly in terms of nodes and edges, which greatly reduces the cost of data storage and management. Second, text data such as the words and phrases in the preprocessed original big data is networked to build a semantic network, which retains the semantic information in the text so that it can be used effectively later to improve the accuracy of application services. In addition, once the preprocessed original big data is expressed as a relationship network comprising nodes and edges, representation learning algorithms for network data can be used to extract features from the data quickly and uniformly, so that different application service requests can be answered quickly.
Optionally, the preprocessed original big data may include behavior data, attribute data, and text data. Networking the preprocessed original big data may then include networking the behavior data, for example converting behavior data such as purchases and reviews into a behavior network. It may also include networking the attribute data, for example converting attribute information such as age and price into an attribute network. It may also include networking the text data, for example converting text data such as product introductions or review content into a semantic network whose nodes are words and phrases. The behavior network, the attribute network, and the semantic network then together make up the relationship network.
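A minimal sketch of this networking step is given below, assuming the relationship network is stored as a set of undirected edges between named nodes. The node naming scheme (`user:`, `product:`, `word:` prefixes), the input tuple shapes, and the decision to link a described node to each word of its text are illustrative assumptions, not details taken from the embodiment.

```python
def build_relation_network(behaviors, attributes, texts):
    """Return the relationship network as a set of undirected edges (frozensets of two nodes)."""
    edges = set()
    for user, product in behaviors:            # behavior network: user-product edges
        edges.add(frozenset((f"user:{user}", f"product:{product}")))
    for node, attr, value in attributes:       # attribute network: node-attribute edges
        edges.add(frozenset((node, f"{attr}:{value}")))
    for node, text in texts:                   # semantic network: node-word edges
        for word in text.lower().split():
            edges.add(frozenset((node, f"word:{word}")))
    return edges
```

Because all three sub-networks share one edge set and one node namespace, a product that appears in a purchase and in a description becomes a single node connecting both.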
S103: Apply a representation learning algorithm based on embedded mapping to the relationship network to obtain high-dimensional vectors for the nodes of the relationship network.
Representation learning is one of the core research problems in machine learning and data mining. In the embodiments of the present invention, applying a representation learning algorithm based on embedded mapping to the relationship network expresses the nodes of the network, such as users, products, and phrases, uniformly as vectors of relatively high dimension while retaining the structural information in the original big data. Each vector represents one node of the relationship network, and each dimension of the vector expresses one feature of that node. The relations (that is, the edges) between nodes are converted into similarity between the nodes' high-dimensional vectors: if node 1 and node 2 are related (connected by an edge in the relationship network), the similarity between their high-dimensional vectors is high; otherwise it is low.
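The text does not name a particular similarity measure. Assuming cosine similarity, a common choice for comparing embedding vectors, the relation between edges and vector similarity can be illustrated as follows; the three example vectors are invented for the illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

node_1 = [0.9, 0.1, 0.4]
node_2 = [0.8, 0.2, 0.5]     # assumed to share an edge with node_1
node_3 = [-0.7, 0.6, -0.2]   # assumed to have no edge with node_1
```

With vectors learned as described, `cosine_similarity(node_1, node_2)` would be expected to exceed `cosine_similarity(node_1, node_3)`.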
This form of representation learning avoids the prior-art reliance on manual feature extraction based on expert experience and derives features that conform to the regularities of the data, driven by the big data itself. Once the features are expressed as vectors, they can be applied directly to a variety of tasks, including classification, clustering, and prediction.
Further, the representation learning algorithm based on embedded mapping can retain as much of the structural information in the relationship network as possible, and different structural information can be retained for different networks. For example, for a "user-product" behavior network, purchase behavior information can be retained so that users with similar feature representations in the vectors have similar purchasing habits and products with similar feature representations have similar groups of purchasers. For instance, 50 dimensions of the high-dimensional vector may be chosen to store the structural information "purchase behavior relation", so that the vector similarity between the high-dimensional vectors of two nodes having that relation (a user and a product) is high; another 50 dimensions may be chosen to store the structural information "similar purchase intention", so that the vector similarity between the high-dimensional vectors of two nodes having that relation (a user and a user) is high. This greatly improves the accuracy of the classification, prediction, and other tasks behind later application services, and solves the prior-art problem that structural information in the data cannot be extracted effectively and a large amount of useful information is lost.
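To make the "50 dimensions per structure" idea concrete, the sketch below assumes a 100-dimensional vector whose first block of 50 dimensions stores one kind of structural information and whose second block stores another. Which block holds which structure, and the use of a dot product as the similarity, are assumptions of this example.

```python
DIM_PER_STRUCTURE = 50  # dimensions reserved for each kind of structural information

def split_vector(vec):
    """Split a 100-dimensional vector into its two 50-dimensional blocks."""
    return vec[:DIM_PER_STRUCTURE], vec[DIM_PER_STRUCTURE:]

def block_similarity(u, v, block):
    """Dot-product similarity restricted to one block (0 or 1) of the vectors."""
    start = block * DIM_PER_STRUCTURE
    end = start + DIM_PER_STRUCTURE
    return sum(a * b for a, b in zip(u[start:end], v[start:end]))
```

A recommendation task could then compare users and products on block 0 only, while a user-grouping task compares users on block 1 only.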
In addition, a common learning approach is to obtain high-dimensional node representations by matrix or tensor decomposition, but such methods often have excessively high complexity (cubic order), cannot be widely applied in industrial scenarios with massive data, and have low computational efficiency. The embodiments of the present invention instead use the embedded-mapping learning method, which employs the negative sampling technique: massive data is sampled in a reasonable proportion for learning, so that the learning process reaches a good result in less time. Moreover, once the relationship network is represented by high-dimensional vectors, not only is learning time shortened, but computational efficiency is greatly improved and user requests can be answered quickly.
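The embodiment does not give its training procedure, so the following is only a generic sketch of learning node vectors with negative sampling, not the patented algorithm. It uses a logistic loss on the dot product of two node vectors, draws a few random unlinked nodes as negatives for each edge, and updates by stochastic gradient steps. The function name `train_embeddings` and every hyperparameter value are assumptions.

```python
import math
import random

def train_embeddings(edges, nodes, dim=8, epochs=200, negatives=2, lr=0.05, seed=0):
    """Learn one vector per node so that linked nodes score higher than sampled non-links."""
    rng = random.Random(seed)
    vec = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}
    linked = {frozenset(e) for e in edges}

    def update(a, b, label):
        # One gradient step on the logistic loss for the pair (a, b).
        score = sum(x * y for x, y in zip(vec[a], vec[b]))
        pred = 1.0 / (1.0 + math.exp(-score))
        g = lr * (label - pred)
        va, vb = vec[a][:], vec[b][:]
        for i in range(dim):
            vec[a][i] += g * vb[i]
            vec[b][i] += g * va[i]

    for _ in range(epochs):
        for a, b in edges:
            update(a, b, 1.0)                      # observed edge: pull the vectors together
            for _ in range(negatives):
                c = rng.choice(nodes)              # negative sample: a random node
                if c != a and frozenset((a, c)) not in linked:
                    update(a, c, 0.0)              # unlinked pair: push the vectors apart
    return vec
```

Each edge costs a fixed number of updates regardless of how many nodes exist, which is the reason sampling keeps training time low compared with decomposing a full matrix.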
Representation learning can also be realized by means other than embedded mapping, such as singular value decomposition or non-negative matrix factorization, but these methods are limited to two-dimensional relationship networks and are slow to compute. The embodiments of the present invention take into account that in current application scenarios, whether in insurance, finance, shopping, or e-commerce, the big data collected is increasingly diverse, and the relationship network obtained after processing with the technique of the embodiments is often not limited to a two-dimensional relationship network but is in most cases a multi-dimensional relationship network. The scale of the data is often also quite large. The representation learning algorithm based on embedded mapping is therefore chosen: it applies to both two-dimensional and multi-dimensional relationship networks, and it speeds up computation, greatly shortening computation time and responding quickly to application demands.
Specifically, a representation learning algorithm based on embedding mapping can use a "morphism" from category theory, that is, a structure-preserving, dimension-reducing "embedding" mapping, to realize representation learning. For the data in the relational network, a learning algorithm that retains the structural information of the relational network yields the high-dimensional vector representation of each node.
S104: Obtain the application service request of the user.
An application service request may be triggered when the user browses a web page, uses an app, or clicks a function button on an operation interface. The request can therefore be obtained in order to determine the relevant algorithm to be applied subsequently.
S105: Determine the processing algorithm corresponding to the application service request.
S106: Use the processing algorithm corresponding to the application service request and the high-dimensional vectors of the nodes of the relational network to determine the result of the application service request.
Services at the application layer can be defined as tasks such as ranking, classification, clustering, prediction, association analysis and anomaly detection. Each of these tasks can be completed with a specific processing algorithm. Applying the processing algorithm corresponding to the task (that is, to the application service request) to the high-dimensional vectors obtained from representation learning yields an accurate and efficient solution, which is returned to the user.
Specifically, the correspondence between application service requests and processing algorithms can be specified or obtained in advance. For example, when the application service request is product recommendation, recommending products is in fact a prediction: it predicts the series of products the user is most likely to buy, and the processing algorithm computes the similarity between the high-dimensional vector of the user node and the high-dimensional vectors of the product nodes. If the correspondence between this request and this algorithm has been specified or obtained in advance, then once the request is received, the corresponding processing algorithm can be determined to be the computation of the similarity between the vector of the user node and the vectors of the product nodes. Finally, the similarity is computed from the high-dimensional vector of the user node and those of the product nodes, which gives the series of products with the highest similarity to the user, that is, the result of the application service request.
In the embodiments of the present invention, the preprocessed original big data is networked to obtain a relational network comprising nodes and edges, and a representation learning algorithm based on embedding mapping is applied to the relational network to obtain the high-dimensional vectors of its nodes. This realizes feature extraction from the original big data. The whole process does not rely on expert experience or human participation; it is completed automatically by the representation learning algorithm based on embedding mapping, with high computational efficiency. Unlike the prior art, the embodiments also retain much of the useful information during feature extraction, which improves the accuracy of subsequent tasks such as classification or prediction. Further, because the features of the data are uniformly represented as high-dimensional vectors, a processing algorithm can be determined according to the application service request and applied to these vector features to determine the result of the request. The intelligent big data processing method of the embodiments is therefore not limited to one specific application service, and provides a unified and effective processing method for multiple application services.
It should be noted that the intelligent big data processing method described in the embodiments can be applied not only to the insurance field but also to other fields, such as finance and shopping and consumption. It is particularly suitable for situations in which both structured and unstructured data must be processed, and for occasions in which data with multidimensional relations must be processed, where it has obvious advantages over the prior art.
It should be noted that in S106, when the high-dimensional vectors of the nodes of the relational network are used to determine the result of the application service request, the vectors of all nodes of the relational network may be used, or only the vectors of some of the nodes. Specifically, only the nodes related to the application service request may be used. For example, when the request is product recommendation, only the high-dimensional vectors of the product nodes and the user nodes need be used in the computation.
Optionally, in step S102, the networking of behavioral data, text data or attribute data can be carried out in the following ways.
1. Networking the behavioral data in the preprocessed original big data
Specifically, behavioral data describes the relation between two or more data units. Networking behavioral data means representing this relation as an edge of the network and the data units as nodes of the network; for example, behaviors such as purchasing, unsubscribing or evaluating are represented as edges. The network can be a two-dimensional relational network or a multidimensional relational network. Correspondingly, when behavioral data is networked, a relation can be represented as a two-dimensional edge or a high-dimensional edge, where a two-dimensional edge contains two nodes and a high-dimensional edge contains multiple nodes.
For example, simple user behavioral data can be expressed in the two-dimensional relation form "user-product". User behavior may also carry rich contextual information; after the contextual information is turned into nodes, a multi-way relation graph is formed, such as the three-dimensional relation graph "user-product-evaluation". Take Mr. Zhang's purchase of insurance product A as an example: Mr. Zhang bought insurance product A and evaluated it as follows: "Although the price is expensive, it is worth it." Networking this behavioral data gives the behavior network shown in Figure 2. In Figure 2, "Mr. Zhang" and "insurance product A" are represented as nodes of the behavior network, and the purchase behavior forms the edge between these two nodes. In addition, the words or phrases of the evaluation, "expensive" and "worth it", are also represented as nodes of the network; this part actually belongs to the networking of text data and is explained in detail below. A "user-product-evaluation" behavior network, that is, a three-dimensional relational network, is thus formed.
2. Networking the text data in the preprocessed original big data
Networking text data means representing data units consisting of words or phrases as nodes of the network, so that the text is built into a relational network whose nodes are words or phrases. The edges between word or phrase nodes describe how often the words or phrases occur together in a sentence or document. For example, if the phrases "expensive" and "worth it" occur together in 3 sentences, then "expensive" and "worth it" can be two nodes of the relational network connected by an edge whose weight can be set to 3. If "expensive" and "very cheap" never occur together in a sentence, no edge connects these two nodes. In addition, the edges formed between these word or phrase nodes and other nodes (such as users and products) belong to behavioral data and describe the relation between two or more data units.
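The co-occurrence weighting described above can be sketched as follows. This is an illustrative sketch that assumes the text has already been segmented into terms; the function name `build_cooccurrence_network` and the sample sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_network(sentences):
    """Return {(term_a, term_b): weight}, counting the sentences that contain both terms."""
    weights = Counter()
    for terms in sentences:
        # Each unordered pair of distinct terms in a sentence adds 1 to its edge weight.
        for a, b in combinations(sorted(set(terms)), 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical segmented sentences.
sentences = [
    ["price", "expensive", "worth it"],
    ["expensive", "worth it"],
    ["expensive", "worth it", "service"],
    ["very cheap", "service"],
]
edges = build_cooccurrence_network(sentences)
```

Pairs that never co-occur simply have no entry, which corresponds to the absence of an edge.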
Taking Mr. Zhang's purchase and evaluation of insurance product A as an example again, text data such as the evaluation content can be structured, that is, subjected to word segmentation, phrase extraction, category labeling, sentiment analysis and so on, so that natural language is expressed as a data structure that can be processed. Specifically, from "Although the price is expensive, it is worth it", it can be determined that "expensive" and "worth it" are the core vocabulary, that "expensive" describes a feature of the product in terms of "price", and that "worth it" reflects the user's positive purchasing attitude and sentiment. When this text data is networked, "expensive" and "worth it" are therefore represented as nodes of the network, and the edges these two nodes form with other nodes, such as the user and the product, belong to behavioral data.
It follows that networking text data not only realizes the analysis of unstructured data but also associates words and phrases with behavioral data, so that useful information is retained.
3. Networking the attribute data in the preprocessed original big data
Attribute data describes the relation between a data unit and its attributes. Networking attribute data means representing this relation as an edge of the network and the data unit as a node of the network. Attribute data can be categorical information, such as health insurance or travel accident insurance, or numerical information, such as age or price. When attribute data is networked, categorical information can be represented directly as nodes of the network, while numerical information such as age and price is first divided into intervals and the intervals are then represented as nodes.
For example, Mr. Zhang, aged 25, bought an insurance product priced at 2000. In this example, an age range containing 25 can be represented as a node; for instance, people aged 24 to 30 can be represented as the node "young adult". Likewise, a price range containing the value 2000 can be represented as a node; for instance, prices between 1000 and 5000 can be represented as the node "entry-level insurance product". After this processing, the data is finally converted into the attribute networks "user-age level" and "product-price range".
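The interval step can be sketched as below. The two intervals come from the example above; the function names, the inclusive boundaries and the edge representation as plain tuples are assumptions made for illustration.

```python
# Intervals from the example; treating both boundaries as inclusive is an assumption.
AGE_LEVELS = [(24, 30, "young adult")]
PRICE_RANGES = [(1000, 5000, "entry-level insurance product")]

def to_interval_node(value, intervals):
    """Map a numerical attribute to the label of the interval that contains it."""
    for low, high, label in intervals:
        if low <= value <= high:
            return label
    return None  # no interval defined for this value

def build_attribute_edges(user, age, product, price):
    """Return the "user-age level" and "product-price range" edges."""
    edges = []
    age_node = to_interval_node(age, AGE_LEVELS)
    if age_node is not None:
        edges.append((user, age_node))
    price_node = to_interval_node(price, PRICE_RANGES)
    if price_node is not None:
        edges.append((product, price_node))
    return edges

attribute_edges = build_attribute_edges("Mr. Zhang", 25, "insurance product A", 2000)
```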
Optionally, after the preprocessed original big data has been networked, the nodes and edges of the relational network can be stored and managed at scale in a regular format to facilitate subsequent feature extraction and use. Therefore, after S102, the method can further include:
S102': Save the nodes and edges of the relational network in a database.
For example, two tables can be stored in the database to save the nodes and the edges of the relational network respectively. In the table that saves node information, each row holds the ID, name, query frequency and so on of a node. In the table that saves edge information, each row holds the ID of the edge, the IDs of the related nodes, the generation time and so on. After the preprocessed original big data has been networked, all of the data that existed before networking has in effect been turned into structured data. In practice, several data management technologies are available for structured data management (Structured Data Management), such as distributed storage, cloud databases, NoSQL (non-relational) databases and mobile databases. For example, BaseX, MongoDB and No2DB are three popular NoSQL databases developed in Java, C++ and C# respectively; MySQL and HBase are commonly used database software; and AllegroGraph, DEX, Neo4j and FlockDB are graph databases for storing network relations that rely on SPARQL, Java and Scala.
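The two-table layout can be sketched with Python's built-in `sqlite3` module. SQLite is used here only because it needs no server; the source does not prescribe it. The column names, the comma-separated `node_ids` field and the sample rows (including the timestamp) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    query_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE edges (
    id INTEGER PRIMARY KEY,
    node_ids TEXT NOT NULL,      -- comma-separated IDs, so one edge may join 2 or more nodes
    created_at TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO nodes (id, name, query_count) VALUES (?, ?, ?)",
    [(1, "Mr. Zhang", 0), (2, "insurance product A", 0), (3, "expensive", 0)],
)
# One hypothetical "user-product-evaluation" edge.
conn.execute(
    "INSERT INTO edges (id, node_ids, created_at) VALUES (?, ?, ?)",
    (1, "1,2,3", "2016-06-01T00:00:00"),
)
conn.commit()

def nodes_of_edge(edge_id):
    """Return the names of the nodes joined by the given edge, ordered by node ID."""
    row = conn.execute("SELECT node_ids FROM edges WHERE id = ?", (edge_id,)).fetchone()
    ids = [int(x) for x in row[0].split(",")]
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT name FROM nodes WHERE id IN ({placeholders}) ORDER BY id", ids
    )
    return [name for (name,) in rows]
```

A separate edge-to-node table would be the more conventional relational design; the single text column is kept here only to match the two-table description.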
Optionally, when step S103 is implemented, the relational network may include a semantic network, and may also include an attribute network and a behavior network. These may be homogeneous relational networks, two-dimensional relational networks or multidimensional relational networks. Therefore, applying the representation learning algorithm based on embedding mapping to the relational network to obtain the high-dimensional vectors of its nodes can include: performing embedding mapping on the multidimensional relational network in the relational network to obtain the high-dimensional vectors of the nodes of the multidimensional relational network; or performing embedding mapping on the two-dimensional relational network in the relational network to obtain the high-dimensional vectors of the nodes of the two-dimensional relational network; or performing embedding mapping on the semantic network in the relational network to obtain the high-dimensional vectors of the nodes of the semantic network; or performing embedding mapping on the homogeneous network in the relational network to obtain the high-dimensional vectors of the nodes of the homogeneous network.
1. Embedding mapping of the semantic network (Text Embedding)
Using the embedding mapping method, the nodes of the semantic network, which take the form of words and phrases, are represented as high-dimensional vectors. After embedding mapping, nodes that represent closely related words or phrases have highly similar high-dimensional vectors, that is, closely related words and phrases have similar semantics.
Specifically, a word embedding method based on the Skip-gram model can be used, which learns vector representations of words in order to predict neighboring words accurately. The most effective learning objective (that is, the objective function to be maximized) is: after a word in a sentence is hidden, the optimal vector of the hidden word can be obtained from the other neighboring words in the sentence. In natural language, the words that can fill the gap left by a hidden word have similar semantics, so embedding mapping makes the similarity of their vectors very high.
In short, the conditional-probability objective that the embedding mapping of the semantic network maximizes is: given the vectors of the neighbor nodes (the connected nodes), predict the vector of the target node, so that the nodes connected to the same given nodes have similar vectors. The method can be extended further to incorporate multiple kinds of elements, such as words, phrases and phrase categories, to realize representation learning at the semantic level.
Select the scale c of the textual context used for training, that is, the window size. With the current word w_t as the input and its neighboring elements as the output layer, the objective function that the training model maximizes is:
where w_i denotes the i-th word in the text.
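For reference, the standard objective of the Skip-gram model named above, which is consistent with the window size c, the input word w_t and the neighboring position t+j used in this description, can be written as follows. It is given as an assumption about the intended formula; the corpus length T is a symbol introduced here for illustration.

```latex
\frac{1}{T} \sum_{t=1}^{T} \; \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p\left(w_{t+j} \mid w_t\right)
```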
By maximizing this objective function, the vector representation w(i) of each word is learned, so that, given the vector w(t) and the position t, the vector obtained for position (t+j) is highly similar to the vector of the word that actually occurs at that position in the document (the probability is maximized). Closely related words and phrases thus have similar semantics, and the semantics of words are retained.
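The (input word, neighboring word) pairs over which such an objective ranges can be enumerated as in the sketch below. The function names and the two hypothetical token sequences are assumptions for illustration; training itself is not shown.

```python
def skipgram_pairs(tokens, window):
    """List (w_t, w_(t+j)) pairs for every position t and every offset 0 < |j| <= window."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-window, window + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                pairs.append((center, tokens[t + j]))
    return pairs

def contexts_of(word, sentences, window):
    """Collect every neighboring word observed around `word`."""
    found = set()
    for tokens in sentences:
        for center, context in skipgram_pairs(tokens, window):
            if center == word:
                found.add(context)
    return found

# Hypothetical segmented sentences that differ only in the last word.
sentences = [
    ["today", "noon", "ate", "rice"],
    ["today", "noon", "ate", "plain rice"],
]
rice_contexts = contexts_of("rice", sentences, window=3)
plain_rice_contexts = contexts_of("plain rice", sentences, window=3)
```

Two words that share the same set of neighbors are pushed toward similar vectors by the objective, which is the effect the description relies on.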
For example, the neighboring words "today", "noon" and "ate" occur in the semantic network and may come from two text messages in the original big data whose word sequences are "today / noon / ate / rice" and "today / noon / ate / plain rice". With the method of the embodiments, the vectors of "plain rice" and "rice" are w(t), and the vectors of "today", "noon" and "ate" are w(t+j), namely w(t-3), w(t-2) and w(t-1). The representation learning algorithm based on embedding mapping makes the vectors corresponding to "plain rice" and "rice" highly similar, that is, the two words or phrases "plain rice" and "rice" have similar semantics. In the prior art, by contrast, because "plain rice" and "rice" are two different terms, they are simply treated as different, and the semantic information cannot be retained.
2. Embedding mapping of the two-dimensional relational network (Bipartite Network Embedding)
A two-dimensional relational network is a network in which every edge corresponds to two nodes and the nodes are of only two classes; "user-product", for example, is a two-dimensional relational network.
Embedding mapping of the two-dimensional relational network means using the embedding mapping method to represent the nodes (such as user, product, age level and price level nodes) of behavior networks and attribute networks with two-dimensional relations (such as user-product, user-age and product-price) as high-dimensional vectors.
As with the embedding mapping of the semantic network, the conditional-probability objective that the embedding mapping of the two-dimensional relational network maximizes is: given the vectors of the neighbor nodes (the connected nodes), predict the vector of the target node, so that the nodes v_i connected to a given node v_j have similar vectors.
Assume the two-dimensional relational network contains class A nodes and class B nodes. By maximizing this objective function, given a class B node v_j, the vector derived for the nodes connected to v_j is made similar to the vector of the class A node v_i, that is, the conditional probability is maximized.
The conditional probability that a class A node v_i is generated by a class B node v_j can be defined as:
where u_i is the high-dimensional vector of v_i and u_j is the high-dimensional vector of v_j.
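One common way to turn the vectors u_i and u_j into such a conditional probability is a softmax over scalar products. The sketch below assumes that form; it is not confirmed as the formula intended here, and the function name and toy vectors are hypothetical.

```python
import numpy as np

def conditional_probability(u_a, u_j):
    """p(v_i | v_j) for every class A node v_i, as a softmax over scalar products."""
    scores = u_a @ u_j
    scores = scores - scores.max()  # subtract the maximum for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

# Toy vectors: three class A nodes (rows) and one class B node.
u_users = np.array([[1.0, 0.0], [0.8, 0.2], [-1.0, 0.5]])
u_product = np.array([1.0, 0.0])
p = conditional_probability(u_users, u_product)
```

Under this form, the class A nodes whose vectors point in the same direction as u_j receive the highest probability.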
Take the two-dimensional relational network formed by "user-product" as an example, and assume class A nodes represent users and class B nodes represent products. In this way, given a product, it is possible to predict which users may buy it, or in other words to compute the probability that a user buys the product.
For example, suppose that after the data is networked the two-dimensional relational network is: user A-product C, user A-product D, user B-product C. The objective is then: given the "product C" node, change (learn) the vectors of the "user A" and "user B" nodes so that the vector of the nodes connected to "product C" is similar both to the vector of "user A" and to the vector of "user B"; the vector of "user A" is then similar to the vector of "user B". In this way the structural information in the network is successfully preserved, which greatly improves the accuracy of solving the corresponding downstream problems.
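A training loop of this kind can be sketched with stochastic gradient ascent and negative sampling. The update rule below is the usual skip-gram-with-negative-sampling step, used here as an assumption; the function name, the hyperparameters and the uniform choice of negative nodes are illustrative. On a graph this small a uniformly drawn "negative" may be a true neighbor, which a real implementation would need to handle.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bipartite_embedding(edges, n_a, n_b, dim=8, negatives=2,
                              epochs=200, lr=0.05, seed=0):
    """Learn vectors for class A and class B nodes from (a_index, b_index) edges."""
    rng = np.random.default_rng(seed)
    u_a = rng.normal(scale=0.1, size=(n_a, dim))  # class A vectors (e.g. users)
    u_b = rng.normal(scale=0.1, size=(n_b, dim))  # class B vectors (e.g. products)
    for _ in range(epochs):
        for i, j in edges:
            # One observed pair (label 1) plus a few random class A nodes (label 0).
            negs = [int(k) for k in rng.integers(0, n_a, size=negatives) if int(k) != i]
            samples = [(i, 1.0)] + [(k, 0.0) for k in negs]
            grad_b = np.zeros(dim)
            for k, label in samples:
                g = lr * (label - sigmoid(u_a[k] @ u_b[j]))
                grad_b += g * u_a[k]   # accumulate using the pre-update class A vector
                u_a[k] += g * u_b[j]
            u_b[j] += grad_b
    return u_a, u_b

# user A = 0, user B = 1; product C = 0, product D = 1
edges = [(0, 0), (0, 1), (1, 0)]
user_vecs, product_vecs = train_bipartite_embedding(edges, n_a=2, n_b=2)
```

Each step touches only the sampled nodes rather than all N nodes, which is why negative sampling keeps the cost of learning low on large networks.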
3. Embedding mapping of the multidimensional relational network (Tensor Network Embedding)
A multidimensional relational network is a network in which some edges correspond to three or more nodes; the "user-product-evaluation" network shown in Figure 2, for example, is a multidimensional relational network. Multidimensional relations (High-order Relation) are also common in data. An evaluation behavior, for instance, involves the user, the product and the evaluation text at the same time, so such behavioral data must be represented with a tensor rather than a matrix, that is, with a ternary relation rather than a simple bipartite graph.
Embedding mapping of the multidimensional relational network means using the embedding mapping method to represent the nodes of behavior networks and attribute networks with multidimensional relations (such as user-product-evaluation) as high-dimensional vectors.
As with the embedding mapping of the semantic network, the conditional-probability objective that the embedding mapping of the multidimensional relational network maximizes is: given the vectors of the neighbor nodes (the connected nodes), predict the vector of the target node, so that the nodes connected to the same given nodes have similar vectors.
Realizing the embedding mapping of the multidimensional relational network requires an updated objective function, and there are two ways to do this. In the first, each time a multi-way relation is sampled, the vector representations of the associated nodes are updated, and the objective function to be maximized is as follows:
where S is the set of nodes, A(j) is the set of multidimensional relations associated with node j, r(m/j) is one of these multidimensional relations, m is the index of this multidimensional relation, λ(m/j) is the weight of this multidimensional relation, and P1 is the probability of the associated nodes given this multidimensional relation. For each node j, L1 maximizes the pairwise vector similarity between the nodes of the multidimensional relations associated with it.
In the second, when a multi-way relation is sampled, it is split into several binary relations and the vector representations of the associated nodes are updated, and the objective function to be maximized is as follows:
where the relation set is the set of all two-dimensional relations obtained after the multidimensional relations are split into two-dimensional relations, r_m is the m-th two-dimensional relation, λ_m is the weight of the m-th two-dimensional relation, and P2 is the probability of the associated nodes given this relation. For each two-dimensional relation obtained after splitting, L2 maximizes the vector similarity between the two nodes of the relation.
As an example, assume the multidimensional relational network obtained after networking the data is: user A-product C-purchase place E, user A-product C-purchase place F, user B-product C-purchase place E.
The objective is then: given the "product C" node and the "purchase place E" node, the vectors of the nodes associated with them (connected by edges) are made similar, so that the vector of the "user A" node is similar to the vector of the "user B" node. Of course, every piece of given information can be traversed in the same way; for example, given the "user A" node and the "product C" node, the vectors of the "purchase place E" and "purchase place F" nodes are made similar.
If the objective function L1 is maximized, then given the other nodes of a relation (such as product C and purchase place E), the hidden node (such as the user node) is learned.
If the objective function L2 is maximized, the multidimensional relations are split into 9 two-dimensional relations, such as A-C, C-E, A-E, A-C, A-F and C-F, and the embedding mapping of the two-dimensional relational network is then invoked.
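The splitting step of the second approach can be sketched as follows, using the three relations of the example; the function name is hypothetical, and repeated pairs are kept so that a pair occurring in several relations is counted once per relation.

```python
from itertools import combinations

def split_into_binary_relations(relations):
    """Split each multi-way relation into all of its pairwise (binary) relations."""
    pairs = []
    for relation in relations:
        pairs.extend(combinations(relation, 2))
    return pairs

relations = [("A", "C", "E"), ("A", "C", "F"), ("B", "C", "E")]
binary_relations = split_into_binary_relations(relations)
```

Each three-node relation yields three pairs, which gives the 9 two-dimensional relations mentioned above.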
From the above embedding mappings of the semantic network, the two-dimensional relational network and the multidimensional relational network, it can be seen that by applying the representation learning algorithm based on embedding mapping to the relational network, the nodes of the relational network can be uniformly represented by vectors of relatively high dimension, each dimension of which represents a feature of the node, thereby realizing feature extraction from the original big data. Because the high-dimensional vectors also retain the structural information of the original big data, such as semantic information and purchase behavior information, the accuracy of downstream tasks such as the classification and prediction required by application services is greatly improved. Moreover, the representation learning algorithm based on embedding mapping in the embodiments can also be applied to data with multidimensional relations, is suitable for a variety of complex application environments, and computes quickly, so that it can respond quickly to application demands.
Optionally, when steps S105-S106 are implemented, the application service request can be converted into a task such as ranking, classification, clustering, prediction, association analysis or anomaly detection. These tasks can be completed with specific processing algorithms, and the correspondence between these tasks and the processing algorithms (that is, the correspondence between application service requests and processing algorithms) can be specified or obtained in advance, so that when an application service request is obtained, the processing algorithm to be used is known. To make the embodiments easier to understand, the following describes in detail which processing algorithm each task corresponds to and how the task is completed with it.
1. Ranking task
A ranking task is usually based on a specific similarity measure and generally involves computing the similarity of nodes of the relational network, using measures such as the Pearson correlation (Pearson Correlation) and the cosine similarity (Cosine Similarity).
For example, when the problem the application service request needs to solve is to list, for a given product, the products most similar to it in terms of purchase, the problem can be converted into a ranking task.
Processing algorithm: the high-dimensional vector u_i of this product node can be found among the high-dimensional vectors obtained by executing S101-S103, and the problem then becomes finding the series of product nodes with the highest similarity to u_i. Because each product node is represented by a high-dimensional vector, usually of K dimensions (K is usually a number between 200 and 500), the similarity between nodes can be obtained from the scalar product of their vectors. The problem is finally converted into finding the series of vectors with the largest scalar product with u_i. This algorithm accomplishes the ranking task, or in other words obtains the result of the application service request.
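The ranking step can be sketched with the two similarity measures named for this task. The function names and the three-dimensional toy vectors are hypothetical; real vectors would have the K dimensions mentioned above.

```python
import numpy as np

def cosine_similarity(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def pearson_correlation(x, y):
    # Pearson correlation equals the cosine similarity of the mean-centered vectors.
    return cosine_similarity(x - x.mean(), y - y.mean())

def most_similar(query, vectors, top_n, similarity=cosine_similarity):
    """Return the names of the top_n nodes most similar to the query node."""
    scored = [(name, similarity(vectors[query], vec))
              for name, vec in vectors.items() if name != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:top_n]]

product_vectors = {
    "product A": np.array([1.0, 2.0, 3.0]),
    "product B": np.array([2.0, 4.0, 6.1]),
    "product C": np.array([3.0, 2.0, 1.0]),
    "product D": np.array([-1.0, 0.0, 1.0]),
}
ranking = most_similar("product A", product_vectors, top_n=2)
pearson_ranking = most_similar("product A", product_vectors, top_n=2,
                               similarity=pearson_correlation)
```

The two measures can order the same nodes differently, so the choice of similarity is part of the processing algorithm that is fixed for a given request.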
2. Classification task
Classification tasks include binary classification and multi-class classification. Supervised learning algorithms such as SVM (Support Vector Machine) and logistic regression (Logistic Regression) can solve classification tasks effectively.
For example, the problem the application service request needs to solve may be to determine the category of each of a large number of users according to information such as age level and income range. In practice, however, data often has missing information, and how to classify users whose age, income or other information is unknown into the correct age level and income range is an important problem. This problem can be converted into a classification task.
Processing algorithm: the high-dimensional vectors of nodes such as users, age levels and income ranges can be obtained by representation learning. It is then only necessary to compute the similarity between the high-dimensional vector of the user node and the vectors of the age level nodes, and between the vector of the user node and the vectors of the income range nodes, and to select the age level node and the income range node whose vectors are most similar to that of the user node. The user can thus be classified into the correct age level and income range.
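The selection of the most similar attribute node can be sketched as below, assuming the scalar product as the similarity; the function name, the age level labels and the two-dimensional vectors are hypothetical.

```python
import numpy as np

def classify(user_vec, label_vecs):
    """Return the label node whose vector has the largest scalar product with the user."""
    return max(label_vecs, key=lambda label: float(user_vec @ label_vecs[label]))

# Toy vectors standing in for learned age level node vectors.
age_levels = {
    "age 24-30": np.array([1.0, 0.0]),
    "age 31-40": np.array([0.0, 1.0]),
}
user = np.array([0.9, 0.2])  # a user whose age is missing from the data
predicted_age_level = classify(user, age_levels)
```

The same function can be applied to the income range nodes to fill in the second missing attribute.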
3. Clustering task
Clustering tasks are usually completed with unsupervised learning algorithms such as nearest neighbor and spectral clustering.
For example, the problem the application service request needs to solve may be: given a large number of users whose categories are unknown, group the users into K classes according to their purchasing habits, so that the same strategy can be formulated for users of the same class. This problem can be converted into a clustering task.
Processing algorithm: clustering can be realized quickly from the high-dimensional feature representation of the users with an algorithm such as K-means or KNN. The usual difficulty of a clustering problem is how to reduce the dimension of the structural information, which can be as high as the number of users, that is, the number of nodes N; embedding mapping successfully reduces this dimension to K.
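A plain K-means pass over the user vectors can be sketched as follows. The deterministic farthest-point initialization and the fixed iteration count are illustrative choices, not taken from the source, and the four two-dimensional vectors are hypothetical.

```python
import numpy as np

def kmeans(x, k, iterations=20):
    """Cluster the rows of x into k groups; returns (labels, centers)."""
    # Deterministic farthest-point initialization (an illustrative choice).
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[int(np.argmax(dist))])
    centers = np.array(centers, dtype=float)
    for _ in range(iterations):
        # Assign every vector to its nearest center, then recompute the centers.
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels, centers

user_vectors = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = kmeans(user_vectors, k=2)
```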
4. Prediction task
Prediction tasks generally use matrix factorization (Matrix Factorization) or tensor factorization (Tensor Factorization) to fill in a matrix or higher-order tensor and thereby predict the missing values (Missing Value) in the data.
For example, the problem the application service request needs to solve may be: predict whether a user will buy a certain product in the future. In fact, a recommendation problem can be converted into a prediction problem, that is, providing the series of products the user is predicted to be most likely to buy.
Processing algorithm: the high-dimensional vector of the given user node and the high-dimensional vectors of the product nodes can be obtained by the method described in the embodiments. By computing the similarity between the vector of the given user node and the vectors of the product nodes, the products with the highest similarity to the user node can be recommended to the given user.
5. Association analysis (Correlation Analysis) task
The problem the application service request needs to solve may be: determine whether the user's age level and income range are related to the price range of the product.
Processing algorithm: the high-dimensional vectors of the age level nodes, the income range nodes and the price range nodes can be obtained by the method described in the embodiments. By quickly computing the similarity between them, the association between the different user attributes (the user's age level and income) and the product attribute (the price range of the product), and the strength of that association, can be determined.
6. Outlier detection task
The problem that the application service request needs to solve may be: determine whether a given user is an abnormal user within the user group to which the user belongs, for example a fraudulent user.
Processing algorithm: Using the method described in the embodiments of the present invention, the high-dimensional vectors of all user nodes can be obtained. The similarity between the high-dimensional vector of the current user node and the high-dimensional vectors of the other user nodes is then calculated; if the current user's vector differs greatly from those of the other users, the current user can be considered an abnormal user.
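One way to read this step is sketched below, under the assumption that an abnormal user is one whose vector is dissimilar to the rest of the group. The mean-similarity rule, the threshold value, and all names and vectors are illustrative assumptions, not details from the source.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def find_outliers(vectors, threshold):
    """Return the nodes whose mean similarity to all other nodes is below threshold."""
    outliers = []
    for name, vec in vectors.items():
        sims = [cosine_similarity(vec, other)
                for other_name, other in vectors.items() if other_name != name]
        if sims and sum(sims) / len(sims) < threshold:
            outliers.append(name)
    return outliers

# Hypothetical 2-dimensional user vectors; user_4 points in a different direction.
user_vectors = {
    "user_1": [1.0, 0.1],
    "user_2": [0.9, 0.2],
    "user_3": [1.0, 0.0],
    "user_4": [-0.2, 1.0],
}
suspects = find_outliers(user_vectors, threshold=0.5)
```

In practice the threshold would have to be tuned against known cases; the value 0.5 is chosen only so that the toy data separates cleanly.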
Optionally, after steps S101-S103 have been executed, that is, after data mining has been performed on the original big data and data features uniformly represented as high-dimensional vectors have been obtained, if the original big data is updated, steps S101-S103 may be executed only on the updated data; there is no need to execute S101-S103 again on all of the data.
Optionally, steps S101-S103 may be executed on new data as soon as the data is updated, so as to mine the new data; or they may be executed only once the new data has accumulated to a certain amount; or they may be executed on new data periodically.
An embodiment of the present invention further provides an intelligent big data processing method. As shown in Fig. 2, the method includes:
S301: Obtain the user's application service request and the high-dimensional vectors of the nodes of a relation network converted from original big data.
Optionally, the relation network converted from the original big data is a relation network obtained by preprocessing the original big data and then converting it into a network.
In this embodiment of the present invention, the features represented by high-dimensional vectors can be obtained directly, so feature mining does not need to be performed on the original big data. The feature mining on the original big data may be completed on another device, which is not limited by this embodiment. For the process of feature mining on the original big data, refer to S101-S103; details are not repeated here.
S302: Determine the processing algorithm corresponding to the application service request.
S303: Determine the result of the application service request by using the processing algorithm corresponding to the application service request and the high-dimensional vectors of the nodes of the relation network.
For the specific implementation of S302 and S303, refer to S105-S106.
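Steps S301 to S303 amount to looking up an algorithm for the request and applying it to precomputed node vectors. The sketch below shows one possible shape for that dispatch; the request format, the `ALGORITHMS` registry, the node-name prefixes, and both example algorithms are hypothetical and are not taken from the source.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_products(vectors, user):
    """Rank all product nodes by similarity to the given user node."""
    products = [n for n in vectors if n.startswith("product:")]
    return sorted(products,
                  key=lambda p: cosine_similarity(vectors[user], vectors[p]),
                  reverse=True)

def most_similar_user(vectors, user):
    """Return the other user node closest to the given user node."""
    others = [n for n in vectors if n.startswith("user:") and n != user]
    return max(others, key=lambda o: cosine_similarity(vectors[user], vectors[o]))

# Hypothetical registry mapping a request type to its processing algorithm.
ALGORITHMS = {"recommend": recommend_products, "similar_user": most_similar_user}

def handle_request(request, vectors):
    algorithm = ALGORITHMS[request["task"]]       # S302: pick the algorithm
    return algorithm(vectors, request["user"])    # S303: compute the result

# S301: the request plus node vectors that were learned elsewhere.
node_vectors = {
    "user:1": [1.0, 0.0], "user:2": [0.9, 0.1], "user:3": [0.0, 1.0],
    "product:A": [0.8, 0.2], "product:B": [0.1, 0.9],
}
result = handle_request({"task": "recommend", "user": "user:1"}, node_vectors)
```

Because every algorithm consumes the same vector table, adding a new application service in this sketch only means registering another function.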
In this embodiment of the present invention, the features represented by high-dimensional vectors are obtained directly, and the result of the application service request is determined by using the processing algorithm corresponding to the request and the high-dimensional vectors of the nodes of the relation network. The intelligent big data processing method described in this embodiment is not limited to one specific application service and can effectively provide a unified processing method for multiple application services.
Corresponding to the method embodiment described in Fig. 1, the present invention further provides an intelligent big data processing system. As shown in Fig. 4, the system includes a data structuring module 401, a representation learning module 402, and an application algorithm module 403.
The data structuring module 401 is configured to preprocess the original big data and convert the preprocessed original big data into a network, obtaining a relation network comprising nodes and edges. The nodes in the relation network are converted from the data units in the preprocessed original big data, and the edges in the relation network represent the relations between nodes. First, by converting the original big data into a network, big data or massive data in tabular form can be converted into a relation network, so that the data can be processed uniformly in terms of nodes and edges, which greatly reduces the cost of data storage and management. Second, text data such as words and phrases in the preprocessed original big data is converted into a network to construct a semantic network, which retains the semantic information in the text; this information can subsequently be used effectively to improve the accuracy of application services. In addition, once the preprocessed original big data has been expressed as a relation network comprising nodes and edges, representation learning algorithms for network data can be used to extract features from the data quickly and uniformly, so that different application service requests can be answered quickly.
The representation learning module 402 is configured to apply a representation learning algorithm based on embedding mapping to the relation network, obtaining the high-dimensional vectors of the nodes of the relation network. By applying this algorithm, the representation learning module 402 uniformly represents the nodes in the relation network, such as users, products, and phrases, as vectors of relatively high dimension. Each vector represents one node in the relation network, and each dimension of the vector represents one feature of that node. The relations between nodes in the relation network (in other words, the edges) are converted into similarities between the high-dimensional vectors of the nodes, so that the structural information in the original big data is retained, which greatly improves the accuracy of tasks such as classification and prediction in subsequent application services.
The application algorithm module 403 is configured to obtain the user's application service request, determine the processing algorithm corresponding to the application service request, and determine the result of the request by using that processing algorithm and the high-dimensional vectors of the nodes of the relation network obtained by the representation learning module 402. That is, after the representation learning module 402 has represented the features in the big data uniformly as high-dimensional vectors, the application algorithm module 403 can use these uniformly represented features to provide solutions for various application services, in other words, to return the result of the problem that the application service needs to solve.
In this embodiment of the present invention, the data structuring module 401 preprocesses the original big data and converts it into a network, so that the representation learning module 402 can use a representation learning algorithm for network data to extract features from the data quickly and uniformly. The application algorithm module 403 can determine the corresponding processing algorithm according to the user's application service request, perform the calculation using the vector-form features extracted by the representation learning module 402, and return the processing result to the user. Unlike the prior art, in this embodiment the entire feature extraction process requires no human participation and is completed automatically by the representation learning algorithm based on embedding mapping, with high computational efficiency. The structural information (that is, the effective information) in the original big data is also largely retained during feature extraction, which improves the accuracy of tasks such as classification and prediction. Moreover, because a representation learning algorithm based on embedding mapping is used, the data features mined from the original big data can be represented uniformly as high-dimensional vectors, so the system in this embodiment is not limited to one specific application service and can effectively provide a unified processing method for multiple application services.
It should be noted that the intelligent big data processing method described in the embodiments of the present invention can be applied not only to the insurance field but also to other fields, for example finance and shopping and consumption. It is particularly suitable for situations in which both structured and unstructured data need to be processed, and for occasions in which data with multidimensional relations needs to be processed, where it has a clear advantage over the prior art.
Optionally, the relation network may include a semantic network, and may also include an attribute network and a behavior network. Each of these may be a homogeneous relation network, a two-dimensional relation network, or a multidimensional relation network. Therefore, the representation learning module 402 may be specifically configured to perform embedding mapping on a multidimensional relation network in the relation network, obtaining the high-dimensional vectors of the nodes of the multidimensional relation network; or to perform embedding mapping on a two-dimensional relation network in the relation network, obtaining the high-dimensional vectors of the nodes of the two-dimensional relation network; or to perform embedding mapping on a semantic network in the relation network, obtaining the high-dimensional vectors of the nodes of the semantic network; or to perform embedding mapping on a homogeneous network in the relation network, obtaining the high-dimensional vectors of the nodes of the homogeneous network.
Optionally, in this embodiment of the present invention, the original big data may be collected by websites or apps. It may include structured data such as behavior data and attribute data, and may also include unstructured data such as text data, which is not limited here.
The preprocessing performed by the data structuring module 401 on the original big data includes data analysis and cleaning: statistical analysis is performed on the original big data, and data content that is non-compliant or erroneous is removed. This may involve filtering out invalid data formats, for example removing a value such as a price that should be a floating-point number but was filled in as a character string. It may also involve unifying times or units, filling in missing values, or smoothing noisy data, so that the format of the big data is standardized, abnormal data is removed, errors are corrected, and duplicate data is removed.
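The cleaning operations listed above can be sketched on a small list of records. The record layout, the field names, and the choice to fill a missing price with the mean of the valid prices are all assumptions for illustration; the source does not specify how missing values are filled.

```python
def parse_price(value):
    """Return the price as a float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def clean_records(records):
    """Drop non-numeric prices, fill missing prices with the mean, drop duplicates."""
    # Keep records whose price is either missing (None) or parseable as a number.
    kept = [r for r in records
            if r["price"] is None or parse_price(r["price"]) is not None]
    valid = [parse_price(r["price"]) for r in kept if r["price"] is not None]
    fill = sum(valid) / len(valid) if valid else 0.0
    cleaned, seen = [], set()
    for r in kept:
        price = fill if r["price"] is None else parse_price(r["price"])
        row = (r["name"], price)
        if row not in seen:  # skip exact duplicates after normalization
            seen.add(row)
            cleaned.append({"name": r["name"], "price": price})
    return cleaned

# Hypothetical raw product records with typical defects.
records = [
    {"name": "insurance_A", "price": "1200.0"},     # numeric value stored as text
    {"name": "insurance_B", "price": "expensive"},  # invalid: free text
    {"name": "insurance_A", "price": 1200.0},       # duplicate of the first record
    {"name": "insurance_C", "price": None},         # missing value
    {"name": "insurance_D", "price": 600},
]
cleaned = clean_records(records)
```

Here a numeric string such as "1200.0" is normalized rather than discarded; a stricter reading of the paragraph above would drop it instead.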
Optionally, the preprocessed original big data may include behavior data, attribute data, and text data. The conversion of the preprocessed original big data into a network by the data structuring module may then include converting the behavior data in the preprocessed original big data into a network, for example converting behavior data such as purchases and reviews into a behavior network. It may also include converting the attribute data in the preprocessed original big data into a network, for example converting attribute information such as age and price into an attribute network. It may also include converting the text data in the preprocessed original big data into a network, for example converting text data such as product introductions or review content into a semantic network whose nodes are words and phrases. The behavior network, the attribute network, and the semantic network together constitute the relation network.
Optionally, when the application algorithm module 403 uses the high-dimensional vectors of the nodes of the relation network to determine the result of the application service request, it may use the high-dimensional vectors of all nodes of the relation network, or it may use the high-dimensional vectors of only some of the nodes. Specifically, only the nodes related to the application service request may be used to determine the result. For example, when the application service request is product recommendation, the calculation may use only the high-dimensional vectors of the product nodes and of the user nodes.
It should be noted that, in this embodiment of the present invention, for the specific implementation of each module, reference may be made to the description of the method embodiment. For example, for how the representation learning algorithm based on embedding mapping is carried out, refer to the description of the method embodiment; details are not repeated here.
The system described in the embodiments of the present invention may be implemented in the form of software or a program on one or more computers or servers, which is not limited here.
For a better understanding of the embodiments of the present invention, the intelligent big data processing system is described in detail below, taking its application to insurance as an example.
When a user, on a personal computer (PC) or mobile terminal, completes personal information, views insurance terms, purchases insurance, cancels insurance, or establishes social relationships, the server can collect the information about these operations to form the original big data, which may be stored in a database in the form of tables. The system described in the embodiments of the present invention can obtain this original big data.
For example, by collecting the operation information, the database may hold a user personal information table as shown in Table 1, a product information table as shown in Table 2, an insurance purchase behavior table as shown in Table 3, and an insurance cancellation behavior table as shown in Table 4.
Table 1: User personal information table
Table 2: Product information table
Insurance name | Category | Price | Insurer | Product introduction | …… |
Insurance A | Vehicle insurance | …… | …… | Low premium, convenient claims settlement | …… |
Insurance B | Life insurance | …… | …… | Whole-life insurance, wide range of issue ages | …… |
Insurance C | Health insurance | …… | …… | High compensation amount for major diseases | …… |
…… | …… | …… | …… | …… | …… |
Table 3: Insurance purchase behavior table
User ID | Insurance name | Purchase location (GPS) | Purchase amount | User review |
User 1 | Insurance A | XX company | …… | Convenient to buy :) |
User 2 | Insurance C | XX enterprise | …… | Often away from home, so I should buy insurance |
User 3 | Insurance B | 1.765 | …… | …… |
User 4 | Insurance A | XX road | …… | Adding insurance for my beloved car! |
User 5 | Insurance B | XX residential community | …… | …… |
…… | …… | …… | …… | …… |
Table 4: Insurance cancellation behavior table
User ID | Insurance name | Cancellation location (GPS) | Cancellation amount | Cancellation reason |
User 3 | Insurance B | XX street (home) | …… | …… |
…… | …… | …… | …… | …… |
First, the data structuring module in the system can perform data analysis and cleaning on the data above. Take data analysis and cleaning of the insurance purchase behavior table shown in Table 3 as an example. Data analysis refers to obtaining more information through statistics on and association of the data; for example, the data structuring module can supplement the geographic location information with labels such as "workplace", "home", and "near a sales outlet". Data cleaning refers to removing illegal values, or even removing illegal data records. For example, when the "purchase location" is a real number, the data structuring module can hide that value; when the "user ID" or "insurance name" value of a record in Table 3 is illegal, the data structuring module can remove that purchase record. Table 5 shows the data in Table 3 after data analysis and cleaning by the data structuring module.
Table 5: Insurance purchase behavior table after data analysis and cleaning
User ID | Insurance name | Purchase location (GPS) | Purchase amount | User review |
User 1 | Insurance A | XX company [workplace] | …… | Convenient to buy :) |
User 2 | Insurance C | XX enterprise [workplace] | …… | Often away from home, so I should buy insurance |
User 3 | Insurance B | [missing] | …… | …… |
User 4 | Insurance A | XX road [near a sales outlet] | …… | Adding insurance for my beloved car! |
User 5 | Insurance B | XX residential community [home] | …… | …… |
…… | …… | …… | …… | …… |
Next, the original big data that has undergone data analysis and cleaning can be converted into a network, obtaining a relation network comprising nodes and edges. As the tables above show, the original big data contains a large amount of text information, so the data structuring module can convert the text data into a network, obtaining nodes made up of phrases or words and the edges between those nodes, that is, a semantic network comprising nodes and edges. The representation learning module can subsequently apply a representation learning method to this semantic network to learn the semantic information it contains. For example, a word segmentation tool can be used to extract the text data from Tables 1 to 4 after data analysis and cleaning, obtaining text data in the "document-phrase" form shown in Table 6. Each phrase in Table 6 can be represented as one node in the semantic network. If two phrase nodes occur together in a sentence or document, there may be an edge connecting them, and the weight of the edge is determined by how often the two phrases occur together in sentences or documents. For example, there is an edge between the "travel" node and the "business trip" node, and an edge between the "business trip" node and the "overworked" node.
Table 6
usually work busy business trip |
business trip often overworked |
seriously ill divorced has a son |
hobby travel business trip |
seriously ill |
low premium convenient claims settlement |
whole-life insurance wide range of issue ages |
major disease high compensation amount |
convenient to buy |
often away from home should buy insurance |
add insurance for beloved car |
price too high unsuitable |
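The co-occurrence rule for the semantic network (an edge between two phrases that appear in the same document, weighted by how often they co-occur) can be sketched as follows. The function name and the sample documents are illustrative assumptions; the phrases stand in for the output of a word segmentation tool.

```python
from collections import Counter
from itertools import combinations

def build_semantic_network(documents):
    """Map each unordered phrase pair to the number of documents it co-occurs in."""
    edges = Counter()
    for phrases in documents:
        # Sorting gives every pair a canonical order, so (a, b) and (b, a) coincide.
        for a, b in combinations(sorted(set(phrases)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical segmented documents, one list of phrases per document.
documents = [
    ["work", "busy", "business trip"],
    ["business trip", "often", "overworked"],
    ["hobby", "travel", "business trip"],
    ["business trip", "overworked"],
]
edges = build_semantic_network(documents)
```

In this toy data the "business trip" and "overworked" nodes share an edge of weight 2, while "travel" and "overworked" are connected only indirectly through "business trip".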
In addition, the contents of the tables can be converted into relation networks. For example, the content of Table 1 is converted into multiple two-dimensional relation networks such as "user ID-gender", "user ID-age group", "user ID-occupation", and "user ID-self-introduction phrase"; the content of Table 2 is converted into multiple two-dimensional relation networks such as "insurance name-category", "insurance name-price range", "insurance name-insurer", and "insurance name-product introduction phrase"; the content of Table 3 is converted into a multidimensional relation network of "user ID-insurance name-purchase location-amount range-review phrase"; and the content of Table 4 is converted into a multidimensional relation network of "user ID-insurance name-cancellation location-amount range-cancellation reason phrase".
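The table-to-network conversion can be sketched with two small helpers: one that turns a pair of columns into the edges of a two-dimensional relation network, and one that turns a whole row into a single multi-node relation. The column names, the `(column, value)` node encoding, and the sample rows are hypothetical.

```python
def table_to_edges(rows, left_column, right_column):
    """Build the edges of a two-dimensional relation network from two table columns."""
    edges = []
    for row in rows:
        left, right = row.get(left_column), row.get(right_column)
        if left is None or right is None:
            continue  # skip rows with a missing endpoint
        # Tagging each value with its column keeps node categories distinct.
        edges.append(((left_column, left), (right_column, right)))
    return edges

def row_to_relation(row, columns):
    """Represent one table row as a multidimensional relation over several nodes."""
    return tuple((c, row[c]) for c in columns if row.get(c) is not None)

# Hypothetical rows modeled on the product and purchase tables.
product_rows = [
    {"insurance_name": "A", "category": "vehicle"},
    {"insurance_name": "B", "category": "life"},
    {"insurance_name": "C", "category": "health"},
]
purchase_row = {"user_id": "user_1", "insurance_name": "A",
                "location": "workplace", "amount_range": "low",
                "review_phrase": "convenient"}

category_edges = table_to_edges(product_rows, "insurance_name", "category")
purchase_relation = row_to_relation(
    purchase_row,
    ["user_id", "insurance_name", "location", "amount_range", "review_phrase"],
)
```

Because the node `("insurance_name", "A")` appears in both results, networks built this way can later be merged on their shared nodes.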
The relation network finally formed contains both the semantic network described above and the multiple multidimensional and two-dimensional relation networks converted from the contents of Tables 1 to 4; these multidimensional and two-dimensional relation networks include both attribute networks and behavior networks. The nodes of the relation network are user IDs, user attributes, product attributes, locations, phrases, and so on, and the edges of the relation network are the interactions or relations between them.
It should be noted that nodes are allowed to overlap in the relation network. The two-dimensional and multidimensional relation networks described above can be fused, through shared nodes such as "user ID", "insurance name", and "phrase", into a relation network containing multiple categories of nodes, that is, a multi-source heterogeneous network.
After the data structuring module has converted the original big data into a relation network, the representation learning module can perform representation learning on the data in the relation network. Assume that the number of dimensions of the high-dimensional vectors is K (K typically takes a value between 200 and 500). The result of representation learning is that the nodes in the relation network (such as phrase nodes, user nodes, user attribute nodes, and product nodes) are represented as high-dimensional vectors, and the association relations (that is, the edges) in the relation network are retained in those vectors.
From the analysis above, in this embodiment of the present invention the relation network includes a semantic network, two-dimensional relation networks, and multidimensional relation networks. The representation learning module can apply the representation learning algorithm based on embedding mapping to the semantic network. Specifically, the nodes in the semantic network, such as "travel", "business trip", "overworked", "seriously ill", and "major disease", are represented as high-dimensional vectors, for example u=[u1,u2,…,uK]. Through the representation learning algorithm it can be found that the vector of the "travel" node is similar to the vector of the "business trip" node, that the vector of the "business trip" node is similar to the vector of the "overworked" node, and that the vectors of the "overworked" node, the "seriously ill" node, and the "major disease" node are similar to one another. The structural information of the data in the network is thereby retained.
The representation learning module can apply the representation learning algorithm based on embedding mapping to the two-dimensional relation networks. The representation learning result for a two-dimensional relation network may be as follows: nodes such as "user ID", "user attribute", "product ID (insurance name)", and "product attribute" are represented as high-dimensional vectors. By making the vectors of user nodes with similar attributes highly similar, and the vectors of product nodes with similar attributes highly similar, the structural information in the relation network is retained. In the end, user nodes that travel frequently have similar vectors to one another, as do product nodes of the same category.
The representation learning module can apply the representation learning algorithm based on embedding mapping to the multidimensional relation networks. The representation learning result for a multidimensional relation network may be as follows: nodes such as "user", "insurance name", "location", and "review phrase" are represented as high-dimensional vectors, such that the vectors of user nodes with similar purchase and cancellation habits are highly similar, and the vectors of product nodes whose purchasing and cancelling users are similar are highly similar, retaining the structural information in the relation network.
In this embodiment of the present invention, the representation learning module may be implemented based on embedding mapping (Embedding), combined with Skip-gram and Negative Sampling, which ensures that the algorithm has low computational complexity and strong scalability.
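To make the combination of Skip-gram and Negative Sampling concrete, the following is a minimal sketch of skip-gram training with negative sampling on the edges of a small network. It is not the patent's algorithm: the function name, hyperparameters, uniform negative sampler, and toy graph are assumptions, and a real system would add random walks or weighted sampling and use far larger dimensions.

```python
import math
import random

def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clip to keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-x))

def train_embeddings(edges, dim=8, negatives=2, epochs=200, lr=0.05, seed=0):
    """Learn one vector per node so that linked nodes score high under a sigmoid."""
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    # Separate "node" and "context" vectors, as in skip-gram.
    emb = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}
    ctx = {n: [0.0] * dim for n in nodes}
    for _ in range(epochs):
        for u, v in edges:
            for center, context in ((u, v), (v, u)):
                # One observed neighbor (label 1) plus random negatives (label 0).
                samples = [(context, 1.0)]
                samples += [(rng.choice(nodes), 0.0) for _ in range(negatives)]
                grad_center = [0.0] * dim
                for target, label in samples:
                    if label == 0.0 and target in (center, context):
                        continue  # do not treat the true pair as a negative
                    score = sigmoid(sum(a * b for a, b in zip(emb[center], ctx[target])))
                    g = lr * (label - score)
                    for i in range(dim):
                        grad_center[i] += g * ctx[target][i]
                        ctx[target][i] += g * emb[center][i]
                for i in range(dim):
                    emb[center][i] += grad_center[i]
    return emb

# Hypothetical purchase edges: two users per product.
edges = [("user_1", "insurance_A"), ("user_2", "insurance_A"),
         ("user_3", "insurance_B"), ("user_4", "insurance_B")]
vectors = train_embeddings(edges)
```

Each update touches only one positive pair and a handful of sampled negatives, so the cost per edge does not depend on the number of nodes; this is the usual argument for the low complexity and scalability of negative sampling.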
In this embodiment of the present invention, every node of the relation network is represented uniformly as a high-dimensional vector, and the structural information in the relation network is retained. For different task requirements in subsequent application services, the application algorithm module can call the high-dimensional vectors of only some of the nodes for calculation, so the computational complexity is low.
For example, assume that the problem that the user's application service request needs to solve is insurance product recommendation; the application algorithm module can be used to implement the recommendation. Insurance product recommendation means, for a given user, finding the products that are most similar to this user in purchase behavior and most different in cancellation behavior. The processing algorithm corresponding to this application service request is then: select the vector of the user node and the vectors of the product nodes, and calculate the similarity between the user node's vector and each product node's vector using a vector similarity method such as cosine similarity. For example, during representation learning, the "insurance purchase behavior" information can be stored in dimensions 1 to 100 of the user node vectors and product node vectors; that is, if user A has purchased product A, dimensions 1 to 100 of the vector of the user A node are similar to dimensions 1 to 100 of the vector of the product A node. The "insurance cancellation behavior" information can likewise be stored in dimensions 101 to 200 of the user node vectors and product node vectors; that is, if user A has cancelled product B, dimensions 101 to 200 of the vector of the user A node are similar to dimensions 101 to 200 of the vector of the product B node. Therefore, to recommend products to user A, one looks for the product node vectors whose dimensions 1 to 100 are similar to those of the user A node vector and whose dimensions 101 to 200 are dissimilar.
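The split-vector recommendation can be sketched as follows. The source says only that the purchase dimensions should be similar and the cancellation dimensions dissimilar; combining the two as "purchase similarity minus cancellation similarity" is an assumption of this sketch, as are the names and the 4-dimensional toy vectors (2 dimensions standing in for each block of 100).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_products(user_vector, product_vectors, split):
    """Rank products: similar in purchase dimensions, dissimilar in cancellation ones."""
    def score(vec):
        purchase = cosine_similarity(user_vector[:split], vec[:split])
        cancellation = cosine_similarity(user_vector[split:], vec[split:])
        return purchase - cancellation  # assumed way of combining the two parts
    return sorted(product_vectors,
                  key=lambda name: score(product_vectors[name]), reverse=True)

# Hypothetical vectors: first 2 dimensions = purchase, last 2 = cancellation.
user_a = [1.0, 0.0, 0.0, 1.0]
product_vectors = {
    "insurance_A": [0.9, 0.1, 0.0, 1.0],  # bought by similar users, but also cancelled
    "insurance_B": [1.0, 0.0, 1.0, 0.0],  # bought by similar users, not cancelled
    "insurance_C": [0.0, 1.0, 0.0, 1.0],  # not bought, and cancelled
}
ranking = rank_products(user_a, product_vectors, split=2)
```

With 200-dimensional vectors laid out as described above, `split` would be 100.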
Similarly, the application algorithm module can also use the high-dimensional vectors obtained through representation learning to implement tasks such as user category classification and the detection of users committing insurance fraud; details are not repeated here.
Corresponding to the intelligent big data processing method described in Fig. 3, an embodiment of the present invention provides an intelligent big data processing system. As shown in Fig. 5, the system may include:
An acquisition module 501, configured to obtain the user's application service request and the high-dimensional vectors of the nodes of a relation network converted from original big data.
A determining module 502, configured to determine the processing algorithm corresponding to the application service request, and to determine the result of the application service request by using that processing algorithm and the high-dimensional vectors of the nodes of the relation network.
Optionally, the relation network converted from the original big data is a relation network obtained by preprocessing the original big data and then converting it into a network.
In this embodiment of the present invention, the acquisition module 501 can directly obtain the features represented by high-dimensional vectors, so that the determining module 502 uses the processing algorithm corresponding to the application service request and the high-dimensional vectors of the nodes of the relation network to determine the result of the request. The intelligent big data processing system described in this embodiment is not limited to one specific application service and can effectively provide a unified processing method for multiple application services.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
From the description of the embodiments above, a person skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the technical solutions above, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or in certain parts of an embodiment.
Finally, it should be noted that the embodiments above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (20)
1. An intelligent big data processing system, characterized by comprising:
a data structuring module, configured to preprocess original big data and convert the preprocessed original big data into a network, obtaining a relation network comprising nodes and edges;
a representation learning module, configured to apply a representation learning algorithm based on embedding mapping to the relation network, obtaining high-dimensional vectors of the nodes of the relation network; and
an application algorithm module, configured to obtain a user's application service request, determine the processing algorithm corresponding to the application service request, and determine the result of the application service request by using the processing algorithm corresponding to the application service request and the high-dimensional vectors of the nodes of the relation network obtained by the representation learning module.
2. The system according to claim 1, characterized in that the relation network comprises a multidimensional relation network, and the representation learning module is specifically configured to perform embedding mapping on the multidimensional relation network, obtaining the high-dimensional vectors of the nodes of the multidimensional relation network.
3. The system according to claim 1, characterized in that the relation network comprises a semantic network, and the representation learning module is specifically configured to perform embedding mapping on the semantic network, obtaining the high-dimensional vectors of the nodes of the semantic network.
4. The system according to claim 2 or 3, characterized in that the relation network comprises a two-dimensional relation network, and the representation learning module is specifically configured to perform embedding mapping on the two-dimensional relation network, obtaining the high-dimensional vectors of the nodes of the two-dimensional relation network.
5. The system according to claim 1, characterized in that the original big data comprises behavior data, attribute data, and text data.
6. The system according to claim 1 or 5, characterized in that the data structuring module is specifically configured to:
construct a network from the behavior data in the preprocessed original big data, obtaining a behavior network comprising nodes and edges;
construct a network from the attribute data in the preprocessed original big data, obtaining an attribute network comprising nodes and edges; and
construct a network from the text data in the preprocessed original big data, obtaining a semantic network comprising nodes and edges;
wherein the behavior network, the attribute network, and the semantic network together constitute the relational network.
7. The system according to claim 1, characterized in that the data structuring module is specifically configured to perform data analysis and cleaning on the original big data.
8. The system according to claim 1, characterized in that the application algorithm module is specifically configured to determine the result of the application service request using the high-dimensional vectors of some of the nodes in the relational network and the processing algorithm corresponding to the application service request.
9. An intelligent big data processing system, characterized by comprising:
an acquisition module, configured to obtain an application service request of a user and high-dimensional vectors of the nodes of a relational network converted from original big data; and
a determining module, configured to determine a processing algorithm corresponding to the application service request, and to determine a result of the application service request using that processing algorithm and the high-dimensional vectors of the nodes of the relational network.
10. The system according to claim 9, characterized in that the relational network converted from the original big data is a relational network obtained by constructing a network from the original big data after preprocessing.
11. An intelligent big data processing method, characterized by comprising:
preprocessing original big data;
constructing a network from the preprocessed original big data, obtaining a relational network comprising nodes and edges;
applying a representation learning algorithm based on embedding mapping to the relational network, obtaining high-dimensional vectors of the nodes of the relational network;
obtaining an application service request of a user;
determining a processing algorithm corresponding to the application service request; and
determining a result of the application service request using the processing algorithm corresponding to the application service request and the high-dimensional vectors of the nodes of the relational network.
12. The method according to claim 11, characterized in that, where the relational network comprises a multi-dimensional relational network, applying the representation learning algorithm based on embedding mapping to the relational network to obtain the high-dimensional vectors of the nodes of the relational network comprises:
performing embedding mapping on the multi-dimensional relational network, obtaining high-dimensional vectors of the nodes of the multi-dimensional relational network.
13. The method according to claim 11, characterized in that, where the relational network comprises a semantic network, applying the representation learning algorithm based on embedding mapping to the relational network to obtain the high-dimensional vectors of the nodes of the relational network comprises:
performing embedding mapping on the semantic network, obtaining high-dimensional vectors of the nodes of the semantic network.
14. The method according to claim 12 or 13, characterized in that, where the relational network comprises a two-dimensional relational network, applying the representation learning algorithm based on embedding mapping to the relational network to obtain the high-dimensional vectors of the nodes of the relational network comprises:
performing embedding mapping on the two-dimensional relational network, obtaining high-dimensional vectors of the nodes of the two-dimensional relational network.
15. The method according to claim 11, characterized in that the original big data comprises behavior data, attribute data, and text data.
16. The method according to claim 11 or 15, characterized in that constructing a network from the preprocessed original big data to obtain a relational network comprising nodes and edges comprises:
constructing a network from the behavior data in the preprocessed original big data, obtaining a behavior network comprising nodes and edges;
constructing a network from the attribute data in the preprocessed original big data, obtaining an attribute network comprising nodes and edges; and
constructing a network from the text data in the preprocessed original big data, obtaining a semantic network comprising nodes and edges;
wherein the behavior network, the attribute network, and the semantic network together constitute the relational network.
17. The method according to claim 11, characterized in that preprocessing the original big data comprises performing data analysis and cleaning on the original big data.
18. The method according to claim 11, characterized in that determining the result of the application service request using the processing algorithm corresponding to the application service request and the high-dimensional vectors of the nodes of the relational network comprises:
determining the result of the application service request using the high-dimensional vectors of some of the nodes in the relational network and the processing algorithm corresponding to the application service request.
19. An intelligent big data processing method, characterized by comprising:
obtaining an application service request of a user and high-dimensional vectors of the nodes of a relational network converted from original big data;
determining a processing algorithm corresponding to the application service request; and
determining a result of the application service request using the processing algorithm corresponding to the application service request and the high-dimensional vectors of the nodes of the relational network.
20. The method according to claim 19, characterized in that the relational network converted from the original big data is a relational network obtained by constructing a network from the original big data after preprocessing.
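The method of claims 11 and 16 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the claims do not specify the embedding algorithm, so a toy neighbor-averaging embedding is swapped in as a stand-in, and every name here (`build_relational_network`, `embed_nodes`, `handle_request`, the request types `similar_nodes` and `link_score`, the node-label prefixes) is hypothetical. The input records are assumed to be already preprocessed.

```python
import math
import random
from collections import defaultdict


def build_relational_network(behavior, attributes, texts):
    """Merge behavior, attribute and text records into one undirected graph.

    Assumed (hypothetical) record shapes: (user, item), (user, key, value),
    (item, text). Returns {node: set of neighbor nodes}.
    """
    graph = defaultdict(set)

    def add_edge(u, v):
        graph[u].add(v)
        graph[v].add(u)

    for user, item in behavior:              # behavior network
        add_edge(f"user:{user}", f"item:{item}")
    for user, key, value in attributes:      # attribute network
        add_edge(f"user:{user}", f"attr:{key}={value}")
    for item, text in texts:                 # semantic network (item-word edges)
        for word in set(text.lower().split()):
            add_edge(f"item:{item}", f"word:{word}")
    return dict(graph)


def embed_nodes(graph, dim=16, steps=2, seed=0):
    """Toy stand-in for representation learning: start from seeded random
    vectors, then replace each node's vector by the mean of its neighbors'
    vectors, `steps` times."""
    rng = random.Random(seed)
    vectors = {n: [rng.gauss(0, 1) for _ in range(dim)] for n in sorted(graph)}
    for _ in range(steps):
        vectors = {
            n: [sum(vectors[m][i] for m in sorted(graph[n])) / len(graph[n])
                for i in range(dim)]
            for n in graph
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_nodes(vectors, node, k=3):
    """Return the k nodes whose vectors are closest to `node` by cosine."""
    scores = [(other, cosine(vectors[node], vec))
              for other, vec in vectors.items() if other != node]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


def link_score(vectors, source, target):
    """Score a candidate edge by the cosine similarity of its endpoints."""
    return cosine(vectors[source], vectors[target])


# Maps an application service request type to its processing algorithm.
PROCESSING_ALGORITHMS = {"similar_nodes": similar_nodes, "link_score": link_score}


def handle_request(vectors, request):
    """Pick the processing algorithm for the request and run it on the vectors."""
    kind = request["type"]
    if kind not in PROCESSING_ALGORITHMS:
        raise ValueError(f"no processing algorithm for request type {kind!r}")
    return PROCESSING_ALGORITHMS[kind](vectors, **request["params"])


if __name__ == "__main__":
    network = build_relational_network(
        behavior=[("u1", "a"), ("u2", "a"), ("u3", "b")],
        attributes=[("u1", "city", "Beijing"), ("u3", "city", "Shanghai")],
        texts=[("a", "graph embedding"), ("b", "stock price")],
    )
    node_vectors = embed_nodes(network)
    print(handle_request(node_vectors, {"type": "similar_nodes",
                                        "params": {"node": "user:u1", "k": 2}}))
```

The dispatch table keeps the embedding step independent of the application layer: a new service only needs another entry that consumes the same node vectors, which matches the separation between the representation learning step and the processing algorithm in the claims.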
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610382955.7A CN106445988A (en) | 2016-06-01 | 2016-06-01 | Intelligent big data processing method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610382955.7A CN106445988A (en) | 2016-06-01 | 2016-06-01 | Intelligent big data processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106445988A (en) | 2017-02-22 |
Family
ID=58183425
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610382955.7A Pending CN106445988A (en) | 2016-06-01 | 2016-06-01 | Intelligent big data processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106445988A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101782976A (en) * | 2010-01-15 | 2010-07-21 | 南京邮电大学 | Automatic selection method for machine learning in cloud computing environment |
CN102722736A (en) * | 2012-06-13 | 2012-10-10 | 合肥工业大学 | Method for splitting and identifying character strings at complex interference |
CN103324708A (en) * | 2013-06-18 | 2013-09-25 | 哈尔滨工程大学 | Method of transfer learning from long text to short text |
Non-Patent Citations (2)
Title |
---|
CHENG YANG et al.: "Network Representation Learning with Rich Text Information", Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015) * |
陈维政 (Chen Weizheng): "Network Representation Learning", 《大数据》 (Big Data) * |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108573306A (en) * | 2017-03-10 | 2018-09-25 | 北京搜狗科技发展有限公司 | Method for outputting reply information, and training method and device for deep learning model |
CN108573306B (en) * | 2017-03-10 | 2021-11-02 | 北京搜狗科技发展有限公司 | Method for outputting reply information, and training method and device for deep learning model |
CN107909274A (en) * | 2017-11-17 | 2018-04-13 | 平安科技(深圳)有限公司 | Enterprise investment risk assessment method, device and storage medium |
CN107818176B (en) * | 2017-11-21 | 2018-12-07 | 清华大学 | Distributed network representation learning method for large-scale graph mining |
CN107818176A (en) * | 2017-11-21 | 2018-03-20 | 清华大学 | Distributed network representation learning method for large-scale graph mining |
CN108304549A (en) * | 2018-02-01 | 2018-07-20 | 广东聚晨知识产权代理有限公司 | Big data intelligent processing system |
CN108470312A (en) * | 2018-02-07 | 2018-08-31 | 中国平安人寿保险股份有限公司 | Method and device for analyzing claim case, storage medium and terminal |
CN108470312B (en) * | 2018-02-07 | 2021-07-06 | 中国平安人寿保险股份有限公司 | Method and device for analyzing claim case, storage medium and terminal |
CN109063041B (en) * | 2018-07-17 | 2020-04-07 | 阿里巴巴集团控股有限公司 | Method and device for embedding relational network graph |
CN109063041A (en) * | 2018-07-17 | 2018-12-21 | 阿里巴巴集团控股有限公司 | Method and device for embedding relational network graph |
CN109241223A (en) * | 2018-08-23 | 2019-01-18 | 中国电子科技集团公司电子科学研究院 | Behavior track identification method and platform |
CN109241223B (en) * | 2018-08-23 | 2022-06-28 | 中国电子科技集团公司电子科学研究院 | Behavior track identification method and system |
WO2020056984A1 (en) * | 2018-09-19 | 2020-03-26 | 平安科技(深圳)有限公司 | Shortest path query method, system, computer device and storage medium |
CN111125272A (en) * | 2018-10-31 | 2020-05-08 | 百度在线网络技术(北京)有限公司 | Regional feature acquisition method and device, computer equipment and medium |
WO2020119105A1 (en) * | 2018-12-13 | 2020-06-18 | 平安医疗健康管理股份有限公司 | Payment excess identification method and apparatus based on big data, and storage medium and device |
CN109685647A (en) * | 2018-12-27 | 2019-04-26 | 阳光财产保险股份有限公司 | Credit fraud detection method and training method and device of model thereof, and server |
CN109685647B (en) * | 2018-12-27 | 2021-08-10 | 阳光财产保险股份有限公司 | Credit fraud detection method and training method and device of model thereof, and server |
CN109885700A (en) * | 2019-02-26 | 2019-06-14 | 扬州制汇互联信息技术有限公司 | Unstructured data analysis method based on industrial knowledge graph |
CN110032665B (en) * | 2019-03-25 | 2023-11-17 | 创新先进技术有限公司 | Method and device for determining graph node vector in relational network graph |
CN110032665A (en) * | 2019-03-25 | 2019-07-19 | 阿里巴巴集团控股有限公司 | Method and device for determining graph node vector in relational network graph |
CN110084137A (en) * | 2019-04-04 | 2019-08-02 | 百度在线网络技术(北京)有限公司 | Data processing method, device and computer equipment based on driving scene |
CN111192072B (en) * | 2019-10-29 | 2023-08-04 | 腾讯科技(深圳)有限公司 | User grouping method and device and storage medium |
CN111192072A (en) * | 2019-10-29 | 2020-05-22 | 腾讯科技(深圳)有限公司 | User grouping method and device and storage medium |
CN111104397B (en) * | 2019-11-19 | 2021-10-15 | 浙江工业大学 | Flume-based configurable data integration method |
CN111104397A (en) * | 2019-11-19 | 2020-05-05 | 浙江工业大学 | Flume-based configurable data integration method |
CN110688433A (en) * | 2019-12-10 | 2020-01-14 | 银联数据服务有限公司 | Path-based feature generation method and device |
CN111667181A (en) * | 2020-06-08 | 2020-09-15 | 拉扎斯网络科技(上海)有限公司 | Task processing method and device, electronic equipment and computer readable storage medium |
CN111667181B (en) * | 2020-06-08 | 2023-04-28 | 拉扎斯网络科技(上海)有限公司 | Task processing method, device, electronic equipment and computer readable storage medium |
CN113077353A (en) * | 2021-04-22 | 2021-07-06 | 北京十一贝科技有限公司 | Method, apparatus, electronic device, and medium for generating underwriting conclusion |
CN113077353B (en) * | 2021-04-22 | 2024-02-02 | 北京十一贝科技有限公司 | Method, device, electronic equipment and medium for generating underwriting conclusion |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106447066A (en) | Big data feature extraction method and device | |
CN106445988A (en) | Intelligent big data processing method and system | |
Swathi et al. | An optimal deep learning-based LSTM for stock price prediction using twitter sentiment analysis | |
Zheng et al. | Feature engineering for machine learning: principles and techniques for data scientists | |
CN109446341A (en) | Knowledge graph construction method and device | |
CN110968701A (en) | Relationship map establishing method, device and equipment for graph neural network | |
Akerkar et al. | Intelligent techniques for data science | |
CN107357793A (en) | Information recommendation method and device | |
Verdhan | Supervised learning with python | |
CN107357874A (en) | User classification method and device, electronic equipment, storage medium | |
CN112487199A (en) | User characteristic prediction method based on user purchasing behavior | |
Kaluža | Machine Learning in Java | |
Madhavan | Mastering python for data science | |
CN109740642A (en) | Invoice category recognition method, device, electronic equipment and readable storage medium | |
Rustam et al. | Review prognosis system to predict employees job satisfaction using deep neural network | |
Zheng et al. | Deep learning in economics: a systematic and critical review | |
Ciaburro et al. | Python Machine Learning Cookbook: Over 100 recipes to progress from smart data analytics to deep learning using real-world datasets | |
KR102506778B1 (en) | Method and apparatus for analyzing risk of contract | |
Karatzoglou et al. | Applying depthwise separable and multi-channel convolutional neural networks of varied kernel size on semantic trajectories | |
Chopra et al. | Data Science with Python: Combine Python with machine learning principles to discover hidden patterns in raw data | |
Li et al. | An improved genetic-XGBoost classifier for customer consumption behavior prediction | |
Jeyaraman et al. | Practical Machine Learning with R: Define, build, and evaluate machine learning models for real-world applications | |
Lv et al. | A two-route CNN model for bank account classification with heterogeneous data | |
CN115344794A (en) | Scenic spot recommendation method based on knowledge graph semantic embedding | |
CN114911940A (en) | Text emotion recognition method and device, electronic equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | |
TA01 | Transfer of patent application right | Effective date of registration: 20171113. Address after: 100000 Beijing Zhongguancun Daxing District science and Technology Park Daxing biomedical industry base Tianhe West Road, 28, 4, 3, 307 rooms. Applicant after: Silver Li'an financial information services (Beijing) Co., Ltd. Address before: 201203 Shanghai City, Pudong New Area Chinese (Shanghai) free trade zone fanchun Road No. 400 Building 1 room 301-254. Applicant before: COEUSYS INC. |
WD01 | Invention patent application deemed withdrawn after publication | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170222 |