CN106372057A - Content auditing method and apparatus - Google Patents
- Publication number
- CN106372057A CN106372057A CN201610727794.0A CN201610727794A CN106372057A CN 106372057 A CN106372057 A CN 106372057A CN 201610727794 A CN201610727794 A CN 201610727794A CN 106372057 A CN106372057 A CN 106372057A
- Authority
- CN
- China
- Prior art keywords
- content
- verification
- examination
- described content
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a content auditing method and apparatus. According to embodiments of the method and apparatus, content to be submitted by a user is obtained and checked for preset sensitive words. If the content contains no preset sensitive word, it is processed with a classification model to obtain a spam-degree parameter of the content, and whether the content passes the audit is determined from that parameter. The auditing process requires no human involvement, is simple to operate, and is highly accurate, improving the efficiency and reliability of content auditing.
Description
Technical field
The present invention relates to Internet technology, and more particularly to a content auditing method and apparatus.
Background
With the development of communication technology, terminals integrate more and more functions, so that the function list of a terminal contains more and more applications (apps). Most applications require the user to submit personal data, for example the user's nickname, the user's contact information, or content the user intends to publish.
Generally, such data must pass a manual audit after submission, and only data that passes the audit can be accepted as the user's personal data. This approach is cumbersome and error-prone, which reduces the efficiency and reliability of content auditing.
Summary of the invention
Aspects of the present invention provide a content auditing method and apparatus, so as to improve the efficiency and reliability of content auditing.
One aspect of the present invention provides a content auditing method, comprising:
obtaining content to be submitted by a user, and detecting whether the content contains a preset sensitive word;
if the content does not contain a preset sensitive word, processing the content with a classification model to obtain a spam-degree parameter of the content; and
determining, according to the spam-degree parameter, whether the content passes the audit.
In the aspect above and any possible implementation thereof, an implementation is further provided in which, after the content to be audited is obtained, the method further comprises:
if the content contains a preset sensitive word, reporting the content so that it undergoes manual auditing, which determines whether the content passes the audit.
In the aspect above and any possible implementation thereof, an implementation is further provided in which determining, according to the spam-degree parameter, whether the content passes the audit comprises:
if the spam-degree parameter is less than or equal to a preset parameter threshold, determining that the content passes the audit; and
if the spam-degree parameter is greater than the parameter threshold, determining that the content does not pass the audit.
In the aspect above and any possible implementation thereof, an implementation is further provided in which determining, according to the spam-degree parameter, whether the content passes the audit comprises:
if the spam-degree parameter is less than or equal to a preset parameter threshold, determining that the content passes the audit;
if the spam-degree parameter is greater than the parameter threshold and the user is in a blacklist, determining that the content does not pass the audit; and
if the spam-degree parameter is greater than the parameter threshold and the user is not in the blacklist, reporting the content so that it undergoes manual auditing, which determines whether the content passes the audit.
In the aspect above and any possible implementation thereof, an implementation is further provided in which, after the content whose spam-degree parameter exceeds the parameter threshold and whose user is not in the blacklist has been reported for manual auditing, the method further comprises:
taking the content and its manual audit result as a training sample; and
updating the classification model with the training sample.
Another aspect of the present invention provides a content auditing apparatus, comprising:
an acquiring unit, configured to obtain content to be submitted by a user and detect whether the content contains a preset sensitive word;
a classification unit, configured to, if the content does not contain a preset sensitive word, process the content with a classification model to obtain a spam-degree parameter of the content; and
an analysis unit, configured to determine, according to the spam-degree parameter, whether the content passes the audit.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the classification unit is further configured to, if the content contains a preset sensitive word, report the content so that it undergoes manual auditing, which determines whether the content passes the audit.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the analysis unit is specifically configured to:
determine that the content passes the audit if the spam-degree parameter is less than or equal to a preset parameter threshold; and
determine that the content does not pass the audit if the spam-degree parameter is greater than the parameter threshold.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the analysis unit is specifically configured to:
determine that the content passes the audit if the spam-degree parameter is less than or equal to a preset parameter threshold;
determine that the content does not pass the audit if the spam-degree parameter is greater than the parameter threshold and the user is in a blacklist; and
report the content for manual auditing, which determines whether the content passes the audit, if the spam-degree parameter is greater than the parameter threshold and the user is not in the blacklist.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the analysis unit is further configured to:
take the content and its manual audit result as a training sample; and
update the classification model with the training sample.
As the technical solutions above show, embodiments of the present invention obtain content to be submitted by a user and detect whether the content contains a preset sensitive word; if it does not, the content is processed with a classification model to obtain its spam-degree parameter, so that whether the content passes the audit can be determined from that parameter. No human involvement is needed in the auditing process, the operation is simple, and the accuracy is high, which improves the efficiency and reliability of content auditing.
In addition, the technical solutions provided by the present invention can significantly improve the user experience.
The description above is only an overview of the technical solutions of the present invention. So that the technical means of the present invention can be understood more clearly and practiced according to this specification, and so that the above and other objects, features and advantages of the present invention become more apparent, specific embodiments of the present invention are set forth below.
Brief description of the drawings
By reading the detailed description of the preferred embodiments below, various other advantages and benefits will become clear to those of ordinary skill in the art. The accompanying drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, identical parts are denoted by identical reference numerals. In the drawings:
Fig. 1 is a schematic flowchart of a content auditing method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a content auditing apparatus provided by another embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described more fully below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood thoroughly and its scope conveyed fully to those skilled in the art.
It should be noted that the user terminal devices involved in embodiments of the present invention may include, but are not limited to, mobile phones, personal digital assistants (PDAs), wireless handheld devices, tablet computers, personal computers (PCs), MP3 players, MP4 players, and wearable devices (for example, smart glasses, smart watches, smart wristbands, etc.).
In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "a and/or b" may mean: a alone, both a and b, or b alone. Furthermore, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.
Fig. 1 is a schematic flowchart of a content auditing method provided by an embodiment of the present invention, as shown in Fig. 1.
101. Obtain content to be submitted by a user, and detect whether the content contains a preset sensitive word.
102. If the content does not contain a preset sensitive word, process the content with a classification model to obtain a spam-degree parameter of the content.
The so-called spam-degree parameter is an evaluation metric describing how authentic and valid the content is.
It should be noted that before 102, an existing training method can be used to build the classification model. The training samples in the training sample set may all be known (labeled) samples, in which case these known samples can be used directly for training to build the classification model. Alternatively, one part may be known (labeled) samples and another part unknown (unlabeled) samples; in that case, training is first performed with the known samples to build a preliminary classification model, the preliminary model is then used to predict the unknown samples to obtain classification results, the unknown samples are labeled according to these results to form new known samples, and training is repeated with the newly added known samples together with the original known samples to build a new classification model. This continues until the classification model or the known samples meet a cutoff condition, for example the classification accuracy reaching a preset accuracy threshold, or the number of known samples reaching a preset quantity threshold; this embodiment imposes no particular limitation here.
103. Determine, according to the spam-degree parameter, whether the content passes the audit.
It should be noted that the execution body of 101-103 may be, in part or in whole, an application located in a local terminal, or a functional unit such as a plug-in or software development kit (SDK) arranged in an application in the local terminal, or a processing engine in a network-side server, or a distributed system located on the network side; this embodiment imposes no particular limitation here.
It can be understood that the application may be a native program (native app) installed on the terminal, or a web page program (web app) of a browser on the terminal; this embodiment imposes no particular limitation here.
In this way, by obtaining content to be submitted by a user and detecting whether it contains a preset sensitive word, and, if it does not, processing the content with a classification model to obtain its spam-degree parameter, whether the content passes the audit can be determined from that parameter without human involvement in the review process. The operation is simple and the accuracy high, which improves the efficiency and reliability of content auditing.
In the present invention, because the proportion of junk content in user data such as basic user information and content to be published is small, content of very high safety can be audited automatically through keyword filtering (sensitive-word filtering) and the classification model. This greatly reduces the amount of content requiring manual auditing, saves a large amount of manual auditing time and manpower, and effectively improves the efficiency of content auditing.
Optionally, in a possible implementation of this embodiment, a sensitive-word list can be preset, in which a number of sensitive words are maintained. After obtaining the content to be submitted by the user, detecting whether the content contains a preset sensitive word may specifically consist of judging whether the content contains any sensitive word from the sensitive-word list.
If the content does not contain a sensitive word, 102 can be executed: the content is processed with the classification model to obtain its spam-degree parameter.
If the content contains a sensitive word, the content can be reported so that it undergoes manual auditing, which determines whether the content passes the audit. Specifically, any manual auditing rules and standards from the prior art may be used; this embodiment imposes no particular limitation here.
After the content has undergone manual auditing, the content and its manual audit result can further be taken as a training sample, and the training sample then used to update the classification model.
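Feeding a manual audit verdict back as a training sample, as described above, might look roughly like this; the class name, the label strings and the placeholder retrain step are all illustrative assumptions.

```python
# Sketch of the feedback loop: each (content, manual verdict) pair
# becomes a new training sample, and the model is then rebuilt from the
# accumulated samples. The retrain step is only a placeholder here.

class ClassificationModel:
    def __init__(self):
        self.training_samples = []

    def add_manual_result(self, content, passed_audit):
        # Record the manual audit outcome as a labeled training sample.
        self.training_samples.append((content, "ok" if passed_audit else "spam"))

    def retrain(self):
        # Placeholder: rebuild the classifier from self.training_samples.
        return len(self.training_samples)
```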
Optionally, in a possible implementation of this embodiment, the classification model used in 102 may include, but is not limited to, a word-segmentation module, a Bayes classification module and a training dictionary module; this embodiment imposes no particular limitation here. In 102, the following operations can specifically be executed.
First, in the word-segmentation module, word segmentation is performed on the content to obtain segmentation results. After the content is segmented, in order to improve the efficiency of subsequent processing and reduce noise, each obtained word is filtered. The filtering includes, but is not limited to, removing words contained in a preset stop-word list. The stop-word list is compiled in advance, based on word-frequency statistics, from function words, auxiliary words, pronouns, articles, adverbs, modal particles and the like, which generally carry no independent meaning. It can be obtained by collecting words whose frequency of occurrence in existing resources reaches a preset high-frequency condition; for example, a common auxiliary word occurs very frequently but generally has very low expressive power, and is therefore collected into the stop-word list.
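The segmentation-and-filtering step can be sketched as follows. Real Chinese text needs a proper segmenter; whitespace tokenisation and an English stop-word list stand in here purely for illustration.

```python
# Sketch of the word-segmentation and stop-word filtering step. The
# stop-word list (function words, auxiliaries, etc.) is a placeholder;
# whitespace splitting stands in for a real segmenter.

STOP_WORDS = {"the", "a", "of", "is"}  # placeholder stop-word list

def segment_and_filter(content, stop_words=STOP_WORDS):
    """Tokenise the content and drop stop words to reduce noise."""
    return [w for w in content.lower().split() if w not in stop_words]
```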
Second, in the Bayes classification module, the category attribute of each segmented word can be obtained using the words stored in the training dictionary module and their category attributes. A junk score for each segmented word is then obtained from its category attribute, and from the junk scores of the segmented words the spam-degree parameter of the content can be obtained. The spam-degree parameter can be computed by any of several prior-art methods; for details, refer to the relevant prior art, which is not repeated here.
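A minimal sketch of the scoring step, assuming the training dictionary stores a per-word junk score and the content's spam-degree parameter is their average; since the patent defers the exact computation to known prior-art methods, both the dictionary entries and the averaging rule are illustrative assumptions.

```python
# Sketch of the Bayes scoring step: each token from the segmentation
# result is looked up in a training dictionary of per-word junk scores,
# and the scores are averaged into the content's spam-degree parameter.
# Unknown words get a neutral default score.

TRAINING_DICT = {"buy": 0.9, "now": 0.7, "hello": 0.1}  # word -> junk score

def spam_degree(tokens, training_dict=TRAINING_DICT, default=0.5):
    """Average per-word junk scores into a single spam-degree parameter."""
    if not tokens:
        return default
    return sum(training_dict.get(t, default) for t in tokens) / len(tokens)
```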
Optionally, in a possible implementation of this embodiment, a parameter threshold also needs to be preset before 103. Specifically, an empirical value can be set as the parameter threshold, or an optimization algorithm can be used to derive an optimal value as the parameter threshold; this embodiment imposes no particular limitation here. After the spam-degree parameter of the content to be submitted by the user is obtained, it is specifically judged whether the spam-degree parameter is less than or equal to the preset parameter threshold.
In one concrete implementation, if the spam-degree parameter is less than or equal to the preset parameter threshold, it can be determined that the content passes the audit.
In another concrete implementation, if the spam-degree parameter is greater than the parameter threshold, it is determined that the content does not pass the audit.
In another concrete implementation, if the spam-degree parameter is greater than the parameter threshold and the user is in a blacklist, it can be determined that the content does not pass the audit. In this way, no further auditing of the content is needed; it can be directly determined that the content fails the audit, which effectively improves the efficiency of content auditing.
The users stored in the blacklist can be determined according to a specified strategy. The specified strategy may be, for example, that the content submitted by a user has undergone manual auditing and the number of times the user's content failed the audit exceeds a specified threshold, for example 5; this embodiment imposes no particular limitation here.
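The threshold-and-blacklist decision logic described in these implementations can be sketched as follows; the verdict strings and the default threshold value are illustrative.

```python
# Sketch of the decision step: compare the spam-degree parameter against
# a preset threshold, then use the blacklist to decide between outright
# rejection and escalation to manual review.

def audit_decision(spam_deg, user, threshold=0.6, blacklist=frozenset()):
    """Return 'pass', 'fail', or 'manual_review' for the submitted content."""
    if spam_deg <= threshold:
        return "pass"            # low spam degree: audit passes directly
    if user in blacklist:
        return "fail"            # blacklisted user: reject without review
    return "manual_review"       # high spam degree, user not blacklisted
```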
In another concrete implementation, if the spam-degree parameter is greater than the parameter threshold and the user is not in the blacklist, the content can be reported so that it undergoes manual auditing, which determines whether the content passes the audit. Specifically, any manual auditing rules and standards from the prior art may be used; this embodiment imposes no particular limitation here.
After the content has undergone manual auditing, the content and its manual audit result can further be taken as a training sample, and the training sample then used to update the classification model.
In this embodiment, content to be submitted by a user is obtained and checked for preset sensitive words; if none are present, the content is processed with a classification model to obtain its spam-degree parameter, so that whether the content passes the audit can be determined from that parameter. No human involvement is needed in the review process, the operation is simple, and the accuracy is high, which improves the efficiency and reliability of content auditing.
In addition, the technical solutions provided by the present invention can significantly improve the user experience.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of action combinations. Those skilled in the art will appreciate, however, that the present invention is not limited by the described order of actions, since according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art will also appreciate that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Each of the above embodiments is described with its own emphasis; for parts not detailed in one embodiment, refer to the relevant description of the other embodiments.
Fig. 2 is a schematic structural diagram of a content auditing apparatus provided by another embodiment of the present invention, as shown in Fig. 2. The content auditing apparatus of this embodiment may include an acquiring unit 21, a classification unit 22 and an analysis unit 23. The acquiring unit 21 is configured to obtain content to be submitted by a user and detect whether the content contains a preset sensitive word; the classification unit 22 is configured to, if the content does not contain a preset sensitive word, process the content with a classification model to obtain a spam-degree parameter of the content; and the analysis unit 23 is configured to determine, according to the spam-degree parameter, whether the content passes the audit.
It should be noted that the content auditing apparatus provided by this embodiment may be, in part or in whole, an application located in a local terminal, or a functional unit such as a plug-in or software development kit (SDK) arranged in an application in the local terminal, or a processing engine in a network-side server, or a distributed system located on the network side; this embodiment imposes no particular limitation here.
It can be understood that the application may be a native program (native app) installed on the terminal, or a web page program (web app) of a browser on the terminal; this embodiment imposes no particular limitation here.
Optionally, in a possible implementation of this embodiment, the classification unit 22 may further be configured to, if the content contains a preset sensitive word, report the content so that it undergoes manual auditing, which determines whether the content passes the audit.
Further, the classification unit 22 may also be configured to take the content and its manual audit result as a training sample, and to update the classification model with the training sample.
Optionally, in a possible implementation of this embodiment, the analysis unit 23 may specifically be configured to determine that the content passes the audit if the spam-degree parameter is less than or equal to a preset parameter threshold, and to determine that the content does not pass the audit if the spam-degree parameter is greater than the parameter threshold.
Optionally, in a possible implementation of this embodiment, the analysis unit 23 may specifically be configured to: determine that the content passes the audit if the spam-degree parameter is less than or equal to a preset parameter threshold; determine that the content does not pass the audit if the spam-degree parameter is greater than the parameter threshold and the user is in a blacklist; and report the content for manual auditing, which determines whether the content passes the audit, if the spam-degree parameter is greater than the parameter threshold and the user is not in the blacklist.
Further, the analysis unit 23 may also be configured to take the content and its manual audit result as a training sample, and to update the classification model with the training sample.
It should be noted that the method in the embodiment corresponding to Fig. 1 can be implemented by the content auditing apparatus provided by this embodiment; for details, refer to the relevant description in the embodiment corresponding to Fig. 1, which is not repeated here.
In this embodiment, the acquiring unit obtains content to be submitted by a user and detects whether the content contains a preset sensitive word; if it does not, the classification unit processes the content with a classification model to obtain its spam-degree parameter, so that the analysis unit can determine, according to the spam-degree parameter, whether the content passes the audit. No human involvement is needed in the review process, the operation is simple, and the accuracy is high, which improves the efficiency and reliability of content auditing.
In addition, the technical solutions provided by the present invention can significantly improve the user experience.
The above illustrates and describes some preferred embodiments of the application. However, as mentioned above, it should be understood that the application is not limited to the forms disclosed herein, which are not to be taken as excluding other embodiments; the application can be used in various other combinations, modifications, and environments, and can be modified within the scope of the inventive concept described herein according to the above teachings or the technology or knowledge of the related art. Changes and modifications made by those skilled in the art that do not depart from the spirit and scope of the application shall all fall within the protection scope of the appended claims.
Claims (10)
1. A content examination method, characterized by comprising:
obtaining content to be submitted by a user, and detecting whether the content contains preset sensitive vocabulary;
if the content does not contain the preset sensitive vocabulary, processing the content using a classification model to obtain a spam degree parameter of the content; and
determining, according to the spam degree parameter, whether the content passes examination.
2. The method according to claim 1, characterized in that, after obtaining the content to be processed, the method further comprises:
if the content contains the sensitive vocabulary, reporting the content so that manual examination can be performed on the content to determine whether the content passes examination.
3. The method according to claim 1 or 2, characterized in that determining, according to the spam degree parameter, whether the content passes examination comprises:
if the spam degree parameter is less than or equal to a preset parameter threshold, determining that the content passes examination; and
if the spam degree parameter is greater than the parameter threshold, determining that the content does not pass examination.
4. The method according to claim 1 or 2, characterized in that determining, according to the spam degree parameter, whether the content passes examination comprises:
if the spam degree parameter is less than or equal to a preset parameter threshold, determining that the content passes examination;
if the spam degree parameter is greater than the parameter threshold and the user is in a preset blacklist, determining that the content does not pass examination; and
if the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, reporting the content so that manual examination can be performed on the content to determine whether the content passes examination.
5. The method according to claim 4, characterized in that, after reporting the content when the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, so that manual examination can be performed on the content to determine whether the content passes examination, the method further comprises:
taking the content and the manual examination result of the content as a training sample; and
updating the classification model using the training sample.
6. A content examination device, characterized by comprising:
an acquiring unit, configured to obtain content to be submitted by a user and detect whether the content contains preset sensitive vocabulary;
a classification unit, configured to, if the content does not contain the preset sensitive vocabulary, process the content using a classification model to obtain a spam degree parameter of the content; and
an analysis unit, configured to determine, according to the spam degree parameter, whether the content passes examination.
7. The device according to claim 6, characterized in that the classification unit is further configured to,
if the content contains the sensitive vocabulary, report the content so that manual examination can be performed on the content to determine whether the content passes examination.
8. The device according to claim 6 or 7, characterized in that the analysis unit is specifically configured to:
if the spam degree parameter is less than or equal to a preset parameter threshold, determine that the content passes examination; and
if the spam degree parameter is greater than the parameter threshold, determine that the content does not pass examination.
9. The device according to claim 6 or 7, characterized in that the analysis unit is specifically configured to:
if the spam degree parameter is less than or equal to a preset parameter threshold, determine that the content passes examination;
if the spam degree parameter is greater than the parameter threshold and the user is in a preset blacklist, determine that the content does not pass examination; and
if the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, report the content so that manual examination can be performed on the content to determine whether the content passes examination.
10. The device according to claim 9, characterized in that the analysis unit is further configured to take the content and the manual examination result of the content as a training sample, and to update the classification model using the training sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610727794.0A CN106372057A (en) | 2016-08-25 | 2016-08-25 | Content auditing method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610727794.0A CN106372057A (en) | 2016-08-25 | 2016-08-25 | Content auditing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106372057A true CN106372057A (en) | 2017-02-01 |
Family
ID=57878279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610727794.0A Pending CN106372057A (en) | 2016-08-25 | 2016-08-25 | Content auditing method and apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106372057A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101446970A (en) * | 2008-12-15 | 2009-06-03 | 腾讯科技(深圳)有限公司 | Method for censoring and process text contents issued by user and device thereof |
CN201550138U (en) * | 2009-09-10 | 2010-08-11 | 北京盛景无限文化传媒有限公司 | System for providing mobile stream medium service |
CN102098332A (en) * | 2010-12-30 | 2011-06-15 | 北京新媒传信科技有限公司 | Method and device for examining and verifying contents |
- 2016-08-25: CN application CN201610727794.0A (publication CN106372057A) filed; status: Pending
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107612893B (en) * | 2017-09-01 | 2020-06-02 | 北京百悟科技有限公司 | Short message auditing system and method and short message auditing model building method |
CN107612893A (en) * | 2017-09-01 | 2018-01-19 | 北京百悟科技有限公司 | The auditing system and method and structure short message examination & verification model method of short message |
CN108304537A (en) * | 2018-01-30 | 2018-07-20 | 上海康斐信息技术有限公司 | Retain the method and system of user's message |
CN108960782A (en) * | 2018-07-10 | 2018-12-07 | 北京木瓜移动科技股份有限公司 | content auditing method and device |
CN109831751A (en) * | 2019-01-04 | 2019-05-31 | 上海创蓝文化传播有限公司 | A kind of short message content air control system and method based on natural language processing |
CN109918202A (en) * | 2019-03-08 | 2019-06-21 | 上海七牛信息技术有限公司 | Information processing method, device and storage medium |
CN111159354A (en) * | 2019-12-31 | 2020-05-15 | 中国银行股份有限公司 | Sensitive information detection method, device, equipment and system |
CN111611312A (en) * | 2020-05-19 | 2020-09-01 | 四川万网鑫成信息科技有限公司 | Data desensitization method based on rule engine and block chain technology |
CN112232715A (en) * | 2020-11-19 | 2021-01-15 | 湖南红网新媒体集团有限公司 | Volunteer service online management method, system and device |
CN112579771A (en) * | 2020-12-08 | 2021-03-30 | 腾讯科技(深圳)有限公司 | Content title detection method and device |
CN112579771B (en) * | 2020-12-08 | 2024-05-07 | 腾讯科技(深圳)有限公司 | Content title detection method and device |
CN113312449A (en) * | 2021-05-17 | 2021-08-27 | 华南理工大学 | Text auditing method, system and medium based on keywords and deep learning |
CN113360566A (en) * | 2021-08-06 | 2021-09-07 | 成都明途科技有限公司 | Information content monitoring method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106372057A (en) | Content auditing method and apparatus | |
KR101752251B1 (en) | Method and device for identificating a file | |
CN108108902B (en) | Risk event warning method and device | |
CN108121795B (en) | User behavior prediction method and device | |
US10963912B2 (en) | Method and system for filtering goods review information | |
CN108959329B (en) | Text classification method, device, medium and equipment | |
CN108280542A (en) | A kind of optimization method, medium and the equipment of user's portrait model | |
CN110610193A (en) | Method and device for processing labeled data | |
CN112860841A (en) | Text emotion analysis method, device and equipment and storage medium | |
CN106960248B (en) | Method and device for predicting user problems based on data driving | |
CN107305575A (en) | The punctuate recognition methods of human-machine intelligence's question answering system and device | |
CN111460250A (en) | Image data cleaning method, image data cleaning device, image data cleaning medium, and electronic apparatus | |
CN104216876A (en) | Informative text filter method and system | |
CN104915356A (en) | Text classification correcting method and device | |
CN104142912A (en) | Accurate corpus category marking method and device | |
CN109508373A (en) | Calculation method, equipment and the computer readable storage medium of enterprise's public opinion index | |
CN108280164A (en) | A kind of short text filtering and sorting technique based on classification related words | |
CN107491536A (en) | A kind of examination question method of calibration, examination question calibration equipment and electronic equipment | |
CN109902157A (en) | A kind of training sample validation checking method and device | |
CN104809104A (en) | Method and system for identifying micro-blog textual emotion | |
CN109858035A (en) | A kind of sensibility classification method, device, electronic equipment and readable storage medium storing program for executing | |
CN104850540A (en) | Sentence recognizing method and sentence recognizing device | |
CN103886097A (en) | Chinese microblog viewpoint sentence recognition feature extraction method based on self-adaption lifting algorithm | |
CN109933775B (en) | UGC content processing method and device | |
CN105786929A (en) | Information monitoring method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20170201 |