CN111770352B - Security detection method and device, electronic equipment and storage medium - Google Patents

Security detection method and device, electronic equipment and storage medium

Info

Publication number
CN111770352B
CN111770352B (application CN202010589220.8A)
Authority
CN
China
Prior art keywords
live broadcast
time period
detected
broadcast room
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010589220.8A
Other languages
Chinese (zh)
Other versions
CN111770352A (en)
Inventor
周杰
王鸣辉
孙振邦
王长虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Beijing Volcano Engine Technology Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010589220.8A
Publication of CN111770352A
Application granted
Publication of CN111770352B
Status: Active
Anticipated expiration

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/24 - Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N21/2407 - Monitoring of transmitted content, e.g. distribution time, number of downloads
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 - Server components or server architectures
    • H04N21/218 - Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 - Live feed
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 - Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258 - Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866 - Management of end-user data
    • H04N21/25891 - Management of end-user data being end-user preferences
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44204 - Monitoring of content usage, e.g. the number of times a movie has been viewed, copied or the amount which has been watched
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 - Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 - Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 - Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Alarm Systems (AREA)

Abstract

The present disclosure provides a security detection method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring initial live broadcast feature vectors of a live broadcast room to be detected in each of a plurality of time periods, the vectors being determined based on feature extraction networks respectively corresponding to a plurality of security feature dimensions; determining, through an attention network and based on the initial live broadcast feature vector in each time period, the weight of each of the security feature dimensions corresponding to the live broadcast room to be detected; adjusting, based on the weight of each security feature dimension, the initial live broadcast feature vector of the live broadcast room to be detected in each time period to obtain an adjusted live broadcast feature vector for that time period; and determining a security detection result corresponding to the live broadcast room to be detected based on the adjusted live broadcast feature vectors corresponding to the time periods.

Description

Security detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a security detection method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of internet technology, live broadcast technology has emerged. A live broadcast platform provides a plurality of live broadcast rooms, and after entering a live broadcast room a user can watch the live video stream pushed by the anchor of that room.
In a live broadcast room, a user may post inappropriate remarks or engage in inappropriate behavior. To manage this, the live broadcast management platform needs to monitor the remarks and behavior of users in each live broadcast room so as to achieve management of each live broadcast room.
In the related art, security detection of a live broadcast room is performed by randomly extracting audio or live broadcast pictures of the live broadcast room to be detected and checking the live broadcast environment on that basis. When security detection is performed on the live broadcast room to be detected in this way, the accuracy of the obtained security detection result is low.
Disclosure of Invention
The embodiment of the disclosure at least provides a security detection method to improve the accuracy of a security detection result for a live broadcast room.
In a first aspect, an embodiment of the present disclosure provides a security detection method, including:
acquiring initial live broadcast feature vectors of a to-be-detected live broadcast room in each time period in a plurality of time periods, which are determined based on feature extraction networks respectively corresponding to a plurality of security feature dimensions; determining the weight of each safety characteristic dimension in multiple safety characteristic dimensions corresponding to the to-be-detected live broadcast room through an attention network on the basis of the initial live broadcast characteristic vector in each time period; based on the weight of each safety feature dimension, adjusting the initial live broadcast feature vector of the live broadcast room to be detected in each time period to obtain an adjusted live broadcast feature vector corresponding to the time period of the live broadcast room to be detected; and determining a safety detection result corresponding to the to-be-detected live broadcast room based on the adjusted live broadcast characteristic vector corresponding to each time period of the to-be-detected live broadcast room.
In a possible implementation manner, after determining a security detection result corresponding to the to-be-detected live broadcast room, the security detection method further includes:
and if the safety detection result indicates that the live broadcast content of the to-be-detected live broadcast room does not accord with the preset safety detection condition, outputting the identification corresponding to the to-be-detected live broadcast room and the live broadcast content corresponding to the to-be-detected live broadcast room.
In a possible implementation manner, the obtaining of the initial live broadcast feature vector of the to-be-detected live broadcast room in each of the multiple time periods, which is determined based on the feature extraction networks respectively corresponding to the multiple security feature dimensions, includes:
acquiring live broadcast content of the live broadcast room to be detected in each of a plurality of continuous time periods, wherein the live broadcast content comprises scene pictures and/or audio content, and extracting live broadcast content features corresponding to the live broadcast content through the feature extraction networks respectively corresponding to the multiple security feature dimensions; and splicing the live broadcast content features corresponding to each time period of the live broadcast room to be detected with the historical behavior features and user attribute features corresponding to that time period, to obtain an initial live broadcast feature vector of the live broadcast room to be detected in that time period.
In a possible implementation manner, after obtaining the initial live broadcast feature vector of the to-be-detected live broadcast room in each time period, the security detection method further includes:
identifying live broadcast content characteristics in an initial live broadcast characteristic vector corresponding to each time period of the live broadcast room to be detected, and detecting whether missing values aiming at least one safety characteristic dimension exist in the live broadcast content characteristics; and when determining that the missing value aiming at least one safety characteristic dimension exists, filling the missing value of the at least one safety characteristic dimension based on the characteristic values of other safety characteristic dimensions in the initial live broadcast characteristic vector corresponding to the time period of the live broadcast room to be detected.
In a possible implementation manner, the determining, through an attention network, a weight of each security feature dimension of multiple security feature dimensions corresponding to the to-be-detected live broadcast room based on the initial live broadcast feature vector in each of the multiple time periods includes:
determining a target characteristic value of the to-be-detected live broadcast room under each safety characteristic dimension based on the initial live broadcast characteristic vector corresponding to each time period in the multiple time periods; and inputting the target characteristic value of the to-be-detected live broadcast room under each safety characteristic dimension into a full connection layer and an activation function layer in the attention network to obtain the weight of each safety characteristic dimension in the multiple safety characteristic dimensions corresponding to the to-be-detected live broadcast room.
In a possible implementation manner, the obtaining a target feature value of the to-be-detected live broadcast room in each security feature dimension based on the initial live broadcast feature vector corresponding to each time period of the to-be-detected live broadcast room in the multiple time periods includes:
and extracting a target characteristic value under each security characteristic dimension from the initial live broadcast characteristic vector corresponding to each time period in the multiple time periods of the live broadcast room to be detected based on the pooling layer in the attention network.
In a possible implementation manner, the extracting, based on a pooling layer in the attention network, a target feature value in each security feature dimension from an initial live broadcast feature vector corresponding to each of the multiple time periods in the to-be-detected live broadcast room includes:
based on a pooling layer in the attention network, extracting a maximum characteristic value under each security characteristic dimension from an initial live broadcast characteristic vector corresponding to each time period in the multiple time periods in the live broadcast room to be detected as the target characteristic value under the security characteristic dimension.
In a possible implementation manner, the determining, based on the adjusted live broadcast feature vector corresponding to each time period in the to-be-detected live broadcast room, a security detection result corresponding to the to-be-detected live broadcast room includes:
for the adjusted live broadcast feature vector corresponding to each time period of the to-be-detected live broadcast room, fusing feature values under different security feature dimensions contained in the adjusted live broadcast feature vector corresponding to the time period to obtain a first fused live broadcast feature vector corresponding to the time period of the to-be-detected live broadcast room; determining a fusion live broadcast feature vector corresponding to the to-be-detected live broadcast room based on a first fusion live broadcast feature vector corresponding to each time period of the to-be-detected live broadcast room; and determining a safety detection result corresponding to the to-be-detected live broadcast room based on the fusion live broadcast feature vector.
In a possible implementation manner, the determining a fusion live broadcast feature vector corresponding to the to-be-detected live broadcast room based on a first fusion live broadcast feature vector corresponding to each time period in the to-be-detected live broadcast room includes:
from a non-first time period in the time periods, fusing a first fused live broadcast feature vector corresponding to a current time period and a memory live broadcast feature vector corresponding to a previous time period of the current time period to obtain a second fused live broadcast feature vector corresponding to the current time period of the to-be-detected live broadcast room; extracting a memory live broadcast feature vector corresponding to the current time period based on a second fusion live broadcast feature vector corresponding to the current time period, and fusing the memory live broadcast feature vector corresponding to the current time period and a first fusion live broadcast feature vector corresponding to the next time period of the current time period to obtain a second fusion live broadcast feature vector corresponding to the next time period of the to-be-detected live broadcast room; and judging whether the next time period is the last time period in the time periods, if so, taking a second fusion live broadcast feature vector corresponding to the next time period as a fusion live broadcast feature vector corresponding to the live broadcast room to be detected, and if not, taking the next time period as the current time period, and executing the step of determining the second fusion live broadcast feature vector corresponding to the current time period in the live broadcast room to be detected.
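A minimal sketch of this recurrent fusion is given below, assuming a single GRU-style cell whose hidden state plays the role of the memory live broadcast feature vector; the cell type, layer sizes and dimensions are illustrative assumptions and are not specified by the disclosure.

```python
import torch
import torch.nn as nn

class TemporalFusion(nn.Module):
    """Fuse the per-period first fused vectors into one vector for the whole room."""

    def __init__(self, dim: int = 132, hidden: int = 64):
        super().__init__()
        self.cell = nn.GRUCell(dim, hidden)         # fuse current vector with previous memory
        self.to_memory = nn.Linear(hidden, hidden)  # extract the memory vector for the next step

    def forward(self, first_fused: torch.Tensor) -> torch.Tensor:
        # first_fused: [num_periods, dim], one first fused vector per time period
        memory = torch.zeros(1, self.cell.hidden_size)
        second_fused = memory
        for t in range(first_fused.shape[0]):
            second_fused = self.cell(first_fused[t:t + 1], memory)  # second fused vector of period t
            memory = torch.tanh(self.to_memory(second_fused))       # memory vector of period t
        # the second fused vector of the last period is the fused vector of the room
        return second_fused.squeeze(0)
```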
In one possible embodiment, the safety detection result is determined by a pre-trained neural network that includes the attention network;
the neural network is obtained by training initial live broadcast feature vectors corresponding to each sample live broadcast room in a plurality of time periods and safety detection results corresponding to each pre-labeled sample live broadcast room.
In one possible embodiment, the neural network is trained in the following manner:
acquiring initial live broadcast feature vectors of each sample live broadcast room in each time period in a plurality of time periods, which are determined based on the feature extraction networks respectively corresponding to the multiple security feature dimensions; determining the weight of each safety feature dimension in multiple safety feature dimensions corresponding to each sample live broadcast room through the attention network based on the initial live broadcast feature vector of each sample live broadcast room in each time period in multiple time periods; adjusting the initial live broadcast feature vector of each sample live broadcast room in each time period based on the weight of each safety feature dimension in multiple safety feature dimensions corresponding to each sample live broadcast room to obtain an adjusted live broadcast feature vector of the sample live broadcast room in the time period; predicting a safety detection result corresponding to the sample live broadcast room based on the adjusted live broadcast feature vector corresponding to each time period of the sample live broadcast room; and adjusting the network parameter values in the neural network based on the predicted safety detection result corresponding to each sample live broadcast room and the actual safety detection result corresponding to the sample live broadcast room.
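A condensed sketch of one training update is shown below, under the assumptions that the pre-labeled security detection result is a binary risk label and that the neural network outputs a risk score in [0, 1]; the loss and optimizer choices are illustrative, not taken from the disclosure.

```python
import torch
import torch.nn as nn

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               sample_vectors: torch.Tensor, labels: torch.Tensor) -> float:
    """One update over a batch of sample live broadcast rooms.

    sample_vectors: [batch, num_periods, dim] initial live broadcast feature vectors
    labels:         [batch] pre-labeled security detection results (1 = risky, 0 = safe)
    """
    optimizer.zero_grad()
    scores = model(sample_vectors)                 # predicted risk scores in [0, 1]
    loss = nn.functional.binary_cross_entropy(scores, labels.float())
    loss.backward()                                # adjust the network parameter values
    optimizer.step()
    return loss.item()
```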
In a second aspect, an embodiment of the present disclosure provides a security detection apparatus, including:
the system comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for acquiring initial live broadcast characteristic vectors of a to-be-detected live broadcast room in each time period in a plurality of time periods, which are determined based on characteristic extraction networks respectively corresponding to a plurality of security characteristic dimensions; a first determining module, configured to determine, based on an initial live broadcast feature vector in each of the multiple time periods, a weight of each of multiple security feature dimensions corresponding to the to-be-detected live broadcast room through an attention network; the adjusting module is used for adjusting the initial live broadcast characteristic vector of the live broadcast room to be detected in each time period based on the weight of each safety characteristic dimension to obtain an adjusted live broadcast characteristic vector corresponding to the time period of the live broadcast room to be detected; and the second determining module is used for determining a safety detection result corresponding to the to-be-detected live broadcast room based on the adjusted live broadcast characteristic vector corresponding to each time period of the to-be-detected live broadcast room.
In a possible implementation manner, after determining a security detection result corresponding to the to-be-detected live broadcast room, the second determining module is further configured to:
and if the safety detection result indicates that the live broadcast content of the to-be-detected live broadcast room does not accord with the preset safety detection condition, outputting the identification corresponding to the to-be-detected live broadcast room and the live broadcast content corresponding to the to-be-detected live broadcast room.
In a possible implementation manner, when the obtaining module is configured to obtain an initial live broadcast feature vector of a to-be-detected live broadcast room in each of a plurality of time periods, where the initial live broadcast feature vector is determined based on feature extraction networks respectively corresponding to a plurality of security feature dimensions, the obtaining module includes:
acquiring live broadcast content of the live broadcast room to be detected in each of a plurality of continuous time periods, wherein the live broadcast content comprises scene pictures and/or audio content, and extracting live broadcast content features corresponding to the live broadcast content through the feature extraction networks respectively corresponding to the multiple security feature dimensions; and splicing the live broadcast content features corresponding to each time period of the live broadcast room to be detected with the historical behavior features and user attribute features corresponding to that time period, to obtain an initial live broadcast feature vector of the live broadcast room to be detected in that time period.
In a possible implementation manner, after obtaining the initial live broadcast feature vector of the to-be-detected live broadcast room in each time period, the obtaining module is further configured to:
identifying live broadcast content characteristics in an initial live broadcast characteristic vector corresponding to each time period of the live broadcast room to be detected, and detecting whether missing values aiming at least one safety characteristic dimension exist in the live broadcast content characteristics; and when determining that the missing value aiming at least one safety characteristic dimension exists, filling the missing value of the at least one safety characteristic dimension based on the characteristic values of other safety characteristic dimensions in the initial live broadcast characteristic vector corresponding to the time period of the live broadcast room to be detected.
In a possible implementation manner, when the first determining module is configured to determine, through an attention network, a weight of each security feature dimension of multiple security feature dimensions corresponding to the to-be-detected live broadcast room based on the initial live broadcast feature vector in each of the multiple time periods, the determining module includes:
determining a target characteristic value of the to-be-detected live broadcast room under each safety characteristic dimension based on the initial live broadcast characteristic vector corresponding to each time period in the multiple time periods; and inputting the target characteristic value of the to-be-detected live broadcast room under each safety characteristic dimension into a full connection layer and an activation function layer in the attention network to obtain the weight of each safety characteristic dimension in the multiple safety characteristic dimensions corresponding to the to-be-detected live broadcast room.
In a possible implementation manner, when the first determining module is configured to obtain a target feature value of the to-be-detected live broadcast room in each security feature dimension based on an initial live broadcast feature vector corresponding to each time period of the to-be-detected live broadcast room in the multiple time periods, the first determining module includes:
and extracting a target characteristic value under each security characteristic dimension from the initial live broadcast characteristic vector corresponding to each time period in the multiple time periods of the live broadcast room to be detected based on the pooling layer in the attention network.
In a possible implementation manner, the first determining module, when configured to extract, from an initial live broadcast feature vector corresponding to each of the multiple time periods in the live broadcast room to be detected based on the pooling layer in the attention network, a target feature value in each security feature dimension, includes:
based on a pooling layer in the attention network, extracting a maximum characteristic value under each security characteristic dimension from an initial live broadcast characteristic vector corresponding to each time period in the multiple time periods in the live broadcast room to be detected as the target characteristic value under the security characteristic dimension.
In a possible implementation manner, when the second determining module is configured to determine a security detection result corresponding to the to-be-detected live broadcast room based on the adjusted live broadcast feature vector corresponding to the to-be-detected live broadcast room in each time period, the second determining module includes:
for the adjusted live broadcast feature vector corresponding to each time period of the to-be-detected live broadcast room, fusing feature values under different security feature dimensions contained in the adjusted live broadcast feature vector corresponding to the time period to obtain a first fused live broadcast feature vector corresponding to the time period of the to-be-detected live broadcast room; determining a fusion live broadcast feature vector corresponding to the to-be-detected live broadcast room based on a first fusion live broadcast feature vector corresponding to each time period of the to-be-detected live broadcast room; and determining a safety detection result corresponding to the to-be-detected live broadcast room based on the fusion live broadcast feature vector.
In a possible implementation manner, when the second determining module is configured to determine the fused live broadcast feature vector corresponding to the to-be-detected live broadcast room based on the first fused live broadcast feature vector corresponding to the to-be-detected live broadcast room in each time period, the second determining module includes:
from a non-first time period in the time periods, fusing a first fused live broadcast feature vector corresponding to a current time period and a memory live broadcast feature vector corresponding to a previous time period of the current time period to obtain a second fused live broadcast feature vector corresponding to the current time period of the to-be-detected live broadcast room; extracting a memory live broadcast feature vector corresponding to the current time period based on a second fusion live broadcast feature vector corresponding to the current time period, and fusing the memory live broadcast feature vector corresponding to the current time period and a first fusion live broadcast feature vector corresponding to the next time period of the current time period to obtain a second fusion live broadcast feature vector corresponding to the next time period of the to-be-detected live broadcast room; and judging whether the next time period is the last time period in the time periods, if so, taking a second fusion live broadcast feature vector corresponding to the next time period as a fusion live broadcast feature vector corresponding to the live broadcast room to be detected, and if not, taking the next time period as the current time period, and executing the step of determining the second fusion live broadcast feature vector corresponding to the current time period in the live broadcast room to be detected.
In a possible implementation, the security detection apparatus further includes a network training module, where the network training module is configured to train a neural network that determines the security detection result, and the neural network includes an attention network; the neural network is obtained by training initial live broadcast feature vectors corresponding to each sample live broadcast room in a plurality of time periods and safety detection results corresponding to each pre-labeled sample live broadcast room.
In one possible embodiment, the network training module is configured to train the neural network in the following manner:
acquiring initial live broadcast feature vectors of each sample live broadcast room in each time period in a plurality of time periods, which are determined based on the feature extraction networks respectively corresponding to the multiple security feature dimensions; determining the weight of each safety feature dimension in multiple safety feature dimensions corresponding to each sample live broadcast room through the attention network based on the initial live broadcast feature vector of each sample live broadcast room in each time period in multiple time periods; adjusting the initial live broadcast feature vector of each sample live broadcast room in each time period based on the weight of each safety feature dimension in multiple safety feature dimensions corresponding to each sample live broadcast room to obtain an adjusted live broadcast feature vector of the sample live broadcast room in the time period; predicting a safety detection result corresponding to the sample live broadcast room based on the adjusted live broadcast feature vector corresponding to each time period of the sample live broadcast room; and adjusting the network parameter values in the neural network based on the predicted safety detection result corresponding to each sample live broadcast room and the actual safety detection result corresponding to the sample live broadcast room.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the security detection method according to the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, performs the steps of the security detection method according to the first aspect.
The present disclosure provides a security detection method in which the initial live broadcast feature vectors of the live broadcast room to be detected in each of a plurality of time periods, determined by the feature extraction networks respectively corresponding to multiple security feature dimensions, are obtained; for example, initial live broadcast feature vectors under the multiple security feature dimensions are obtained for each of ten time periods. The weight of each of the multiple security feature dimensions corresponding to the live broadcast room to be detected is then determined through an attention network, so that the importance of the feature value corresponding to each security feature dimension in the initial live broadcast feature vector of each time period can be determined. Based on this importance, the initial live broadcast feature vector of each time period is adjusted to obtain the adjusted live broadcast feature vector of the live broadcast room to be detected in that time period. By adjusting the initial live broadcast feature vectors in this way, for example by increasing the proportion of the feature values under important security feature dimensions and weakening the proportion of the feature values under unimportant dimensions, a more accurate security detection result corresponding to the live broadcast room to be detected can be obtained, which makes it easier to supervise the live broadcast based on this security detection result at a later stage and effectively improves the live broadcast environment.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments will be briefly described below, and the drawings herein incorporated in and forming a part of the specification illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It is appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, for those skilled in the art will be able to derive additional related drawings therefrom without the benefit of the inventive faculty.
Fig. 1 shows a flow chart of a security detection method provided by an embodiment of the present disclosure;
fig. 2 shows a flowchart of a method for determining a security detection result corresponding to a to-be-detected live broadcast room according to an embodiment of the present disclosure;
fig. 3 shows a flowchart of a method for determining a fused live broadcast feature vector corresponding to a live broadcast room to be detected according to an embodiment of the present disclosure;
FIG. 4 is a diagram illustrating a specific process of determining a security detection result, provided by an embodiment of the present disclosure;
FIG. 5 is a flow chart of a method for training a neural network provided by an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram illustrating a safety detection device provided in an embodiment of the present disclosure;
fig. 7 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In a live broadcast scene, such as a live broadcast room, a user may make inappropriate remarks or engage in inappropriate behavior. To manage this, the live broadcast management platform needs to perform security detection on the remarks and behavior of users in each live broadcast room so as to achieve management of each live broadcast room.
The security detection mode in the related art is usually simple and mechanical; for example, security detection is performed on the live broadcast room by randomly extracting the audio or the live broadcast pictures of the live broadcast room to be detected. When security detection is performed on the live broadcast room to be detected in this way, the accuracy of the obtained security detection result is low.
Based on the above research, the present disclosure provides a security detection method in which the initial live broadcast feature vectors of the live broadcast room to be detected in each of a plurality of time periods, determined by the feature extraction networks respectively corresponding to multiple security feature dimensions, are obtained; for example, initial live broadcast feature vectors under the multiple security feature dimensions are obtained for each of ten time periods. The weight of each of the multiple security feature dimensions corresponding to the live broadcast room to be detected is then determined through an attention network, so that the importance of the feature value corresponding to each security feature dimension in the initial live broadcast feature vector of each time period can be determined. Based on this importance, the initial live broadcast feature vector of each time period is adjusted to obtain the adjusted live broadcast feature vector of the live broadcast room to be detected in that time period. By adjusting the initial live broadcast feature vectors in this way, for example by increasing the proportion of the feature values under important security feature dimensions and weakening the proportion of the feature values under unimportant dimensions, a more accurate security detection result corresponding to the live broadcast room to be detected can be obtained, which makes it easier to supervise the live broadcast based on this security detection result at a later stage and effectively improves the live broadcast environment.
To facilitate understanding of the present embodiment, first, a security detection method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the security detection method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, and the computer device includes, for example: a server or other processing device. In some possible implementations, the security detection method may be implemented by a processor calling computer readable instructions stored in a memory.
Referring to fig. 1, which is a flowchart of a security detection method provided in the embodiment of the present disclosure, the method includes steps S101 to S104, where:
s101, acquiring initial live broadcast feature vectors of a to-be-detected live broadcast room in each time period in a plurality of time periods, which are determined based on feature extraction networks respectively corresponding to a plurality of security feature dimensions.
The live broadcast room to be detected may be a virtual room in which live broadcast is performed, for example, a virtual room in which live broadcast is carried out through a client. The live broadcast room may correspond to a live broadcast room identifier, which may be a user name, a mobile phone number, or another account number of the user who performs the live broadcast.
The plurality of time periods may be continuous time periods. The first time period may start at a set time and last a set time length; for example, if the set time is 1 min after the live broadcast starts and each time period lasts 20 s, then for a live broadcast starting at 9:00 the first time period may be 9:01:00 to 9:01:20 and the second time period may be 9:01:20 to 9:01:40.
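A small helper illustrating how such consecutive periods could be enumerated is sketched below; the 1 min offset and 20 s length follow the example above, while the period count and the example date are arbitrary illustrations.

```python
from datetime import datetime, timedelta

def time_periods(broadcast_start: datetime, offset_s: int = 60,
                 length_s: int = 20, count: int = 10):
    """Return `count` consecutive (start, end) periods of `length_s` seconds,
    beginning `offset_s` seconds after the live broadcast starts."""
    first = broadcast_start + timedelta(seconds=offset_s)
    return [(first + timedelta(seconds=i * length_s),
             first + timedelta(seconds=(i + 1) * length_s))
            for i in range(count)]

# Example: a broadcast starting at 9:00 gives 9:01:00-9:01:20, 9:01:20-9:01:40, ...
periods = time_periods(datetime(2020, 6, 24, 9, 0, 0))
```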
The security feature dimensions may include multiple dimensions for evaluating the live broadcast, in particular feature dimensions for evaluating whether the live broadcast carries a risk. For example, they may include dimensions evaluating whether the anchor of the live broadcast room has exhibited inappropriate behavior or made inappropriate statements, whether the live broadcast room has been reported in the past, the number of times it was reported, and attribute features of the anchor. Based on the feature extraction networks respectively corresponding to the multiple security feature dimensions, the initial live broadcast feature vectors corresponding to the live broadcast room to be detected in the multiple time periods can be obtained.
Here, the feature extraction networks respectively corresponding to the multiple security feature dimensions may include a feature extraction network for behavior risk feature extraction, a feature extraction network for bullet screen risk feature extraction, a feature extraction network for anchor speech risk feature extraction, and the like, and a manner of determining the initial live broadcast feature vector through the feature extraction network will be specifically described later, which is not described herein in detail.
S102, determining the weight of each safety characteristic dimension in multiple safety characteristic dimensions corresponding to the to-be-detected live broadcast room through an attention network based on the initial live broadcast characteristic vector in each time period.
The initial live broadcast feature vector may include a feature value corresponding to each security feature dimension, and here, the weight of each security feature dimension in the multiple security feature dimensions corresponding to the live broadcast room to be detected is determined through the attention network, and is also the weight of the feature value corresponding to each security feature dimension in the initial live broadcast feature vector.
S103, based on the weight of each safety feature dimension, adjusting the initial live broadcast feature vector of the live broadcast room to be detected in each time period to obtain an adjusted live broadcast feature vector corresponding to the time period of the live broadcast room to be detected.
Here, when the initial live broadcast feature vector of the live broadcast room to be detected in each time period is adjusted based on the weight of each security feature dimension, the weight of each security feature dimension may be multiplied by a feature value of the security feature dimension corresponding to the initial live broadcast feature vector to obtain a feature value of the security feature dimension corresponding to the adjusted live broadcast feature vector.
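A minimal sketch of S102 and S103 combined is given below, consistent with the pooling, fully connected and activation layers mentioned earlier for the attention network; the layer sizes and the choice of sigmoid activation are assumptions, not values given in the disclosure.

```python
import torch
import torch.nn as nn

class DimensionAttention(nn.Module):
    """Weight each security feature dimension and rescale the per-period vectors."""

    def __init__(self, num_dims: int = 132):
        super().__init__()
        self.fc = nn.Linear(num_dims, num_dims)  # fully connected layer of the attention network
        self.act = nn.Sigmoid()                  # activation function layer

    def forward(self, initial_vectors: torch.Tensor) -> torch.Tensor:
        # initial_vectors: [num_periods, num_dims], one initial vector per time period
        target_values = initial_vectors.max(dim=0).values  # pooling: max value per dimension over periods
        weights = self.act(self.fc(target_values))          # weight of each security feature dimension
        return initial_vectors * weights                    # adjusted live broadcast feature vectors
```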
And S104, determining a safety detection result corresponding to the to-be-detected live broadcast room based on the adjusted live broadcast characteristic vector corresponding to each time period of the to-be-detected live broadcast room.
The security detection result here may indicate whether a risk exists in the live broadcast room to be detected, and may be represented by a risk score or by a risk level. For example, a risk score corresponding to the live broadcast room to be detected may be determined from the adjusted live broadcast feature vectors; the risk score may range from 0 to 1, and a risk score threshold may be preset, for example 0.7, so that when the risk score corresponding to the live broadcast room to be detected reaches 0.7 it may be determined that the live broadcast room has a high possibility of risk. Alternatively, the security detection result may be represented by a risk level: a risk score corresponding to the live broadcast room to be detected is first determined from the adjusted live broadcast feature vectors, and a risk level is then determined from the risk score, for example a score of 0 to 0.4 belongs to a low risk level, 0.4 to 0.7 to a medium risk level, and 0.7 to 1 to a high risk level, so that if the risk level corresponding to the live broadcast room to be detected is the high risk level, it may be determined that the risk of the live broadcast room to be detected is high.
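As a concrete illustration of the example cut-offs above (the threshold values are only the examples given and may be configured differently):

```python
def risk_level(risk_score: float, thresholds=(0.4, 0.7)) -> str:
    """Map a risk score in [0, 1] to a risk level using the example cut-offs above."""
    low_max, high_min = thresholds
    if risk_score < low_max:
        return "low"
    if risk_score < high_min:
        return "medium"
    return "high"
```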
In the security detection method provided in S101 to S104, the initial live broadcast feature vectors of the live broadcast room to be detected in each of a plurality of time periods, determined by the feature extraction networks respectively corresponding to multiple security feature dimensions, are obtained; for example, initial live broadcast feature vectors under the multiple security feature dimensions are obtained for each of ten time periods. The weight of each of the multiple security feature dimensions corresponding to the live broadcast room to be detected is then determined through the attention network, so that the importance of the feature value corresponding to each security feature dimension in the initial live broadcast feature vector of each time period can be determined. Based on this importance, the initial live broadcast feature vector of each time period is adjusted to obtain the adjusted live broadcast feature vector of the live broadcast room to be detected in that time period. By adjusting the initial live broadcast feature vectors in this way, for example by increasing the proportion of the feature values under important security feature dimensions and weakening the proportion of the feature values under unimportant dimensions, a more accurate security detection result corresponding to the live broadcast room to be detected can be obtained, which makes it easier to supervise the live broadcast based on this security detection result at a later stage and effectively improves the live broadcast environment.
In an implementation manner, after determining a security detection result corresponding to a to-be-detected live broadcast room, the security detection method provided in the embodiment of the present disclosure further includes:
and if the safety detection result indicates that the live broadcast content of the live broadcast room to be detected does not accord with the preset safety detection condition, outputting the identification corresponding to the live broadcast room to be detected and the live broadcast content corresponding to the live broadcast room to be detected.
Specifically, the safety detection condition corresponds to the safety detection result, for example, if the safety detection result is represented by a risk score, the preset safety condition is that the risk score is lower than a risk score threshold, and if the safety detection result is represented by a risk level, the preset safety condition may be that the risk level is lower than a high risk level.
The identifier of the live broadcast room to be detected may be the ID of that live broadcast room, and the live broadcast content corresponding to the live broadcast room to be detected may include the live broadcast pictures and/or live broadcast audio of that room in the plurality of continuous time periods. The identifier and the live broadcast content may be output to a client corresponding to an operator, and the operator may further perform security checks on the live broadcast room to be detected based on the live broadcast pictures and/or live broadcast audio of the plurality of continuous time periods.
For the above S101, when obtaining the initial live broadcast feature vector of the to-be-detected live broadcast room in each of the multiple time periods, which is determined based on the feature extraction networks respectively corresponding to the multiple security feature dimensions, the method may include:
(1) acquiring live broadcast content of the live broadcast room to be detected in each of a plurality of continuous time periods, wherein the live broadcast content comprises scene pictures and/or audio content, and extracting live broadcast content features corresponding to the live broadcast content through the feature extraction networks respectively corresponding to the multiple security feature dimensions;
(2) and splicing the live broadcast content characteristics corresponding to each time period of the live broadcast room to be detected, and the historical behavior characteristics and the user attribute characteristics corresponding to the time period of the live broadcast room to be detected to obtain an initial live broadcast characteristic vector of the live broadcast room to be detected in the time period.
When acquiring the live broadcast pictures of the live broadcast room to be detected in each of the plurality of continuous time periods, the live broadcast pictures corresponding to each time period can be obtained by extracting one frame at set time intervals; for example, for the time period 0 to 20 s, 10 frames of live broadcast pictures corresponding to that period can be obtained by extracting one frame every 2 s.
When acquiring the live audio of the live broadcast room to be detected in each of the plurality of continuous time periods, the live audio corresponding to each time period can be obtained by extracting audio of a set duration within the period; for example, 10 seconds of live audio may be extracted from the time period 0 to 20 s as the live audio corresponding to that period.
After the live content corresponding to each time period is obtained, the network can be extracted according to the features corresponding to the pre-trained multiple security feature dimensions, so as to obtain the live content features corresponding to each time period.
For example, the live broadcast content is respectively input into the different feature extraction networks, so that the live broadcast content features of the live broadcast content under each security dimension can be obtained. In particular, when multiple frames of live broadcast pictures correspond to the same time period, the live broadcast content feature of that period under a certain security feature dimension can be determined by inputting each frame into the feature extraction network corresponding to that dimension to obtain a score for each frame under that dimension, and then selecting, among the scores of all frames, the highest score as the live broadcast content feature of the time period under that security feature dimension.
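A sketch of this per-frame scoring and maximum selection is given below; the extractor functions stand in for the per-dimension feature extraction networks and are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

def content_features_for_period(frames: Sequence,
                                extractors: Dict[str, Callable[[object], float]]) -> List[float]:
    """For each security feature dimension, score every sampled frame with that
    dimension's (hypothetical) feature extraction network and keep the highest
    score as the live content feature of this time period under that dimension."""
    features = []
    for extractor in extractors.values():
        frame_scores = [extractor(frame) for frame in frames]  # one score per frame
        features.append(max(frame_scores))                      # keep the highest score
    return features
```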
Besides the live broadcast content features, historical behavior features and user attribute features corresponding to each time period of the live broadcast room to be detected can also be obtained. The historical behavior features may include the number of times the live broadcast room was reported for behavior violations or language violations in a historical period; the user attribute features may include the gender, age and similar characteristics of the anchor of the live broadcast room to be detected. The user attribute features can also influence the security detection result; for example, big data statistics show that the risk of fights breaking out in live broadcast rooms hosted by young women is low, so when the user attribute features of the live broadcast room to be detected indicate a young female anchor, the corresponding risk is low.
Further, after the live broadcast content features corresponding to each time period of the live broadcast room to be detected and the historical behavior features and user attribute features corresponding to that time period are acquired, the live broadcast content features, the historical behavior features and the user attribute features are spliced to obtain the initial live broadcast feature vector of the live broadcast room to be detected in that time period. For example, if 100 live broadcast content feature values, 15 historical behavior feature values and 17 user attribute feature values are acquired for each time period, the initial live broadcast feature vector of the live broadcast room to be detected in each time period has 132 feature values.
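A trivial sketch of the splicing step using the example dimensions above (100 + 15 + 17 = 132 values); numpy is an implementation convenience, not prescribed by the disclosure.

```python
import numpy as np

def build_initial_vector(content_features: np.ndarray,    # e.g. 100 live content feature values
                         behavior_features: np.ndarray,    # e.g. 15 historical behavior values
                         attribute_features: np.ndarray    # e.g. 17 user attribute values
                         ) -> np.ndarray:
    """Splice the three feature groups of one time period into the initial
    live broadcast feature vector (132 values in the example above)."""
    return np.concatenate([content_features, behavior_features, attribute_features])
```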
In the embodiment of the present disclosure, live broadcast content features corresponding to the live broadcast content are extracted under the preset multiple security feature dimensions, and the live broadcast content features corresponding to each time period of the live broadcast room to be detected are further spliced with the historical behavior features and user attribute features corresponding to that time period. A live broadcast feature vector characterizing, from multiple angles, whether the live broadcast room to be detected is safe can thus be obtained, so that the live broadcast room to be detected can be supervised from multiple angles and any security problem in the live broadcast room can be effectively detected at a later stage.
Consider the case in which at least one of the feature extraction networks respectively corresponding to the multiple security feature dimensions fails when extracting the live broadcast content features corresponding to the live broadcast content. In that case the live broadcast content features corresponding to that feature extraction network cannot be obtained, and the initial live broadcast feature vector of the live broadcast room to be detected in one or more of the time periods will contain missing feature values.
For this reason, after obtaining the initial live broadcast feature vector of the to-be-detected live broadcast room in each time period, the security detection method provided by the embodiment of the present disclosure further includes:
(1) identifying the live broadcast content features in the initial live broadcast feature vector corresponding to each time period of the live broadcast room to be detected, and detecting whether a missing value for at least one security feature dimension exists in the live broadcast content features;
(2) when it is determined that a missing value for at least one security feature dimension exists, filling the missing value of the at least one security feature dimension based on the feature values of the other security feature dimensions in the initial live broadcast feature vector corresponding to that time period of the live broadcast room to be detected.
After the live broadcast content is input into the feature extraction networks respectively corresponding to the multiple security feature dimensions, if a certain feature extraction network fails and the corresponding live broadcast content features are not obtained, the missing live broadcast content features can first be filled with a preset missing value, for example "-1". Then, for the initial live broadcast feature vector corresponding to each time period of the live broadcast room to be detected, whether "-1" exists in the vector can be searched; if so, the live broadcast content feature under the security feature dimension corresponding to the position of "-1" is a missing value, and that missing value can then be filled according to the feature values of the other security feature dimensions.
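A minimal sketch of this sentinel check, assuming the preset missing value is "-1" as in the example above:

```python
import numpy as np

MISSING = -1.0  # preset missing value written when a feature extraction network fails

def missing_positions(initial_vector: np.ndarray) -> np.ndarray:
    """Return a 0/1 mask marking the security feature dimensions whose live
    content feature equals the preset missing value."""
    return (initial_vector == MISSING).astype(np.float32)
```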
For example, missing value filling can be performed by a pre-trained missing value completion network layer. For the initial live broadcast feature vector corresponding to the live broadcast room to be detected, certain correlations exist between the feature values of different dimensions, and the missing value completion network layer can fill a missing value according to the feature values of the other security feature dimensions based on these correlations.
Specifically, in the embodiment of the present disclosure, a missing value can be filled by the missing value completion network layer in the pre-trained neural network. Taking Table 1 as an example, and taking the live broadcast room to be detected as live broadcast room 1, how a missing value in the initial live broadcast feature vector of the live broadcast room to be detected is filled is described below:
(Table 1, embedded image: example feature value sequences of live broadcast room 1 before and after missing value filling)
The first feature value sequence in Table 1 may be the feature values of the initial live broadcast feature vector corresponding to a certain time period of live broadcast room 1, and it can be detected that the position corresponding to feature 2 is a missing value. At this time, the missing position is assigned a first preset value, for example "1", and the other positions are assigned a second preset value, for example "0", so that a given feature value sequence is obtained. The feature values of this sequence are mapped through a fully connected layer to obtain mapping initial values; for example, the mapping initial value corresponding to feature 2 is 0.5 and the mapping initial values corresponding to the other features are 0. The obtained mapping initial values are then combined with the original first feature values at the corresponding positions to form the second feature value sequence shown in Table 1; that is, the missing value is filled, and the complete initial live broadcast feature vector corresponding to that time period of live broadcast room 1 is obtained.
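The mechanism of the missing value completion network layer can be sketched as follows; the layer size and the exact rule for combining the mapped values with the original values are assumptions based on the description of Table 1, not a definitive implementation.

```python
import torch
import torch.nn as nn

class MissingValueCompletion(nn.Module):
    """Sketch of a missing value completion layer: the 0/1 indicator sequence of
    missing positions is mapped by a fully connected layer, and the mapped values
    fill the missing positions of the original vector."""

    def __init__(self, dim: int = 132):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, missing: torch.Tensor) -> torch.Tensor:
        # x:       [batch, dim] initial live broadcast feature vectors
        # missing: [batch, dim] 1.0 at missing positions, 0.0 elsewhere
        mapped = self.fc(missing)                     # mapping initial values
        return x * (1 - missing) + mapped * missing   # second feature value sequence
```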
In another embodiment, when it is determined that a missing value for at least one security feature dimension exists in any time period of the live broadcast room to be detected, the missing value of each such security feature dimension may be filled based on the feature values of that security feature dimension in the other time periods. For example, if the live broadcast room to be detected has a missing value in a first security feature dimension in the first of several consecutive time periods, the feature values of the first security feature dimension corresponding to the multiple time periods may be averaged, and the missing value is filled based on the average value.
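A sketch of this temporal averaging strategy; the function name and inputs are illustrative assumptions.

```python
import numpy as np

def fill_by_temporal_mean(dimension_values: np.ndarray, missing_index: int) -> float:
    """dimension_values: the feature values of one security feature dimension over
    the consecutive time periods; missing_index: the time period whose value is
    missing. The missing value is filled with the mean over the other periods."""
    other_periods = np.delete(dimension_values, missing_index)
    return float(other_periods.mean())
```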
In the embodiment of the disclosure, by filling the missing values in the initial live broadcast feature vector, an initial live broadcast feature vector containing complete feature values can be obtained, and security monitoring of the live broadcast room to be detected can be performed from multiple angles based on the complete initial live broadcast feature vector, so that the security detection result for the live broadcast room to be detected can be obtained accurately later on.
For the above S102, when determining, through the attention network, a weight of each security feature dimension of multiple security feature dimensions corresponding to the to-be-detected live broadcast room based on the initial live broadcast feature vector in each of the multiple time periods, the determining may include:
(1) determining a target characteristic value of the to-be-detected live broadcast room under each safety characteristic dimension based on an initial live broadcast characteristic vector corresponding to each time period in a plurality of time periods of the to-be-detected live broadcast room;
(2) and inputting the target characteristic value of the to-be-detected live broadcast room under each security characteristic dimension into a full connection layer and an activation function layer in the attention network to obtain the weight of each security characteristic dimension in the multiple security characteristic dimensions corresponding to the to-be-detected live broadcast room.
In order to determine the weight of each of the multiple security feature dimensions of the live broadcast room to be detected, the embodiment of the present disclosure converts the initial live broadcast feature vectors corresponding to the multiple time periods into one live broadcast feature vector. In this conversion, a target feature value for each security feature dimension can be determined based on the feature values of that security feature dimension in the multiple time periods; for example, the maximum of the feature values corresponding to the multiple time periods can be used as the target feature value, or the average of the feature values corresponding to the multiple time periods can be used as the target feature value.
In the embodiment of the disclosure, a representative target feature value under each security feature dimension can be selected, so that multiple initial live broadcast feature vectors are reduced to one group of target feature values whose number equals the number of security feature dimensions, which improves efficiency when the weight of each of the multiple security feature dimensions is determined.
Further, obtaining the target feature value of the live broadcast room to be detected in each security feature dimension based on the initial live broadcast feature vector corresponding to each of the multiple time periods of the live broadcast room to be detected may include:
based on a pooling layer in the attention network, extracting a target characteristic value under each security characteristic dimension from an initial live broadcast characteristic vector corresponding to each time period in a plurality of time periods in a live broadcast room to be detected.
The attention network may include a fully connected layer and an activation function layer, and may further include a pooling layer. After the initial live broadcast feature vector corresponding to each of the multiple time periods of the live broadcast room to be detected is input into the pooling layer of the attention network, the pooling layer extracts the target feature value in each security feature dimension from the initial live broadcast feature vectors of the individual time periods. For example, if the initial live broadcast feature vector corresponding to each time period of the live broadcast room to be detected contains feature values in 132 dimensions, then after the processing of the pooling layer a target feature value in each of the 132 dimensions is obtained.
The obtained target feature value under each security feature dimension is then input sequentially into the fully connected layer and the Sigmoid activation function layer of the attention network, so that the weight of each of the multiple security feature dimensions is obtained.
After the weight of each security feature dimension corresponding to the to-be-detected live broadcast room is obtained, the initial live broadcast feature vector of the to-be-detected live broadcast room in each time period can be adjusted, and the adjusted live broadcast feature vector of the to-be-detected live broadcast room in the time period is obtained.
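The pooling, fully connected and Sigmoid steps, together with the weighting adjustment, can be sketched as follows. This is a PyTorch-style illustration; the layer size and the use of element-wise multiplication for the adjustment are assumptions.

```python
import torch
import torch.nn as nn

class DimensionAttention(nn.Module):
    """Sketch of the attention network: max-pool the initial live broadcast
    feature vectors over the time periods, then a fully connected layer plus
    Sigmoid gives one weight per security feature dimension."""

    def __init__(self, dim: int = 132):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [rooms, periods, dim] initial live broadcast feature vectors
        target_values, _ = x.max(dim=1)               # pooling layer: target value per dimension
        return torch.sigmoid(self.fc(target_values))  # [rooms, dim] weights

attention = DimensionAttention()
x = torch.randn(4, 10, 132)                 # 4 rooms, 10 time periods
weights = attention(x)
adjusted = x * weights.unsqueeze(1)         # adjusted live broadcast feature vectors
```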
In the embodiment of the disclosure, the target feature value under each security feature dimension is extracted through the pooling layer of the pre-trained attention network; by inputting the initial live broadcast feature vector corresponding to each of the multiple time periods of the live broadcast room to be detected into the pooling layer in this way, the multiple target feature values can be obtained.
Further, when extracting a target feature value under each security feature dimension from an initial live broadcast feature vector corresponding to each time period in a plurality of time periods in a live broadcast room to be detected based on a pooling layer in an attention network, the method includes:
based on a pooling layer in the attention network, extracting a maximum characteristic value under each security characteristic dimension from an initial live broadcast characteristic vector corresponding to each time period in a plurality of time periods in a live broadcast room to be detected as a target characteristic value under the security characteristic dimension.
In the embodiment of the disclosure, the maximum feature value under each security feature dimension is extracted through the pooling layer of the attention network. Only the maximum value is extracted, the other feature values are not involved, and little noise is introduced in the extraction process, so the obtained target feature value is more accurate when the weight of each of the multiple security feature dimensions of the live broadcast room to be detected is determined.
In addition, based on a pooling layer in the attention network, an average characteristic value under each security characteristic dimension can be extracted from an initial live broadcast characteristic vector corresponding to each time period in a plurality of time periods of a live broadcast room to be detected and used as a target characteristic value under the security characteristic dimension.
For the above S104, when determining the security detection result corresponding to the to-be-detected live broadcast room based on the adjusted live broadcast feature vector corresponding to the to-be-detected live broadcast room in each time period, as shown in fig. 2, the following S201 to S203 may be included:
s201, aiming at the adjusted live broadcast feature vector corresponding to each time period of the live broadcast room to be detected, fusing feature values under different security feature dimensions contained in the adjusted live broadcast feature vector corresponding to the time period to obtain a first fused live broadcast feature vector corresponding to the time period of the live broadcast room to be detected;
s202, determining a fusion live broadcast feature vector corresponding to the live broadcast room to be detected based on a first fusion live broadcast feature vector corresponding to each time period of the live broadcast room to be detected;
s203, determining a safety detection result corresponding to the to-be-detected live broadcast room based on the fusion live broadcast feature vector.
After the adjusted live broadcast feature vector corresponding to each time period of the live broadcast room to be detected is obtained, the adjusted live broadcast feature vector corresponding to each time period can be input into a fully connected layer in the pre-trained neural network for processing, so that the first fused live broadcast feature vector corresponding to that time period of the live broadcast room to be detected is obtained. The fully connected layer performs fusion processing on the adjusted live broadcast feature vector of each time period, that is, it fuses the feature values under the multiple security feature dimensions of that time period to obtain the first fused live broadcast feature vector of the time period.
After the processing of the full connection layer, the first fused live broadcast feature vector corresponding to each time period of the live broadcast room to be detected may include more feature values than the adjusted live broadcast feature vector corresponding to the time period, for example, the adjusted live broadcast feature vector corresponding to each time period of the live broadcast room to be detected includes 132 feature values, and after the processing of the full connection layer, the obtained first fused live broadcast feature vector corresponding to the time period may include 256 feature values.
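A sketch of this per-period fusion by a fully connected layer, using the 132-to-256 sizes of the example above; the batch of random tensors stands in for real adjusted vectors.

```python
import torch
import torch.nn as nn

fuse = nn.Linear(132, 256)              # fully connected fusion layer

adjusted = torch.randn(4, 10, 132)      # adjusted vectors: 4 rooms, 10 time periods
first_fused = fuse(adjusted)            # first fused vectors: shape [4, 10, 256]
```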
In the embodiment of the disclosure, the adjusted live broadcast feature vector corresponding to each time period is fused to obtain the first fused live broadcast feature vector of that time period. In the fusion process, the feature values of different dimensions in the adjusted live broadcast feature vector can be fused and updated in combination with the time period, so that a first fused live broadcast feature vector better suited to security monitoring of the live broadcast room to be detected is obtained.
After the first fusion live broadcast feature vector corresponding to each time period of the to-be-detected live broadcast room is obtained, fusion can be further performed based on the first fusion live broadcast feature vector corresponding to each time period of the to-be-detected live broadcast room, so that a fusion live broadcast feature vector corresponding to the to-be-detected live broadcast room is obtained.
When the time periods are consecutive, fusing the first fused live broadcast feature vectors corresponding to the individual time periods amounts to performing security detection on the live broadcast room to be detected continuously, so the security detection result corresponding to the live broadcast room to be detected can be obtained more accurately.
Further, for S202, when determining the fused live broadcast feature vector corresponding to the to-be-detected live broadcast room based on the first fused live broadcast feature vector corresponding to the to-be-detected live broadcast room in each time period, as shown in fig. 3, the following S301 to S303 may be included:
s301, starting from a non-first time period in a plurality of time periods, fusing a first fused live broadcast feature vector corresponding to a current time period and a memory live broadcast feature vector corresponding to a previous time period of the current time period to obtain a second fused live broadcast feature vector corresponding to a current time period of a to-be-detected live broadcast room;
s302, extracting a memory live broadcast feature vector corresponding to the current time period based on a second fusion live broadcast feature vector corresponding to the current time period, and fusing the memory live broadcast feature vector corresponding to the current time period and a first fusion live broadcast feature vector corresponding to the next time period of the current time period to obtain a second fusion live broadcast feature vector corresponding to the next time period of the to-be-detected live broadcast room;
and S303, judging whether the next time period is the last time period in the time periods, if so, taking the second fusion live broadcast feature vector corresponding to the next time period as the fusion live broadcast feature vector corresponding to the live broadcast room to be detected, and if not, taking the next time period as the current time period, and executing the step of determining the second fusion live broadcast feature vector corresponding to the current time period in the live broadcast room to be detected.
The process of S301 to S303 is the process of determining the fused live broadcast feature vector corresponding to the live broadcast room to be detected. In this process, a Long Short-Term Memory (LSTM) network in the pre-trained neural network is introduced to determine the fused live broadcast feature vector corresponding to each live broadcast room to be detected. For one live broadcast room to be detected, the process of determining its fused live broadcast feature vector is as follows:
Starting from the non-first time periods of the multiple consecutive time periods, the second fused live broadcast feature vector corresponding to the current time period is determined in turn. For the long short-term memory network corresponding to the current time period, the first fused live broadcast feature vector corresponding to the current time period and the memory live broadcast feature vector output by the long short-term memory network for the previous time period are fused to obtain the second fused live broadcast feature vector corresponding to the current time period. During this fusion, the target feature values to be fused can be fused according to predetermined fusion weights to obtain the second fused live broadcast feature vector of the current time period.
Then it is judged whether the current time period is the last of the multiple consecutive time periods; if it is, the second fused live broadcast feature vector corresponding to the current time period is directly taken as the fused live broadcast feature vector corresponding to the live broadcast room to be detected.
When it is determined that the current time period is not the last of the consecutive time periods, weights are assigned to the feature values in the second fused live broadcast feature vector of the current time period according to preset importance degrees, yielding the memory live broadcast feature vector corresponding to the current time period. This memory live broadcast feature vector is output to the long short-term memory network corresponding to the next time period, where it is used together with the first fused live broadcast feature vector of the next time period to determine the second fused live broadcast feature vector of the next time period; S303 is then executed, that is, it is judged whether the next time period is the last of the multiple consecutive time periods.
In particular, the second fused live broadcast feature vector corresponding to the first of the consecutive time periods can be obtained by fusing the first fused live broadcast feature vector corresponding to the first time period with a preset memory live broadcast feature vector.
Specifically, before the first fused live broadcast feature vector corresponding to the current time period is input into the long short-term memory network corresponding to the current time period, it may first be input into a fully connected layer; after the mapping of the fully connected layer, it is input into the long short-term memory network corresponding to the current time period.
The above describes the case in which the pre-trained neural network includes one layer of long short-term memory network. When two layers of long short-term memory networks are included, the second fused live broadcast feature vector corresponding to each time period is further input into the next layer of long short-term memory network corresponding to that time period. For ease of distinction, the first layer corresponding to each time period may be called the first long short-term memory network of that time period, and the second layer the second long short-term memory network of that time period. For the second long short-term memory network corresponding to each time period, the second fused live broadcast feature vector of that time period and the memory live broadcast feature vector output by the second long short-term memory network of the previous time period are fused to obtain the third fused live broadcast feature vector of that time period for the live broadcast room to be detected. It is then judged whether that time period is the last of the multiple consecutive time periods. If it is, the third fused live broadcast feature vector of that time period is taken as the fused live broadcast feature vector of the live broadcast room to be detected; if not, the memory live broadcast feature vector of the second long short-term memory network of that time period is determined based on the third fused live broadcast feature vector of that time period, and this memory live broadcast feature vector is then fused with the second fused live broadcast feature vector of the next time period to obtain the third fused live broadcast feature vector of the next time period.
In the embodiment of the disclosure, the first fused live broadcast feature vector corresponding to the current time period and the memory live broadcast feature vector corresponding to the previous time period are fused, and finally the second fused live broadcast feature vector corresponding to the last of the multiple consecutive time periods is obtained. In this way the live broadcast room to be detected can be monitored continuously, so that its security detection result can be obtained more accurately.
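A sketch of the two-layer long short-term memory fusion described above; the hidden size of 256 is an assumption chosen to match the 256-dimensional first fused vectors of the earlier example.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=2, batch_first=True)

first_fused = torch.randn(4, 10, 256)    # first fused vectors of 10 consecutive periods, 4 rooms
outputs, (h_n, c_n) = lstm(first_fused)  # memory vectors flow period to period inside the LSTM
fused_room_vector = outputs[:, -1, :]    # output of the last period = fused vector per room
```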
After the fused live broadcast feature vector corresponding to the live broadcast room to be detected is obtained, the security detection result corresponding to the live broadcast room to be detected can be determined based on the fused live broadcast feature vector, for example, the security detection result can be determined in the following manner:
The fused live broadcast feature vector of the live broadcast room to be detected is input into a recombination layer of the pre-trained neural network for recombination, the recombined result is input into a fully connected layer of the pre-trained neural network for mapping, and the mapped result is then processed by a sigmoid activation function to obtain a score, expressed as a value between 0 and 1, that represents the security detection result of the live broadcast room to be detected.
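The recombination layer, fully connected layer and sigmoid can be sketched as a small scoring head; the sizes are assumptions.

```python
import torch
import torch.nn as nn

class ScoreHead(nn.Module):
    """Sketch: recombine (reshape) the fused live broadcast feature vector, map it
    through a fully connected layer, and apply sigmoid to get a score in (0, 1)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(dim, 1)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        fused = fused.reshape(fused.size(0), -1)    # recombination layer
        return torch.sigmoid(self.fc(fused))        # one score per live broadcast room
```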
The whole process is described in a specific embodiment with reference to fig. 4:
In this embodiment, 512 live broadcast rooms to be detected are monitored simultaneously. First, the 132-dimensional initial live broadcast feature vectors of the 512 live broadcast rooms to be detected in 10 time periods are acquired and input into the neural network. After missing value filling, these initial live broadcast feature vectors are input into the attention network of the neural network to obtain the weight corresponding to each dimension. The initial live broadcast feature vectors of the 512 live broadcast rooms in the 10 time periods are then adjusted based on the weight of each dimension, yielding the adjusted live broadcast feature vectors of the 512 live broadcast rooms in the 10 time periods. The adjusted live broadcast feature vectors are input into a fully connected layer for processing, giving the first fused live broadcast feature vectors of the 512 live broadcast rooms in the 10 time periods. After passing through two layers of long short-term memory networks, the fused live broadcast feature vectors of the 512 live broadcast rooms are obtained, and finally, after passing sequentially through the recombination layer, a fully connected layer and the sigmoid function, the scores representing the security detection results of the 512 live broadcast rooms to be detected are obtained.
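The shapes flowing through this example can be traced as follows; the 256-dimensional intermediate size carries over the earlier example, and the LSTM hidden size matching it is an assumption.

```python
# initial vectors            [512, 10, 132]
# -> missing value filling   [512, 10, 132]
# -> attention weights       [512, 132]
# -> adjusted vectors        [512, 10, 132]
# -> fully connected fusion  [512, 10, 256]
# -> two LSTM layers         [512, 256]   (output of the last time period)
# -> recombination + fully connected + sigmoid -> scores [512, 1]
```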
The security detection result obtained by the embodiment of the present disclosure may be determined by a pre-trained neural network including an attention network;
The neural network is obtained by training on the initial live broadcast feature vectors of each of a plurality of sample live broadcast rooms in each time period and the pre-labeled security detection result corresponding to each sample live broadcast room.
As shown in fig. 5, a training process of a neural network is provided, specifically, the neural network is obtained by training in the following manner, including S501 to S505:
s501, obtaining initial live broadcast feature vectors of each sample live broadcast room in each time period in a plurality of time periods, wherein the initial live broadcast feature vectors are determined based on feature extraction networks corresponding to various security feature dimensions.
Here, the initial live broadcast feature vector corresponding to each sample live broadcast room is determined, which is similar to the process of determining the initial live broadcast feature vector corresponding to the live broadcast room to be detected described above, and is not described herein again.
S502, determining the weight of each safety feature dimension in multiple safety feature dimensions corresponding to each sample live broadcast room through an attention network based on the initial live broadcast feature vector of each sample live broadcast room in each time period.
Here, the manner of determining the weight of each of the multiple security feature dimensions corresponding to each sample live broadcast room is similar to the manner of determining the weight of each of the multiple security feature dimensions corresponding to the live broadcast room to be detected, which is described above, and thus, details are not repeated here.
S503, based on the weight of each safety feature dimension in the multiple safety feature dimensions corresponding to each sample live broadcast room, adjusting the initial live broadcast feature vector of the sample live broadcast room in each time period to obtain the adjusted live broadcast feature vector of the sample live broadcast room in the time period.
Here, the manner of determining the adjusted live broadcast feature vector corresponding to each time period in the sample live broadcast room is similar to the manner of determining the adjusted live broadcast feature vector corresponding to each time period in the to-be-detected live broadcast room described above, and details are not repeated here.
S504, based on the adjusted live broadcast feature vector corresponding to each time period in the sample live broadcast room, a security detection result corresponding to the sample live broadcast room is predicted.
Here, the manner of predicting the security detection result corresponding to the sample live broadcast room is similar to the manner of determining the security detection result corresponding to the to-be-detected live broadcast room described above, and details are not repeated here.
And S505, adjusting network parameter values in the neural network based on the predicted safety detection result corresponding to each sample live broadcast room and the actual safety detection result corresponding to the sample live broadcast room.
The actual security detection result corresponding to each sample live broadcast room can be labeled in advance based on manual inspection. For each sample live broadcast room, a loss function value can be computed between the predicted security detection result and the labeled actual security detection result, and the network parameter values in the neural network are adjusted according to this loss function value until the loss function value is smaller than a set threshold or the number of training iterations reaches a set number; the neural network used for predicting the security detection result of the live broadcast room to be detected is thereby obtained.
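A minimal sketch of such a training loop; the tiny stand-in model, the binary cross-entropy loss, the Adam optimizer and the random data are all assumptions, and only the stopping logic follows the description above.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(132, 1), nn.Sigmoid())   # stand-in for the full network
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

sample_vectors = torch.randn(64, 132)                    # features of 64 sample rooms
labels = torch.randint(0, 2, (64, 1)).float()            # pre-labeled actual results
loss_threshold, max_steps = 0.05, 1000                   # set threshold / set number of iterations

for step in range(max_steps):
    loss = criterion(model(sample_vectors), labels)      # predicted vs. actual results
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() < loss_threshold:
        break
```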
Further, the fully connected layers mentioned above can be trained in combination with a dropout layer. For example, during training each neuron of a fully connected layer stops working (its activation is set to zero) with a certain probability p, which effectively reduces overfitting of the fully connected layer and makes it generalize better.
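A sketch of combining a fully connected layer with dropout during training; the layer sizes and p = 0.5 are assumptions.

```python
import torch.nn as nn

# During training, nn.Dropout zeroes each activation with probability p,
# which reduces overfitting of the fully connected layer.
fc_with_dropout = nn.Sequential(nn.Linear(132, 256), nn.Dropout(p=0.5))
```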
In the embodiment of the disclosure, the initial live broadcast feature vector of the sample live broadcast room in each time period is adjusted to obtain the adjusted live broadcast feature vector of the sample live broadcast room in that time period; in the adjustment, for example, the proportion of the feature values in important security feature dimensions is increased and the proportion of the feature values in unimportant dimensions is weakened, so that the security detection result corresponding to the sample live broadcast room can be predicted more accurately. The neural network is then adjusted through the predicted security detection result and the actual security detection result of the sample live broadcast room, yielding a neural network with higher accuracy for performing security detection on the live broadcast room to be detected.
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same technical concept, a safety detection device corresponding to the safety detection method is further provided in the embodiment of the present disclosure, and as the principle of solving the problem of the device in the embodiment of the present disclosure is similar to that of the safety detection method in the embodiment of the present disclosure, the implementation of the device can refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 6, a schematic diagram of a security detection apparatus 600 according to an embodiment of the present disclosure is shown, where the security detection apparatus 600 includes: an obtaining module 601, a first determining module 602, an adjusting module 603, and a second determining module 604.
The acquisition module 601 is configured to acquire an initial live broadcast feature vector of a to-be-detected live broadcast room in each of a plurality of time periods, the initial live broadcast feature vector being determined based on feature extraction networks respectively corresponding to a plurality of security feature dimensions;
a first determining module 602, configured to determine, based on an initial live broadcast feature vector in each of multiple time periods, a weight of each security feature dimension in multiple security feature dimensions corresponding to a to-be-detected live broadcast room through an attention network;
an adjusting module 603, configured to adjust, based on the weight of each security feature dimension, an initial live broadcast feature vector of the live broadcast room to be detected in each time period, to obtain an adjusted live broadcast feature vector corresponding to the time period of the live broadcast room to be detected;
a second determining module 604, configured to determine a security detection result corresponding to the to-be-detected live broadcast room based on the adjusted live broadcast feature vector corresponding to the to-be-detected live broadcast room in each time period.
In a possible implementation manner, after determining a security detection result corresponding to the to-be-detected live broadcast room, the second determining module 604 is further configured to:
and if the safety detection result indicates that the live broadcast content of the live broadcast room to be detected does not accord with the preset safety detection condition, outputting the identification corresponding to the live broadcast room to be detected and the live broadcast content corresponding to the live broadcast room to be detected.
In a possible implementation manner, the obtaining module 601, when configured to obtain an initial live broadcast feature vector of a to-be-detected live broadcast room in each of a plurality of time periods, where the initial live broadcast feature vector is determined based on feature extraction networks respectively corresponding to a plurality of security feature dimensions, includes:
acquiring live broadcast content of the live broadcast room to be detected in each of a plurality of consecutive time periods, wherein the live broadcast content comprises scene pictures and/or audio content, and extracting the live broadcast content features corresponding to the live broadcast content based on the feature extraction networks corresponding to the multiple security feature dimensions;
and splicing the live broadcast content characteristics corresponding to each time period of the live broadcast room to be detected, and the historical behavior characteristics and the user attribute characteristics corresponding to the time period of the live broadcast room to be detected to obtain an initial live broadcast characteristic vector of the live broadcast room to be detected in the time period.
In a possible implementation manner, after obtaining the initial live broadcast feature vector of the to-be-detected live broadcast room in each time period, the obtaining module 601 is further configured to:
identifying live broadcast content characteristics in an initial live broadcast characteristic vector corresponding to each time period of a to-be-detected live broadcast room, and detecting whether missing values for at least one safety characteristic dimension exist in the live broadcast content characteristics;
and when determining that a missing value for at least one safety characteristic dimension exists, filling the missing value of the at least one safety characteristic dimension based on the characteristic values of other safety characteristic dimensions in the initial live broadcast characteristic vector corresponding to the time period of the live broadcast room to be detected.
In a possible implementation manner, the first determining module 602, when configured to determine, through an attention network, a weight of each security feature dimension of multiple security feature dimensions corresponding to a to-be-detected live broadcast room based on an initial live broadcast feature vector in each of multiple time periods, includes:
determining a target characteristic value of the to-be-detected live broadcast room under each safety characteristic dimension based on an initial live broadcast characteristic vector corresponding to each time period in a plurality of time periods of the to-be-detected live broadcast room;
and inputting the target characteristic value of the to-be-detected live broadcast room under each security characteristic dimension into a full connection layer and an activation function layer in the attention network to obtain the weight of each security characteristic dimension in the multiple security characteristic dimensions corresponding to the to-be-detected live broadcast room.
In a possible implementation manner, when the first determining module 602 is configured to obtain, based on an initial live broadcast feature vector corresponding to each time period in a plurality of time periods of a to-be-detected live broadcast room, a target feature value of the to-be-detected live broadcast room in each security feature dimension, the method includes:
based on a pooling layer in the attention network, extracting a target characteristic value under each security characteristic dimension from an initial live broadcast characteristic vector corresponding to each time period in a plurality of time periods in a live broadcast room to be detected.
In a possible implementation, the first determining module 602, when configured to extract, from an initial live broadcast feature vector corresponding to each of a plurality of time periods of a live broadcast room to be detected based on a pooling layer in an attention network, a target feature value in each security feature dimension, includes:
based on a pooling layer in the attention network, extracting a maximum characteristic value under each security characteristic dimension from an initial live broadcast characteristic vector corresponding to each time period in a plurality of time periods in a live broadcast room to be detected as a target characteristic value under the security characteristic dimension.
In a possible implementation manner, the second determining module 604, when configured to determine, based on the adjusted live broadcast feature vector corresponding to each time period of the to-be-detected live broadcast room, a security detection result corresponding to the to-be-detected live broadcast room, includes:
aiming at the adjusted live broadcast characteristic vector corresponding to each time period of the live broadcast room to be detected, fusing characteristic values under different security characteristic dimensions contained in the adjusted live broadcast characteristic vector corresponding to the time period to obtain a first fused live broadcast characteristic vector corresponding to the time period of the live broadcast room to be detected;
determining a fusion live broadcast feature vector corresponding to the live broadcast room to be detected based on a first fusion live broadcast feature vector corresponding to each time period of the live broadcast room to be detected;
and determining a safety detection result corresponding to the live broadcast room to be detected based on the fusion live broadcast feature vector.
In a possible implementation manner, the second determining module 604, when configured to determine the fused live broadcast feature vector corresponding to the to-be-detected live broadcast room based on the first fused live broadcast feature vector corresponding to each time period in the to-be-detected live broadcast room, includes:
starting from a non-first time period in a plurality of time periods, fusing a first fused live broadcast feature vector corresponding to a current time period and a memory live broadcast feature vector corresponding to a previous time period of the current time period to obtain a second fused live broadcast feature vector corresponding to a current time period of a to-be-detected live broadcast room;
extracting a memory live broadcast feature vector corresponding to the current time period based on a second fusion live broadcast feature vector corresponding to the current time period, and fusing the memory live broadcast feature vector corresponding to the current time period and a first fusion live broadcast feature vector corresponding to the next time period of the current time period to obtain a second fusion live broadcast feature vector corresponding to the next time period of the to-be-detected live broadcast room;
and judging whether the next time period is the last time period in the multiple time periods, if so, taking the second fusion live broadcast feature vector corresponding to the next time period as the fusion live broadcast feature vector corresponding to the live broadcast room to be detected, and if not, taking the next time period as the current time period, and executing the step of determining the second fusion live broadcast feature vector corresponding to the current time period in the live broadcast room to be detected.
In a possible implementation, the security detection apparatus further includes a network training module 605, where the network training module is configured to train a neural network that determines the security detection result, and the neural network includes an attention network;
The neural network is obtained by training on the initial live broadcast feature vectors corresponding to each sample live broadcast room in a plurality of time periods and the pre-labeled security detection result corresponding to each sample live broadcast room.
In one possible implementation, the network training module 605 is configured to train the neural network in the following manner:
acquiring initial live broadcast feature vectors of each sample live broadcast room in each time period in a plurality of time periods, which are determined based on feature extraction networks respectively corresponding to a plurality of security feature dimensions;
determining the weight of each safety feature dimension in multiple safety feature dimensions corresponding to each sample live broadcast room through an attention network on the basis of the initial live broadcast feature vector of each sample live broadcast room in each time period in multiple time periods;
adjusting the initial live broadcast feature vector of each sample live broadcast room in each time period based on the weight of each safety feature dimension in multiple safety feature dimensions corresponding to each sample live broadcast room to obtain an adjusted live broadcast feature vector of the sample live broadcast room in the time period;
predicting a safety detection result corresponding to the sample live broadcast room based on the adjusted live broadcast feature vector corresponding to each time period of the sample live broadcast room;
and adjusting network parameter values in the neural network based on the predicted safety detection result corresponding to each sample live broadcast room and the actual safety detection result corresponding to the sample live broadcast room.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Corresponding to the security detection method in fig. 1, an embodiment of the present disclosure further provides an electronic device 700, as shown in fig. 7, which is a schematic structural diagram of the electronic device 700 provided in the embodiment of the present disclosure, and includes:
a processor 71, a memory 72, and a bus 73; the memory 72 is used for storing execution instructions and includes a memory 721 and an external memory 722; the memory 721 is also referred to as an internal memory, and is used for temporarily storing the operation data in the processor 71 and the data exchanged with the external memory 722 such as a hard disk, the processor 71 exchanges data with the external memory 722 through the memory 721, and when the electronic device 700 is operated, the processor 71 communicates with the memory 72 through the bus 73, so that the processor 71 executes the following instructions: acquiring initial live broadcast feature vectors of a to-be-detected live broadcast room in each time period in a plurality of time periods, which are determined based on feature extraction networks respectively corresponding to a plurality of security feature dimensions; determining the weight of each safety feature dimension in multiple safety feature dimensions corresponding to a to-be-detected live broadcast room through an attention network on the basis of the initial live broadcast feature vector in each time period; based on the weight of each safety feature dimension, adjusting the initial live broadcast feature vector of the live broadcast room to be detected in each time period to obtain an adjusted live broadcast feature vector of the live broadcast room to be detected in the time period; and determining a safety detection result corresponding to the to-be-detected live broadcast room based on the adjusted live broadcast characteristic vector corresponding to each time period of the to-be-detected live broadcast room.
The embodiments of the present disclosure also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the security detection method described in the above method embodiments are performed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the security detection method provided in the embodiments of the present disclosure includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute steps of the security detection method described in the above method embodiments, which may be referred to specifically for the above method embodiments, and are not described herein again.
The embodiments of the present disclosure also provide a computer program, which when executed by a processor implements any one of the methods of the foregoing embodiments. The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (12)

1. A security detection method, comprising:
acquiring initial live broadcast feature vectors of a to-be-detected live broadcast room in each time period in a plurality of time periods, which are determined based on feature extraction networks respectively corresponding to a plurality of security feature dimensions;
determining the weight of each safety characteristic dimension in multiple safety characteristic dimensions corresponding to the to-be-detected live broadcast room through an attention network on the basis of the initial live broadcast characteristic vector in each time period;
based on the weight of each safety feature dimension, adjusting the initial live broadcast feature vector of the live broadcast room to be detected in each time period to obtain an adjusted live broadcast feature vector corresponding to the time period of the live broadcast room to be detected;
determining a safety detection result corresponding to the to-be-detected live broadcast room based on the adjusted live broadcast characteristic vector corresponding to each time period of the to-be-detected live broadcast room;
the acquiring of the initial live broadcast feature vector of the to-be-detected live broadcast room in each time period in the multiple time periods, which is determined based on the feature extraction networks respectively corresponding to the multiple security feature dimensions, includes:
acquiring live broadcast content of the live broadcast room to be detected in each of a plurality of continuous time periods, wherein the live broadcast content comprises scene pictures and/or audio content, and extracting live broadcast content characteristics corresponding to the live broadcast content based on the feature extraction networks corresponding to the various security characteristic dimensions;
splicing the live broadcast content characteristics corresponding to each time period of the live broadcast room to be detected, and the historical behavior characteristics and the user attribute characteristics corresponding to the time period of the live broadcast room to be detected to obtain an initial live broadcast characteristic vector of the live broadcast room to be detected in the time period;
after the initial live broadcast feature vector of the to-be-detected live broadcast room in each time period is obtained, the safety detection method further comprises the following steps of:
identifying live broadcast content characteristics in an initial live broadcast characteristic vector corresponding to each time period of the live broadcast room to be detected, and detecting whether missing values for at least one safety characteristic dimension exist in the live broadcast content characteristics;
and when determining that a missing value for at least one safety characteristic dimension exists, filling the missing value of the at least one safety characteristic dimension based on the characteristic values of other safety characteristic dimensions in the initial live broadcast characteristic vector corresponding to the time period of the live broadcast room to be detected.
2. The safety detection method according to claim 1, wherein after determining the safety detection result corresponding to the to-be-detected live broadcast room, the safety detection method further comprises:
if the safety detection result indicates that the live broadcast content of the live broadcast room to be detected does not accord with a preset safety detection condition, outputting the identification corresponding to the live broadcast room to be detected and the live broadcast content corresponding to the live broadcast room to be detected to a client corresponding to a worker, wherein the preset safety detection condition comprises that the risk score is lower than a risk score threshold value and/or the risk grade is lower than a high risk grade.
3. The safety detection method according to claim 1, wherein the determining, through an attention network, the weight of each of a plurality of safety feature dimensions corresponding to the to-be-detected live broadcast room based on the initial live broadcast feature vector in each of the plurality of time periods includes:
determining a target characteristic value of the to-be-detected live broadcast room under each safety characteristic dimension based on the initial live broadcast characteristic vector corresponding to each time period in the multiple time periods;
and inputting the target characteristic value of the to-be-detected live broadcast room under each safety characteristic dimension into a full connection layer and an activation function layer in the attention network to obtain the weight of each safety characteristic dimension in the multiple safety characteristic dimensions corresponding to the to-be-detected live broadcast room.
4. The security detection method according to claim 3, wherein determining the target feature value of the live broadcast room to be detected in each security feature dimension based on the initial live broadcast feature vector corresponding to each of the multiple time periods comprises:
extracting, based on a pooling layer in the attention network, the target feature value in each security feature dimension from the initial live broadcast feature vectors corresponding to the multiple time periods of the live broadcast room to be detected.
5. The security detection method according to claim 4, wherein extracting, based on the pooling layer in the attention network, the target feature value in each security feature dimension from the initial live broadcast feature vectors corresponding to the multiple time periods of the live broadcast room to be detected comprises:
extracting, based on the pooling layer in the attention network, the maximum feature value in each security feature dimension from the initial live broadcast feature vectors corresponding to the multiple time periods of the live broadcast room to be detected, as the target feature value in that security feature dimension.
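Claims 4 and 5 describe the pooling step. The minimal sketch below keeps, for each security feature dimension, the maximum feature value observed across the time periods; representing each dimension by a single scalar per period is a simplifying assumption.

    import torch

    def target_values_by_max_pooling(initial_vectors: torch.Tensor) -> torch.Tensor:
        """initial_vectors: (num_periods, num_dims), one feature value per
        security feature dimension per time period. Returns (num_dims,): the
        maximum over the periods, used as the target feature value."""
        target_values, _ = initial_vectors.max(dim=0)
        return target_values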
6. The security detection method according to claim 1, wherein determining the security detection result corresponding to the live broadcast room to be detected based on the adjusted live broadcast feature vector corresponding to each time period of the live broadcast room to be detected comprises:
for the adjusted live broadcast feature vector corresponding to each time period of the live broadcast room to be detected, fusing the feature values in the different security feature dimensions contained in the adjusted live broadcast feature vector corresponding to that time period, to obtain a first fused live broadcast feature vector of the live broadcast room to be detected in that time period;
determining a fused live broadcast feature vector corresponding to the live broadcast room to be detected based on the first fused live broadcast feature vector corresponding to each time period of the live broadcast room to be detected; and
determining the security detection result corresponding to the live broadcast room to be detected based on the fused live broadcast feature vector.
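Claim 6's per-period fusion of feature values across security feature dimensions could, for example, be a learned projection of the adjusted vector into a single first fused vector. The linear layer below is an assumption rather than the patented design; the class name PeriodFusion is hypothetical.

    import torch
    import torch.nn as nn

    class PeriodFusion(nn.Module):
        """Fuse the adjusted feature values of all security feature dimensions
        for one time period into a first fused live broadcast feature vector."""
        def __init__(self, num_dims: int, dim_size: int, fused_size: int):
            super().__init__()
            self.proj = nn.Linear(num_dims * dim_size, fused_size)

        def forward(self, adjusted_vector: torch.Tensor) -> torch.Tensor:
            # adjusted_vector: (num_dims, dim_size) -> (fused_size,)
            return self.proj(adjusted_vector.flatten())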
7. The security detection method according to claim 6, wherein determining the fused live broadcast feature vector corresponding to the live broadcast room to be detected based on the first fused live broadcast feature vector corresponding to each time period of the live broadcast room to be detected comprises:
starting from a non-first time period among the multiple time periods, fusing the first fused live broadcast feature vector corresponding to a current time period with a memory live broadcast feature vector corresponding to the time period preceding the current time period, to obtain a second fused live broadcast feature vector of the live broadcast room to be detected in the current time period;
extracting a memory live broadcast feature vector corresponding to the current time period based on the second fused live broadcast feature vector corresponding to the current time period, and fusing the memory live broadcast feature vector corresponding to the current time period with the first fused live broadcast feature vector corresponding to the time period following the current time period, to obtain a second fused live broadcast feature vector of the live broadcast room to be detected in the following time period; and
judging whether the following time period is the last of the multiple time periods; if so, taking the second fused live broadcast feature vector corresponding to the following time period as the fused live broadcast feature vector corresponding to the live broadcast room to be detected; if not, taking the following time period as the current time period and returning to the step of determining the second fused live broadcast feature vector of the live broadcast room to be detected in the current time period.
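The period-by-period fusion with a memory live broadcast feature vector in claim 7 reads like a recurrent update. The sketch below realises it with a GRU cell, where the hidden state plays the role of the memory vector carried from one period to the next; the choice of a GRU (rather than an LSTM or a bespoke gate) and the seeding of the memory with the first period's vector are assumptions.

    import torch
    import torch.nn as nn

    class TemporalFusion(nn.Module):
        """Fold the per-period first fused vectors into a single fused live
        broadcast feature vector for the room."""
        def __init__(self, dim: int):
            super().__init__()
            self.cell = nn.GRUCell(dim, dim)

        def forward(self, first_fused: torch.Tensor) -> torch.Tensor:
            # first_fused: (num_periods, dim); the first period seeds the memory
            memory = first_fused[0]
            for period_vec in first_fused[1:]:      # from the non-first period on
                memory = self.cell(period_vec.unsqueeze(0),
                                   memory.unsqueeze(0)).squeeze(0)
            return memory                            # fused vector for the room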
8. The security detection method according to any one of claims 1 to 7, wherein the security detection result is obtained through a pre-trained neural network comprising the attention network;
wherein the neural network is trained by using initial live broadcast feature vectors of each sample live broadcast room in multiple time periods and a pre-labelled security detection result corresponding to each sample live broadcast room.
9. The security detection method according to claim 8, wherein the neural network is trained in the following manner:
acquiring initial live broadcast feature vectors of each sample live broadcast room in each of multiple time periods, determined based on the feature extraction networks respectively corresponding to the multiple security feature dimensions;
determining, through the attention network, the weight of each of the multiple security feature dimensions corresponding to each sample live broadcast room based on the initial live broadcast feature vector of the sample live broadcast room in each of the multiple time periods;
adjusting the initial live broadcast feature vector of each sample live broadcast room in each time period based on the weight of each of the multiple security feature dimensions corresponding to the sample live broadcast room, to obtain an adjusted live broadcast feature vector of the sample live broadcast room in that time period;
predicting the security detection result corresponding to the sample live broadcast room based on the adjusted live broadcast feature vector corresponding to each time period of the sample live broadcast room; and
adjusting the network parameter values of the neural network based on the predicted security detection result corresponding to each sample live broadcast room and the actual security detection result corresponding to that sample live broadcast room.
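A hedged sketch of one training step of claim 9: the network predicts a security detection result per sample live broadcast room and its parameters are adjusted against the pre-labelled result. Binary cross-entropy on a risk logit is an assumed loss; the claim does not name one, and the function name train_step is hypothetical.

    import torch
    import torch.nn as nn

    def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                   sample_vectors: torch.Tensor, labels: torch.Tensor) -> float:
        """One optimisation step: sample_vectors holds the per-period initial
        live broadcast feature vectors of a batch of sample rooms, labels the
        pre-labelled detection results (1 = violating, 0 = safe)."""
        optimizer.zero_grad()
        logits = model(sample_vectors)               # predicted detection results
        loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
        loss.backward()
        optimizer.step()                             # adjust network parameter values
        return loss.item()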
10. A security detection device, comprising:
an acquisition module, configured to acquire initial live broadcast feature vectors of a live broadcast room to be detected in each of multiple time periods, determined based on feature extraction networks respectively corresponding to multiple security feature dimensions;
a first determining module, configured to determine, through an attention network, the weight of each of the multiple security feature dimensions corresponding to the live broadcast room to be detected based on the initial live broadcast feature vector in each of the multiple time periods;
an adjusting module, configured to adjust the initial live broadcast feature vector of the live broadcast room to be detected in each time period based on the weight of each security feature dimension, to obtain an adjusted live broadcast feature vector of the live broadcast room to be detected in that time period; and
a second determining module, configured to determine a security detection result corresponding to the live broadcast room to be detected based on the adjusted live broadcast feature vector corresponding to each time period of the live broadcast room to be detected;
wherein the acquisition module, when acquiring the initial live broadcast feature vectors of the live broadcast room to be detected in each of the multiple time periods determined based on the feature extraction networks respectively corresponding to the multiple security feature dimensions, is configured to:
acquire live broadcast content of the live broadcast room to be detected in each of multiple consecutive time periods, the live broadcast content comprising scene pictures and/or audio content, and extract, based on the feature extraction networks respectively corresponding to the multiple security feature dimensions, live broadcast content features corresponding to the live broadcast content; and concatenate the live broadcast content features corresponding to each time period of the live broadcast room to be detected with the historical behavior features and the user attribute features of the live broadcast room to be detected in that time period, to obtain the initial live broadcast feature vector of the live broadcast room to be detected in that time period;
wherein after the initial live broadcast feature vector of the live broadcast room to be detected in each time period is obtained, the acquisition module is further configured to:
identify the live broadcast content features in the initial live broadcast feature vector corresponding to each time period of the live broadcast room to be detected, and detect whether the live broadcast content features contain a missing value for at least one security feature dimension; and when it is determined that a missing value for at least one security feature dimension exists, fill the missing value of the at least one security feature dimension based on the feature values of the other security feature dimensions in the initial live broadcast feature vector corresponding to that time period of the live broadcast room to be detected.
11. An electronic device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, perform the steps of the security detection method according to any one of claims 1 to 9.
12. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of the security detection method according to any one of claims 1 to 9.
CN202010589220.8A 2020-06-24 2020-06-24 Security detection method and device, electronic equipment and storage medium Active CN111770352B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010589220.8A CN111770352B (en) 2020-06-24 2020-06-24 Security detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010589220.8A CN111770352B (en) 2020-06-24 2020-06-24 Security detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111770352A CN111770352A (en) 2020-10-13
CN111770352B true CN111770352B (en) 2021-12-07

Family

ID=72721658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010589220.8A Active CN111770352B (en) 2020-06-24 2020-06-24 Security detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111770352B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712066B (en) * 2021-01-19 2023-02-28 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN113766256A (en) * 2021-02-09 2021-12-07 北京沃东天骏信息技术有限公司 Live broadcast wind control method and device

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070010998A1 (en) * 2005-07-08 2007-01-11 Regunathan Radhakrishnan Dynamic generative process modeling, tracking and analyzing
KR101106677B1 (en) * 2009-05-20 2012-01-18 (주)위디랩 Method of managing contents and contents operation and management system using the same
CN106250837B (en) * 2016-07-27 2019-06-18 腾讯科技(深圳)有限公司 A kind of recognition methods of video, device and system
CN107197331B (en) * 2017-05-03 2020-01-31 北京奇艺世纪科技有限公司 method and device for monitoring live broadcast content in real time
CN107682719A (en) * 2017-09-05 2018-02-09 广州数沃信息科技有限公司 A kind of monitoring and assessing method and device of live content health degree
CN109145828B (en) * 2018-08-24 2020-12-25 北京字节跳动网络技术有限公司 Method and apparatus for generating video category detection model
CN110969066B (en) * 2018-09-30 2023-10-10 北京金山云网络技术有限公司 Live video identification method and device and electronic equipment
CN109495766A (en) * 2018-11-27 2019-03-19 广州市百果园信息技术有限公司 A kind of method, apparatus, equipment and the storage medium of video audit
CN109711459B (en) * 2018-12-24 2019-11-15 广东德诚科教有限公司 User individual action estimation method, apparatus, computer equipment and storage medium
CN109803152B (en) * 2018-12-28 2021-05-21 广州华多网络科技有限公司 Violation auditing method and device, electronic equipment and storage medium
CN110796098B (en) * 2019-10-31 2021-07-27 广州市网星信息技术有限公司 Method, device, equipment and storage medium for training and auditing content auditing model
CN111143612B (en) * 2019-12-27 2023-06-27 广州市百果园信息技术有限公司 Video auditing model training method, video auditing method and related devices

Also Published As

Publication number Publication date
CN111770352A (en) 2020-10-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee after: Douyin Vision Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee after: Tiktok vision (Beijing) Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20230713

Address after: 100190 1309, 13th floor, building 4, Zijin Digital Park, Haidian District, Beijing
Patentee after: Beijing volcano Engine Technology Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: Douyin Vision Co.,Ltd.