WO2020000879A1 - Image recognition method and device - Google Patents

Image recognition method and device

Info

Publication number
WO2020000879A1
WO2020000879A1 PCT/CN2018/116335 CN2018116335W WO2020000879A1 WO 2020000879 A1 WO2020000879 A1 WO 2020000879A1 CN 2018116335 W CN2018116335 W CN 2018116335W WO 2020000879 A1 WO2020000879 A1 WO 2020000879A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
identified
screenshot
recognition
recognition result
Prior art date
Application number
PCT/CN2018/116335
Other languages
English (en)
French (fr)
Inventor
周恺卉
王长虎
Original Assignee
北京字节跳动网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2020000879A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • Embodiments of the present application relate to the field of computer technology, and in particular, to an image recognition method and device.
  • the embodiments of the present application provide an image recognition method and device.
  • an embodiment of the present application provides an image recognition method.
  • the method includes: acquiring an image to be identified; inputting the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result characterizing whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between the image to be identified and the recognition result; and in response to the recognition result indicating that the image to be identified is a screenshot image, deleting the image to be identified.
  • before acquiring the image to be identified, the method includes: acquiring a target image; and capturing a preset region of the target image as the image to be identified.
  • before acquiring the image to be identified, the method includes: acquiring a frame sequence of a target video; and selecting a target frame in the frame sequence of the target video as the image to be identified.
  • the screenshot image recognition model is obtained by training in the following steps: obtaining a training sample set, where each training sample includes a sample image and label information used to characterize whether the sample image is a screenshot image; and training the screenshot image recognition model with the sample image as input and the label information corresponding to the input sample image as the desired output.
  • the method further includes: in response to the recognition result indicating that the image to be recognized is not a screenshot image, performing text recognition on the image to be recognized to obtain a text recognition result; determining whether the text recognition result includes a preset text; and in response to determining that the text recognition result includes the preset text, deleting the image to be recognized.
  • an embodiment of the present application provides an image recognition device, the device including: a to-be-identified image acquisition unit configured to obtain an image to be identified; an image recognition unit configured to input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result characterizing whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between the image to be identified and the recognition result; and a first deletion unit configured to delete the image to be identified in response to the recognition result indicating that the image to be identified is a screenshot image.
  • the apparatus further includes: a pushing unit configured to push, in response to the recognition result indicating that the image to be identified is not a screenshot image, information indicating that the image to be identified is not a screenshot image.
  • the apparatus further includes: a target image acquisition unit configured to acquire a target image; and a capture unit configured to crop a preset region of the target image as the image to be identified.
  • the apparatus further includes: a frame sequence acquisition unit configured to acquire a frame sequence of the target video; and a selection unit configured to select a target frame in the frame sequence of the target video as an image to be identified.
  • the screenshot image recognition model is obtained by training in the following steps: obtaining a training sample set, where each training sample includes a sample image and label information used to characterize whether the sample image is a screenshot image; and training the screenshot image recognition model with the sample image as input and the label information corresponding to the input sample image as the desired output.
  • the apparatus further includes: a recognition unit configured to perform text recognition on the image to be recognized to obtain a text recognition result, in response to the recognition result indicating that the image to be recognized is not a screenshot image; a determination unit configured to determine whether the text recognition result includes a preset text; and a second deleting unit configured to delete the image to be recognized in response to determining that the text recognition result includes the preset text.
  • an embodiment of the present application provides an electronic device.
  • the electronic device includes: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation manner of the first aspect.
  • an embodiment of the present application provides a computer-readable medium on which a computer program is stored.
  • the image recognition method and device provided in the embodiments of the present application recognize an image to be identified by using a screenshot image recognition model. If the image to be identified is a screenshot image, it is deleted. Thus, both the recognition of the image to be identified and the deletion of screenshot images are realized. Because a screenshot image recognition model is used, image review and recognition are more efficient than with manual review.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of an image recognition method according to the present application.
  • FIG. 3 is a schematic diagram of an application scenario of the image recognition method according to the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of an image recognition apparatus according to the present application.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a server according to an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 to which an image recognition method or an image recognition apparatus of an embodiment of the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • the network 104 is a medium for providing a communication link between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as photographing applications, picture processing applications, instant messaging tools, email clients, social platform software, and the like.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • when the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices that support storing and transmitting images, including, but not limited to, smart phones, tablet computers, laptop computers, and desktop computers.
  • when the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. This is not specifically limited here.
  • the server 105 may be a server that provides various services, such as a background server that processes images stored in the terminal devices 101, 102, and 103.
  • the background server may process the received image (for example, identify whether it is a screenshot image), and perform corresponding processing according to the processing result (for example, the recognition result).
  • the image recognition method provided in the embodiment of the present application is generally executed by the server 105, and accordingly, the image recognition device is generally provided in the server 105.
  • the server may be hardware or software.
  • when the server is hardware, it can be implemented as a distributed server cluster consisting of multiple servers or as a single server.
  • when the server is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. This is not specifically limited here.
  • terminal devices, networks, and servers in FIG. 1 are merely exemplary. Depending on the implementation needs, there can be any number of terminal devices, networks, and servers.
  • the image recognition method includes the following steps:
  • Step 201: Obtain an image to be identified.
  • an execution subject of the image recognition method may obtain an image to be recognized from a terminal over a wired or wireless connection.
  • the image to be identified may also be stored locally on the execution subject. In this case, the execution subject may obtain the image to be identified directly from local storage.
  • the image to be identified may be any image that needs to be identified. In practice, the images to be identified can be specified by a technician or filtered according to certain conditions.
  • the method may include: acquiring a frame sequence of the target video; and selecting a target frame in the frame sequence of the target video as the image to be identified.
  • the target video can be any video.
  • the determination of the target video can be specified by a technician, or it can be filtered according to certain conditions.
  • the target frame may be at least one frame in the above-mentioned frame sequence.
  • the target frame can be specified by a technician, or it can be filtered according to certain conditions.
  • the condition may be, for example, that one frame is extracted every 2 seconds.
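  • The frame selection condition above can be sketched as follows. This is only an illustrative sketch: the function name `select_frame_indices` and its parameters are assumptions, and it works on a frame count and frame rate rather than on a decoded video.

```python
def select_frame_indices(frame_count, fps, interval_seconds=2.0):
    """Return the indices of frames sampled once every `interval_seconds`.

    `frame_count` and `fps` describe the frame sequence of the target video;
    both names are illustrative and not taken from the application.
    """
    if frame_count <= 0 or fps <= 0 or interval_seconds <= 0:
        return []
    # Number of frames between two consecutive target frames (at least 1).
    step = max(1, round(fps * interval_seconds))
    return list(range(0, frame_count, step))


if __name__ == "__main__":
    # A 4-second clip at 25 frames per second: frames 0 and 50 are selected.
    print(select_frame_indices(frame_count=100, fps=25))
```

  • Each selected index identifies one target frame that would then be passed on as an image to be identified.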
  • Step 202: Input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result characterizing whether the image to be identified is a screenshot image.
  • the above-mentioned execution subject may input the image to be recognized into a pre-trained screenshot image recognition model to obtain a recognition result characterizing whether the image to be recognized is a screenshot image.
  • the screenshot image may be an image recording content displayed on a screen of the electronic device.
  • the recognition result can take many forms. For example, a number can be used to indicate whether the image to be identified is a screenshot image. Specifically, "1" may indicate that the image to be identified is a screenshot image, and "0" may indicate that it is not.
  • the recognition result may also be a value between 0 and 1, which is used to indicate the probability that the image to be recognized is a screenshot image.
  • the recognition results can also be text, characters, and so on.
  • the form of the recognition result is not limited.
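  • Since the recognition result may be a 0/1 label, a probability, or text, a caller needs some rule for turning it into a decision. The sketch below is one possible rule; the function name, the 0.5 threshold, and the accepted text labels are all assumptions made for illustration.

```python
def is_screenshot(result, threshold=0.5):
    """Interpret a recognition result as a yes/no decision.

    Accepts a bool, a 0/1 label, a probability in [0, 1], or a text label.
    The threshold and the text labels are hypothetical choices.
    """
    if isinstance(result, bool):
        return result
    if isinstance(result, (int, float)):
        # Covers both the 0/1 label form and the probability form.
        return float(result) >= threshold
    if isinstance(result, str):
        return result.strip().lower() in {"1", "true", "screenshot"}
    raise TypeError(f"unsupported recognition result: {result!r}")
```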
  • the screenshot image recognition model is used to characterize the correspondence between the image to be recognized and the recognition result.
  • the screenshot image recognition model may be a correspondence table storing a large number of images (including screenshot images and non-screenshot images) and the recognition results corresponding to those images.
  • the correspondence relationship table may be generated based on statistics of a large number of images and recognition results.
  • the above-mentioned execution subject can match the image to be identified against the large number of images in the correspondence table.
  • an image in the table whose degree of matching with the image to be identified exceeds a preset threshold (for example, 95%) can be determined.
  • the recognition result corresponding to the determined image may be used as the recognition result of the image to be recognized.
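  • A minimal sketch of the correspondence-table variant is given below. The application does not say how the degree of matching is computed, so the pixel-agreement measure used here is purely an assumption, as are the function names; images are represented as flat sequences of pixel values of equal length.

```python
def pixel_similarity(a, b):
    """Fraction of positions at which two equal-length pixel sequences agree.

    This measure is an assumption; any image similarity could be used instead.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def lookup_result(image, table, threshold=0.95):
    """Return the stored result of the best match at or above `threshold`.

    `table` is a list of (stored_image, recognition_result) pairs.
    Returns None when no stored image matches closely enough.
    """
    best_score, best_result = 0.0, None
    for stored_image, result in table:
        score = pixel_similarity(image, stored_image)
        if score >= threshold and score > best_score:
            best_score, best_result = score, result
    return best_result
```

  • When several stored images pass the threshold, this sketch keeps the closest one; the original text leaves that choice open.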
  • the above screenshot image recognition model may also be a neural network.
  • the neural network abstracts the neuron network of the human brain from the perspective of information processing, establishes a simple model, and forms different networks according to different connection methods. It usually consists of a large number of interconnected nodes (or neurons), each of which represents a specific output function called the activation function. The connection between every two nodes carries a weighting value for the signal passing through it, called a weight (also called a parameter), and the output of the network varies according to the connection mode, the weight values, and the activation function of the network.
  • a neural network usually includes multiple layers, and each layer includes multiple nodes. Generally, the nodes of the same layer can have the same weight, and the nodes of different layers can have different weights, so the parameters of multiple layers of the neural network can also be different.
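  • The description of nodes, weights, and activation functions can be illustrated with a single fully connected layer. This is a sketch only: the sigmoid activation and the function names are assumptions, and no bias term is used because the text does not mention one.

```python
import math


def sigmoid(x):
    """A common activation function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, activation=sigmoid):
    """Compute the outputs of one layer.

    `weights[k]` holds the weights on the connections feeding node k, so each
    node outputs activation(sum of weight * input over its connections).
    """
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)))
        for node_weights in weights
    ]
```

  • Stacking several such layers, each with its own weights, gives the multi-layer structure described above.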
  • Step 203: In response to the recognition result indicating that the image to be recognized is a screenshot image, delete the image to be recognized.
  • the execution subject may delete the image to be recognized.
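  • in response to the recognition result indicating that the image to be identified is not a screenshot image, the above-mentioned execution subject may also push information indicating that the image to be identified is not a screenshot image.
  • in response to the recognition result indicating that the image to be recognized is not a screenshot image, text recognition is performed on the image to be recognized to obtain a text recognition result; whether the text recognition result includes a preset text is determined; and in response to determining that the text recognition result includes the preset text, the image to be recognized is deleted.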
  • the above-mentioned execution subject may perform character recognition on the image to be recognized through various methods to obtain the recognition result.
  • the recognition result may be related information of the text displayed in the image to be recognized.
  • as an example, optical character recognition (OCR) may be used.
  • the execution subject may determine whether the text recognition result (for example, the obtained text) contains a preset text (for example, the name of an operator). If so, the execution subject may delete the image to be identified.
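  • The preset-text check can be sketched as below. The text is assumed to have already been extracted by some OCR step, which is not shown; the function names, the case-insensitive comparison, and the operator name "ExampleTel" are hypothetical.

```python
def contains_preset_text(recognized_text, preset_texts):
    """Return True if any non-empty preset text occurs in the recognized text.

    Matching is case-insensitive, which is an assumption of this sketch.
    """
    normalized = recognized_text.casefold()
    return any(p.casefold() in normalized for p in preset_texts if p)


def should_delete(is_screenshot, recognized_text, preset_texts):
    """Combine both deletion rules: screenshot, or preset text found."""
    if is_screenshot:
        return True
    return contains_preset_text(recognized_text, preset_texts)


if __name__ == "__main__":
    presets = ["ExampleTel"]  # hypothetical operator name
    print(should_delete(False, "exampletel 4G  12:30", presets))
```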
  • FIG. 3 is a schematic diagram of an application scenario of the image recognition method according to this embodiment.
  • the execution body of the image recognition method is the server 300.
  • the server 300 may first obtain an image 301 to be identified from a terminal. Then, the to-be-recognized image 301 is input into a pre-trained screenshot image recognition model to obtain a recognition result. If the recognition result indicates that the to-be-recognized image 301 is a screenshot image, the to-be-recognized image 301 is deleted.
  • the image recognition method provided by the above embodiments of the present application uses a screenshot image recognition model to recognize an image to be identified. If the image to be identified is a screenshot image, it is deleted. Thus, both the recognition of the image to be identified and the deletion of screenshot images are realized. Because a screenshot image recognition model is used, image review and recognition are more efficient than with manual review.
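  • Steps 201 to 203 can be summarized in a short sketch. Here `model` is a hypothetical stand-in for the pre-trained screenshot image recognition model: any callable that returns the probability that its input is a screenshot. The function name and the 0.5 threshold are also assumptions.

```python
def review_images(images, model, threshold=0.5):
    """Split image names into (kept, deleted) lists.

    `images` maps a name to image data. An image is marked for deletion when
    the model's screenshot probability reaches `threshold`.
    """
    kept, deleted = [], []
    for name, image in images.items():
        if model(image) >= threshold:
            deleted.append(name)  # recognized as a screenshot image
        else:
            kept.append(name)     # not a screenshot image
    return kept, deleted
```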
  • FIG. 4 illustrates a process 400 of still another embodiment of the image recognition method.
  • the process 400 of the image recognition method includes the following steps:
  • Step 401: Obtain a target image.
  • the execution subject of the image recognition method may obtain the target image from the terminal in a wired connection or a wireless connection.
  • the target image can be any image.
  • the target image can be specified by a technician, or it can be filtered based on preset conditions.
  • the target image may be stored locally on the execution subject. In this case, the execution subject may obtain the target image directly from local storage.
  • Step 402: Capture a preset area of the target image as the image to be identified.
  • the execution subject may crop a preset area of the target image to use as the image to be identified.
  • the preset area may be a part or all of the target image. For example, it can be the top fifth of the image.
  • the above-mentioned execution subject may crop the preset area of the target image in various ways, for example, by using a screenshot application or an image processing application.
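  • Cropping the top fifth of a target image can be sketched as follows. The image is modeled as a list of pixel rows to keep the example free of any image library; the function name and the `parts` parameter are assumptions.

```python
def crop_upper_part(image, parts=5):
    """Return a copy of the top 1/`parts` of an image given as a list of rows.

    At least one row is kept so that very small images still yield a region.
    """
    if not image:
        return []
    rows = max(1, len(image) // parts)
    return [list(row) for row in image[:rows]]


if __name__ == "__main__":
    target = [[r * 10 + c for c in range(4)] for r in range(10)]  # 10 rows x 4 columns
    print(crop_upper_part(target))  # the first 2 of 10 rows
```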
  • Step 403: Acquire the image to be identified.
  • the execution subject may acquire the to-be-recognized image produced in step 402. Because this image is produced in step 402, it can generally be obtained directly from local storage.
  • Step 404: Input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result characterizing whether the image to be identified is a screenshot image.
  • the above-mentioned screenshot image recognition model may be a model obtained by training an image classification network, such as a Convolutional Neural Network (CNN), based on multiple training samples using a machine learning method.
  • the convolutional neural network is a kind of feed-forward neural network whose artificial neurons respond to surrounding units within a limited coverage area, and it performs very well in image processing.
  • a convolutional neural network may include a convolutional layer, a pooling layer, a depooling layer, and a deconvolution layer.
  • the convolution layer can be used to extract image features.
  • the pooling layer can be used to downsample the input information.
  • the depooling layer can be used to upsample the input information.
  • the deconvolution layer is used to deconvolve the input information: the transpose of the convolution kernel of the convolution layer is used as the convolution kernel of the deconvolution layer to process the input information.
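  • The roles of the convolution layer and the pooling layer can be illustrated with a small sketch in plain Python. The function names, the "valid" (no padding) convolution, and the choice of max pooling are assumptions; the depooling and deconvolution layers are not shown.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` without padding and sum the products.

    This is the cross-correlation form commonly used by convolution layers.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Downsample by keeping the maximum of each non-overlapping size x size block."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


if __name__ == "__main__":
    features = convolve2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
    print(features)              # feature map extracted by the kernel
    print(max_pool2d(features))  # downsampled feature map
```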
  • the above screenshot image recognition model can be trained by the following steps:
  • the first step is to obtain a training sample set, where each training sample includes a sample image and annotation information used to characterize whether the sample image is a screenshot image.
  • the label information may be in various forms.
  • the label information may be a numerical value. For example, "0" indicates that it is not a screenshot image, and "1" indicates that it is a screenshot image.
  • the label information may also be text, characters, and so on.
  • the second step is to train the screenshot image recognition model, using the sample images of the training samples in the training sample set as input and the label information corresponding to each input sample image as the desired output.
  • the sample images of the training samples can be input into the initial image classification network.
  • the initial image classification network may be various image classification networks. As an example, it may be a residual network (Residual Network, ResNet), VGG, or the like.
  • VGG is a classification model proposed by the Visual Geometry Group (VGG) of the University of Oxford.
  • initial values can be set for the parameters of the initial image classification network. For example, they can be different small random numbers. "Small" ensures that the network does not enter a saturated state because of excessively large weights, which would cause training to fail. "Different" ensures that the network can learn normally. After that, the recognition result of the input sample image can be obtained.
  • a machine learning method is used to train the initial image classification network. Specifically, a preset loss function can first be used to calculate the difference between the recognition result and the label information. Then, based on the calculated difference, the parameters of the initial image classification network can be adjusted. When a preset training end condition is met, the training ends, and the trained initial image classification network is used as the screenshot image recognition model.
  • the training end condition here includes but is not limited to at least one of the following: the training time exceeds a preset duration; the number of training iterations reaches a preset number; and the calculated difference is less than a preset difference threshold.
  • as an example, the back propagation (BP) algorithm and the stochastic gradient descent (SGD) algorithm may be used to adjust the parameters of the initial image classification network.
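  • The training procedure and its end conditions can be sketched with a deliberately tiny model. This is not the CNN of the application: a single logistic unit over hand-made feature vectors stands in for the image classification network, and the function names, learning rate, bias term, and thresholds are all assumptions. What the sketch does show is small random initialization, a loss-driven parameter update by stochastic gradient descent, and the three end conditions.

```python
import math
import random
import time


def predict(weights, bias, features):
    """Probability that a sample is a screenshot (logistic output)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(weights, bias, samples):
    """Mean cross-entropy between predictions and 0/1 labels."""
    eps = 1e-12
    total = 0.0
    for features, label in samples:
        p = min(max(predict(weights, bias, features), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(samples)


def train(samples, lr=0.5, max_steps=5000, max_seconds=5.0,
          loss_threshold=0.05, seed=0):
    """Train on (features, label) pairs; label 1 means screenshot image."""
    rng = random.Random(seed)
    # Different small random initial weights, as described in the text.
    weights = [rng.uniform(-0.01, 0.01) for _ in samples[0][0]]
    bias = 0.0
    start = time.monotonic()
    for _ in range(max_steps):
        features, label = rng.choice(samples)             # one random sample per step
        error = predict(weights, bias, features) - label  # loss gradient at the output
        weights = [w - lr * error * x for w, x in zip(weights, features)]
        bias -= lr * error
        if mean_loss(weights, bias, samples) < loss_threshold:
            return weights, bias, "loss below threshold"
        if time.monotonic() - start > max_seconds:
            return weights, bias, "time limit reached"
    return weights, bias, "step limit reached"
```

  • In a real multi-layer network the output error would be propagated backwards through every layer to obtain each parameter's gradient; with a single unit that propagation reduces to the one-line update shown here.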
  • the execution subject of the training steps and the execution subject of the image recognition method may be the same or different. If they are the same, the execution subject can store the network structure and parameter values of the trained screenshot image recognition model locally after training. If they are different, the execution subject of the training steps may, after obtaining the screenshot image recognition model through training, send the network structure and parameter values of the model to the execution subject of the image recognition method.
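  • One simple way to store or send the network structure and parameter values is to serialize them together. The sketch below uses JSON purely as an illustration; the application does not name a format, and the function names and dictionary keys are hypothetical.

```python
import json


def export_model(structure, parameters):
    """Serialize a model description and its parameter values to a JSON string."""
    return json.dumps({"structure": structure, "parameters": parameters},
                      sort_keys=True)


def import_model(payload):
    """Recover (structure, parameters) from a string made by export_model."""
    data = json.loads(payload)
    return data["structure"], data["parameters"]


if __name__ == "__main__":
    payload = export_model({"layers": ["conv", "pool", "fc"]},
                           {"fc": [0.12, -0.5]})
    print(import_model(payload))
```

  • The same string can be written to local storage when both roles run on one machine, or sent over the network when they do not.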
  • Step 405: In response to the recognition result indicating that the image to be recognized is a screenshot image, delete the image to be recognized.
  • for the specific processing of step 405 and its technical effects, reference may be made to step 203 of the embodiment corresponding to FIG. 2; details are not repeated here.
  • the process 400 of the image recognition method in this embodiment adds an image cropping step, thereby reducing unnecessary interference information in the image and improving the accuracy of image recognition.
  • this application provides an embodiment of an image recognition device.
  • the device embodiment corresponds to the method embodiment shown in FIG. 2.
  • the device may specifically be applied in various electronic equipment.
  • the image recognition device 500 in this embodiment includes a to-be-identified image acquisition unit 501, an image recognition unit 502, and a first deletion unit 503.
  • the to-be-identified image acquisition unit 501 is configured to acquire an image to be identified.
  • the image recognition unit 502 is configured to input the to-be-recognized image into a pre-trained screenshot image recognition model to obtain a recognition result characterizing whether the to-be-recognized image is a screenshot image, where the screenshot image recognition model is used to characterize the correspondence between the to-be-recognized image and the recognition result.
  • the first deleting unit 503 is configured to delete the image to be identified in response to the recognition result indicating that the image to be identified is a screenshot image.
  • the apparatus 500 may further include: a push unit (not shown in the figure).
  • the pushing unit is configured to push, in response to the recognition result indicating that the image to be identified is not a screenshot image, information indicating that the image to be identified is not a screenshot image.
  • the apparatus 500 may further include: a target image acquisition unit (not shown in the figure) and a capture unit (not shown in the figure).
  • the target image acquisition unit is configured to acquire a target image.
  • the capturing unit is configured to capture a preset area of the target image as an image to be identified.
  • the apparatus 500 further includes: a frame sequence acquisition unit and a selection unit.
  • the frame sequence obtaining unit is configured to obtain a frame sequence of a target video.
  • the selection unit is configured to select a target frame in a frame sequence of the target video as an image to be identified.
  • the screenshot image recognition model is obtained by training in the following steps: obtaining a training sample set, where each training sample includes a sample image and label information used to characterize whether the sample image is a screenshot image; and training the screenshot image recognition model with the sample images of the training samples in the training sample set as input and the label information corresponding to each input sample image as the desired output.
  • the apparatus 500 may further include: a recognition unit (not shown in the figure), a determination unit (not shown in the figure), and a second deletion unit (not shown in the figure).
  • the recognition unit is configured to perform text recognition on the image to be recognized to obtain a text recognition result, in response to the recognition result indicating that the image to be recognized is not a screenshot image;
  • the determination unit is configured to determine whether the text recognition result includes a preset text;
  • the second deletion unit is configured to delete the image to be recognized in response to determining that the text recognition result includes the preset text.
  • the above-mentioned image recognition unit 502 inputs the to-be-recognized image obtained by the to-be-identified image acquisition unit 501 into a pre-trained screenshot image recognition model and recognizes the to-be-recognized image. If the image to be identified is a screenshot image, it is deleted by the first deletion unit 503. Thus, both the recognition of the image to be identified and the deletion of screenshot images are realized. Because a screenshot image recognition model is used, image review and recognition are more efficient than with manual review.
  • FIG. 6 shows a schematic structural diagram of a computer system 600 suitable for implementing a server according to an embodiment of the present application.
  • the server shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the system 600.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input / output (I / O) interface 605 is also connected to the bus 604.
  • the following components are connected to the I / O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage portion 608 including a hard disk and the like; a communication section 609 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 609 performs communication processing via a network such as the Internet.
  • the drive 610 is also connected to the I / O interface 605 as necessary.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication section 609, and / or installed from a removable medium 611.
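  • when the computer program is executed by the central processing unit (CPU) 601, the functions defined in the method of the present application are performed.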
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal that is included in baseband or propagated as part of a carrier wave, and which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of this application may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (such as through an Internet service provider) Internet connection).
  • Each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function.
  • In some alternative implementations, the functions labeled in the blocks may also occur in a different order than that labeled in the drawings. For example, two blocks shown one after the other may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • The units described in the embodiments of the present application may be implemented by software or hardware.
  • The described units may also be provided in a processor; for example, it may be described as: a processor includes a to-be-identified image acquisition unit, an image recognition unit, and a first deletion unit.
  • In some cases, the names of these units do not constitute a limitation on the units themselves. For example, the to-be-identified image acquisition unit may also be described as “a unit that acquires an image to be identified”.
  • The present application also provides a computer-readable medium, which may be included in the server described in the above embodiments, or may exist alone without being assembled into the server.
  • The computer-readable medium carries one or more programs, and when the one or more programs are executed by the server, they cause the server to: acquire an image to be identified; input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between an image to be identified and a recognition result; and, in response to the recognition result indicating that the image to be identified is a screenshot image, delete the image to be identified.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

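An image recognition method and apparatus. The method includes: acquiring an image to be identified (201); inputting the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be identified is a screenshot image (202); and, in response to the recognition result indicating that the image to be identified is a screenshot image, deleting the image to be identified (203). This achieves recognition of the image to be identified and deletion of screenshot images. Because a screenshot image recognition model is used, image review and recognition are more efficient than with manual review.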

Description

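Image Recognition Method and Apparatus
This patent application claims priority to Chinese patent application No. 201810680031.4, filed on June 27, 2018 by the applicant 北京字节跳动网络技术有限公司 and entitled "Image Recognition Method and Apparatus" (图像识别方法和装置), the entire content of which is incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to an image recognition method and apparatus.
Background
With the rapid development of the Internet, and especially the spread of the mobile Internet, videos and images with all kinds of content keep emerging. To supervise video and image content, the pictures or videos uploaded by users need to be reviewed.
Summary
Embodiments of the present application propose an image recognition method and apparatus.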
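In a first aspect, an embodiment of the present application provides an image recognition method, the method including: acquiring an image to be identified; inputting the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between an image to be identified and a recognition result; and, in response to the recognition result indicating that the image to be identified is a screenshot image, deleting the image to be identified.
In some embodiments, in response to the recognition result indicating that the image to be identified is not a screenshot image, information indicating that the image to be identified is not a screenshot image is pushed.
In some embodiments, before acquiring the image to be identified, the method includes: acquiring a target image; and cropping a preset region of the target image as the image to be identified.
In some embodiments, before acquiring the image to be identified, the method includes: acquiring a frame sequence of a target video; and selecting a target frame from the frame sequence of the target video as the image to be identified.
In some embodiments, the screenshot image recognition model is trained through the following steps: acquiring a training sample set, wherein a training sample includes a sample image and annotation information indicating whether the sample image is a screenshot image; and training with the sample images of the training samples in the training sample set as input and the annotation information corresponding to the input sample images as the expected output, to obtain the screenshot image recognition model.
In some embodiments, the method further includes: in response to the recognition result indicating that the image to be identified is not a screenshot image, performing text recognition on the image to be identified to obtain a text recognition result; determining whether the text recognition result includes preset text; and, in response to determining that the text recognition result includes the preset text, deleting the image to be identified.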
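In a second aspect, an embodiment of the present application provides an image recognition apparatus, the apparatus including: a to-be-identified image acquisition unit configured to acquire an image to be identified; a recognition unit configured to input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between an image to be identified and a recognition result; and a first deletion unit configured to delete the image to be identified in response to the recognition result indicating that the image to be identified is a screenshot image.
In some embodiments, the apparatus further includes: a pushing unit configured to push, in response to the recognition result indicating that the image to be identified is not a screenshot image, information indicating that the image to be identified is not a screenshot image.
In some embodiments, the apparatus further includes: a target image acquisition unit configured to acquire a target image; and a cropping unit configured to crop a preset region of the target image as the image to be identified.
In some embodiments, the apparatus further includes: a frame sequence acquisition unit configured to acquire a frame sequence of a target video; and a selection unit configured to select a target frame from the frame sequence of the target video as the image to be identified.
In some embodiments, the screenshot image recognition model is trained through the following steps: acquiring a training sample set, wherein a training sample includes a sample image and annotation information indicating whether the sample image is a screenshot image; and training with the sample images of the training samples in the training sample set as input and the annotation information corresponding to the input sample images as the expected output, to obtain the screenshot image recognition model.
In some embodiments, the apparatus further includes: a recognition unit configured to perform, in response to the recognition result indicating that the image to be identified is not a screenshot image, text recognition on the image to be identified to obtain a text recognition result; a determination unit configured to determine whether the text recognition result includes preset text; and a second deletion unit configured to delete the image to be identified in response to determining that the text recognition result includes the preset text.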
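In a third aspect, an embodiment of the present application provides an electronic device, the electronic device including: one or more processors; and a storage device on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, the program implementing, when executed by a processor, the method described in any implementation of the first aspect.
The image recognition method and apparatus provided by the embodiments of the present application recognize the image to be identified by means of a screenshot image recognition model. If the image to be identified is a screenshot image, it is deleted. This achieves recognition of the image to be identified and deletion of screenshot images. Because a screenshot image recognition model is used, image review and recognition are more efficient than with manual review.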
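Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from reading the detailed description of non-limiting embodiments given with reference to the following drawings:
FIG. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application may be applied;
FIG. 2 is a flowchart of one embodiment of the image recognition method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the image recognition method according to the present application;
FIG. 4 is a flowchart of another embodiment of the image recognition method according to the present application;
FIG. 5 is a schematic structural diagram of one embodiment of the image recognition apparatus according to the present application;
FIG. 6 is a schematic structural diagram of a computer system of a server suitable for implementing embodiments of the present application.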
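Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments in the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.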
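FIG. 1 shows an exemplary system architecture 100 to which the image recognition method or image recognition apparatus of the embodiments of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as camera applications, image processing applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that support storing and transmitting images, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example a backend server that processes images stored in the terminal devices 101, 102, 103. The backend server may process a received image (for example, recognize whether it is a screenshot image) and take corresponding action according to the processing result (for example, the recognition result).
It should be noted that the image recognition method provided by the embodiments of the present application is generally executed by the server 105; accordingly, the image recognition apparatus is generally provided in the server 105.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.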
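With continued reference to FIG. 2, a flow 200 of one embodiment of the image recognition method according to the present application is shown. The image recognition method includes the following steps:
Step 201: acquire an image to be identified.
In this embodiment, the execution body of the image recognition method (for example, the server shown in FIG. 1) may acquire the image to be identified from a terminal through a wired or wireless connection. Alternatively, the image to be identified may be stored locally on the execution body, in which case the execution body may acquire it directly from local storage. The image to be identified may be any image that needs to be recognized. In practice, it may be specified by a technician or selected according to certain conditions.
In some optional implementations of this embodiment, before acquiring the image to be identified, the method may include: acquiring a frame sequence of a target video; and selecting a target frame from the frame sequence of the target video as the image to be identified.
In these implementations, the target video may be any video. The target video may be specified by a technician or selected according to certain conditions. The target frame may be at least one frame of the frame sequence. The target frame may be specified by a technician or obtained by filtering according to certain conditions. As an example, the condition may be: extract one frame every 2 seconds.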
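The example condition above (one frame every 2 seconds) can be sketched as an index computation. This is only an illustrative sketch: the function name `select_target_frames` and the assumption that the frame rate of the target video is known are not part of the application.

```python
def select_target_frames(frame_count, fps, interval_seconds=2.0):
    """Return the indices of frames sampled once every `interval_seconds`.

    Hypothetical helper: assumes the frame sequence is indexed from 0
    and that the frame rate `fps` of the target video is known.
    """
    if frame_count < 0 or fps <= 0 or interval_seconds <= 0:
        raise ValueError("frame_count must be >= 0; fps and interval must be > 0")
    # Number of frames between two sampled frames (at least one).
    step = max(1, round(fps * interval_seconds))
    return list(range(0, frame_count, step))
```

For a 10-second video at 25 frames per second, this selects five target frames, each of which could then be treated as an image to be identified.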
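Step 202: input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be identified is a screenshot image.
In this embodiment, the execution body may input the image to be identified into the pre-trained screenshot image recognition model, thereby obtaining a recognition result indicating whether the image to be identified is a screenshot image. A screenshot image may be an image that records the content displayed on the screen of an electronic device. The recognition result may take various forms. For example, a number may be used to indicate whether the image to be identified is a screenshot image: "1" may indicate that it is a screenshot image, and "0" that it is not. As another example, the recognition result may be a value between 0 and 1 representing the probability that the image to be identified is a screenshot image. The recognition result may also be text, characters, and so on. The form of the recognition result is not limited here.
In this embodiment, the screenshot image recognition model is used to characterize the correspondence between an image to be identified and a recognition result. As an example, the screenshot image recognition model may be a correspondence table that stores a large number of images (including screenshot images and non-screenshot images) together with the recognition results corresponding to those images. The correspondence table may be generated from statistics over a large number of images and recognition results. The execution body can then match the image to be identified against the images in the correspondence table, determine an image in the table whose degree of match with the image to be identified exceeds a preset threshold (for example, 95%), and take the recognition result corresponding to that image as the recognition result of the image to be identified.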
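The correspondence-table variant can be sketched as follows. The application does not say how the degree of match is computed, so the pixel-equality measure below, the representation of images as nested lists of pixel values, and the names `matching_degree` and `lookup_recognition_result` are all assumptions made for illustration.

```python
def matching_degree(image_a, image_b):
    """Fraction of pixel positions with equal values (0.0 to 1.0).

    Assumed measure: images are nested lists of pixel values; images
    with different pixel counts are treated as not matching at all.
    """
    flat_a = [p for row in image_a for p in row]
    flat_b = [p for row in image_b for p in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        return 0.0
    same = sum(1 for a, b in zip(flat_a, flat_b) if a == b)
    return same / len(flat_a)


def lookup_recognition_result(image, table, threshold=0.95):
    """Return the result of the best table entry whose matching degree
    exceeds `threshold`, or None when no entry matches well enough."""
    best_score, best_result = threshold, None
    for stored_image, result in table:
        score = matching_degree(image, stored_image)
        if score > best_score:
            best_score, best_result = score, result
    return best_result
```

Returning `None` when nothing exceeds the threshold is a design choice of this sketch; the application leaves that case open.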
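In this embodiment, the screenshot image recognition model may also be a neural network. A neural network abstracts the network of neurons in the human brain from an information-processing perspective, builds a simple model of it, and forms different networks through different connection patterns. It usually consists of a large number of interconnected nodes (also called neurons), each of which represents a specific output function called the activation function. Each connection between two nodes carries a weighting value for the signal passing through that connection, called a weight (also called a parameter). The output of the network varies with the connection pattern, the weight values, and the activation functions. A neural network usually includes multiple layers, each of which includes multiple nodes. Typically, the weights of nodes in the same layer may be the same and the weights of nodes in different layers may differ, so the parameters of the layers of a neural network may also differ.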
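The description of nodes, weights, and activation functions can be made concrete with a very small forward pass. This is a generic sketch, not the network used in the application; the sigmoid activation, the bias terms, and the names `dense_layer` and `forward` are assumptions.

```python
import math


def sigmoid(x):
    """A common activation function; chosen here only as an example."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases, activation):
    """Each node outputs activation(weighted sum of its inputs + bias)."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass `inputs` through a list of (weights, biases) layers in order."""
    values = inputs
    for weights, biases in layers:
        values = dense_layer(values, weights, biases, sigmoid)
    return values
```

Changing the weights, the number of layers, or the activation function changes the output, which is the dependence the paragraph above describes.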
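Step 203: in response to the recognition result indicating that the image to be identified is a screenshot image, delete the image to be identified.
In this embodiment, in response to the recognition result indicating that the image to be identified is a screenshot image, the execution body may delete the image to be identified.
In some optional implementations of this embodiment, in response to the recognition result indicating that the image to be identified is not a screenshot image, the execution body may also push information indicating that the image to be identified is not a screenshot image.
In some optional implementations of this embodiment, in response to the recognition result indicating that the image to be identified is not a screenshot image, text recognition is performed on the image to be identified to obtain a text recognition result; it is determined whether the text recognition result includes preset text; and, in response to determining that the text recognition result includes the preset text, the image to be identified is deleted.
In these implementations, in response to the recognition result indicating that the image to be identified is not a screenshot image, the execution body may perform text recognition on the image to be identified by various methods to obtain a text recognition result. The text recognition result may be information about the text displayed in the image to be identified. As an example, OCR (Optical Character Recognition) technology may be used to perform text recognition on the image to be identified, thereby obtaining the text displayed in it. The execution body may then determine whether the text recognition result (for example, the obtained text) contains preset text (for example, the name of a network operator). If it does, the execution body may delete the image to be identified.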
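The text-based fallback can be sketched as below. The OCR step itself is out of scope here: the sketch assumes the text has already been extracted by some OCR engine and only shows the preset-text check and the resulting deletion decision. The function names and the whitespace and case normalization are assumptions.

```python
def contains_preset_text(recognized_text, preset_texts):
    """Return True if any preset text occurs in the recognized text.

    Whitespace is removed and case is ignored, on the assumption that
    OCR output may contain stray spaces and line breaks.
    """
    normalized = "".join(recognized_text.split()).lower()
    return any(
        "".join(preset.split()).lower() in normalized
        for preset in preset_texts
        if preset.strip()
    )


def should_delete(is_screenshot, recognized_text, preset_texts):
    """Delete screenshot images directly; otherwise fall back to the text check."""
    if is_screenshot:
        return True
    return contains_preset_text(recognized_text, preset_texts)
```

The preset texts would be configured by the operator of the review service, for example a list of network operator names as the paragraph above suggests.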
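With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the image recognition method according to this embodiment. In the application scenario of FIG. 3, the execution body of the image recognition method is a server 300. The server 300 may first acquire an image to be identified 301 from a terminal. It then inputs the image to be identified 301 into the pre-trained screenshot image recognition model to obtain a recognition result. If the recognition result indicates that the image to be identified 301 is a screenshot image, the image to be identified 301 is deleted.
The image recognition method provided by the above embodiment of the present application recognizes the image to be identified by means of a screenshot image recognition model. If the image to be identified is a screenshot image, it is deleted. This achieves recognition of the image to be identified and deletion of screenshot images. Because a screenshot image recognition model is used, image review and recognition are more efficient than with manual review.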
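The flow of steps 201 to 203 can be sketched as a small review loop. The model is represented by any callable that returns either 0/1 or a probability; the 0.5 decision threshold and the names `is_screenshot` and `review_images` are assumptions, since the application does not specify how a probability is turned into a decision.

```python
def is_screenshot(recognition_result, threshold=0.5):
    """Interpret a result given as 0/1 or as a probability between 0 and 1."""
    return float(recognition_result) >= threshold


def review_images(images, model, threshold=0.5):
    """Split `images` (a dict of name -> image) into kept and deleted names.

    `model` is any callable mapping an image to a recognition result.
    """
    kept, deleted = [], []
    for name, image in images.items():
        if is_screenshot(model(image), threshold):
            deleted.append(name)  # step 203: delete screenshot images
        else:
            kept.append(name)
    return kept, deleted
```

In a real service the deletion would act on stored files or database records; here the two lists simply record the decision for each image.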
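With further reference to FIG. 4, a flow 400 of another embodiment of the image recognition method is shown. The flow 400 of the image recognition method includes the following steps:
Step 401: acquire a target image.
In this embodiment, the execution body of the image recognition method may acquire the target image from a terminal through a wired or wireless connection. The target image may be any image. In practice, the target image may be specified by a technician or selected according to preset conditions. Alternatively, the target image may be stored locally on the execution body, in which case the execution body may acquire it directly from local storage.
Step 402: crop a preset region of the target image as the image to be identified.
In this embodiment, the execution body may crop a preset region of the target image as the image to be identified. The preset region may be part or all of the target image, for example the top one fifth of the image. In practice, the execution body may crop the preset region of the target image in various ways, for example through screenshot applications or image processing applications.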
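Cropping the top one fifth of the target image can be sketched as follows, assuming the image is represented as a list of pixel rows. The function name `crop_top_region` and the numerator/denominator parameters are hypothetical.

```python
def crop_top_region(image, numerator=1, denominator=5):
    """Return the top `numerator/denominator` of the rows of `image`.

    `image` is assumed to be a list of pixel rows. At least one row is
    kept for a non-empty image so the result is never empty.
    """
    if denominator <= 0 or not 0 < numerator <= denominator:
        raise ValueError("expected 0 < numerator <= denominator")
    if not image:
        return []
    rows = max(1, len(image) * numerator // denominator)
    return [row[:] for row in image[:rows]]
```

Integer arithmetic is used for the row count to avoid floating-point rounding surprises at region boundaries.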
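Step 403: acquire the image to be identified.
In this embodiment, the execution body may acquire the image to be identified that was cropped in step 402. Because the image to be identified was obtained by cropping in step 402, it can generally be acquired directly from local storage.
Step 404: input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be identified is a screenshot image.
In this embodiment, the screenshot image recognition model may be a model obtained by using a machine learning method to train an image classification network, for example a convolutional neural network (CNN), on multiple training samples. A convolutional neural network may be a feedforward neural network whose artificial neurons can respond to surrounding units within part of the coverage area, and it performs very well in image processing. A convolutional neural network may include convolutional layers, pooling layers, unpooling layers, and deconvolution layers. Convolutional layers may be used to extract image features. Pooling layers may be used to downsample the input information. Unpooling layers may be used to upsample the input information. Deconvolution layers are used to deconvolve the input information, processing it with the transpose of the convolution kernel of a convolutional layer as the kernel of the deconvolution layer.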
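The roles of a convolutional layer (feature extraction) and a pooling layer (downsampling) can be illustrated on a tiny single-channel image. This is a generic sketch in pure Python, not the network architecture of the application; stride 1, no padding, and max pooling are assumptions.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation of one channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Downsample by taking the maximum of each non-overlapping size x size window."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(0, w - size + 1, size)
        ]
        for i in range(0, h - size + 1, size)
    ]
```

A 4x4 input shrinks to 3x3 after a 2x2 convolution and to 2x2 after 2x2 max pooling, which shows the downsampling effect described above.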
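As an example, the screenshot image recognition model may be trained through the following steps:
First, acquire a training sample set, in which each training sample includes a sample image and annotation information indicating whether the sample image is a screenshot image. In practice, whether a sample image is a screenshot image may be annotated manually to obtain the annotation information of each sample image. The annotation information may take various forms. As an example, it may be a numerical value, with "0" indicating that the image is not a screenshot image and "1" indicating that it is. As another example, the annotation information may be text, characters, and so on.
Second, train with the sample images of the training samples in the training sample set as input and the annotation information corresponding to the input sample images as the expected output, to obtain the screenshot image recognition model.
Specifically, the sample image of a training sample may be input into an initial image classification network. The initial image classification network may be any of various image classification networks, for example a residual network (Deep Residual Network, ResNet) or VGG. VGG is a classification model proposed by the Visual Geometry Group (VGG) of a university. In practice, initial values may be set for the initial image classification network, for example different small random numbers. "Small" ensures that the network does not enter a saturated state because of excessively large weights, which would cause training to fail; "different" ensures that the network can learn normally. A recognition result for the input sample image can then be obtained. With the annotation information corresponding to the input sample image as the expected output of the initial image classification network, the initial image classification network is trained using a machine learning method. Specifically, a preset loss function may first be used to compute the difference between the obtained recognition result and the annotation information. The parameters of the initial image classification network may then be adjusted based on that difference, and when a preset training end condition is met, training ends and the trained initial image classification network is taken as the screenshot image recognition model. The training end conditions include, but are not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations reaches a preset number; the computed difference is smaller than a preset difference threshold.
Various methods may be used to adjust the parameters of the initial image classification network based on the difference between the obtained recognition result and the annotation information corresponding to the input training sample. For example, the BP (Back Propagation) algorithm or the SGD (Stochastic Gradient Descent) algorithm may be used.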
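The training procedure above (small random initial values, a loss that measures the difference, parameter updates by SGD, and the three end conditions) can be sketched with a deliberately simple stand-in model. A single-layer logistic classifier over feature vectors replaces the image classification network here, so the gradient is computed directly instead of by back propagation through many layers; the learning rate, thresholds, and function names are all assumptions.

```python
import math
import random
import time


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features):
    """Probability that the sample is a screenshot image (label 1)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def mean_loss(samples, weights, bias, eps=1e-12):
    """Average binary cross-entropy between predictions and labels."""
    total = 0.0
    for features, label in samples:
        p = min(max(predict(weights, bias, features), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(samples)


def train(samples, learning_rate=0.5, max_steps=5000,
          max_seconds=5.0, loss_threshold=0.05, seed=0):
    """Train on (features, label) pairs and return (weights, bias)."""
    rng = random.Random(seed)
    # Different small random numbers as initial weights.
    weights = [rng.uniform(-0.01, 0.01) for _ in samples[0][0]]
    bias = 0.0
    start = time.monotonic()
    for _ in range(max_steps):  # end condition: preset number of iterations
        features, label = rng.choice(samples)  # one random sample per step (SGD)
        # Gradient of the cross-entropy loss with respect to the pre-activation.
        error = predict(weights, bias, features) - label
        weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
        bias -= learning_rate * error
        if mean_loss(samples, weights, bias) < loss_threshold:
            break  # end condition: difference below the preset threshold
        if time.monotonic() - start > max_seconds:
            break  # end condition: training time exceeds the preset duration
    return weights, bias
```

With a real image classification network, the same loop structure applies, but the gradients for all layers would be obtained by back propagation and the inputs would be image tensors rather than short feature vectors.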
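It should be noted that the execution body of the training steps may be the same as or different from the execution body of the image recognition method. If they are the same, the execution body may store the network structure and parameter values of the trained model locally after training the screenshot image recognition model. If they are different, the execution body of the training steps may send the network structure and parameter values of the model to the execution body of the image recognition method after training the screenshot image recognition model.
Step 405: in response to the recognition result indicating that the image to be identified is a screenshot image, delete the image to be identified.
For the specific processing of step 405 and the technical effects it brings, refer to step 203 of the embodiment corresponding to FIG. 2; details are not repeated here.
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the image recognition method in this embodiment adds a step of cropping the image, which reduces unnecessary interfering information in the image and improves image recognition accuracy.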
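With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an image recognition apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied in various electronic devices.
As shown in FIG. 5, the image recognition apparatus 500 of this embodiment includes: a to-be-identified image acquisition unit 501, an image recognition unit 502, and a first deletion unit 503. The to-be-identified image acquisition unit 501 is configured to acquire an image to be identified. The image recognition unit 502 is configured to input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between an image to be identified and a recognition result. The first deletion unit 503 is configured to delete the image to be identified in response to the recognition result indicating that the image to be identified is a screenshot image.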
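One way to picture the three units of apparatus 500 in software is as pluggable callables held by a single object. This is only an illustrative sketch; the class name, attribute names, and the `run` method are hypothetical and not taken from the application.

```python
class ImageRecognitionApparatus:
    """Illustrative software counterpart of units 501, 502, and 503."""

    def __init__(self, acquire, recognize, delete):
        self.acquisition_unit = acquire      # unit 501: source -> image
        self.recognition_unit = recognize    # unit 502: image -> truthy if screenshot
        self.first_deletion_unit = delete    # unit 503: image -> None

    def run(self, source):
        """Acquire an image, recognize it, and delete it if it is a screenshot."""
        image = self.acquisition_unit(source)
        is_screenshot = bool(self.recognition_unit(image))
        if is_screenshot:
            self.first_deletion_unit(image)
        return is_screenshot
```

Keeping each unit as an injected callable mirrors the statement that units may be implemented in software or hardware and makes each one replaceable in isolation.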
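In this embodiment, for the specific processing of the to-be-identified image acquisition unit 501, the image recognition unit 502, and the first deletion unit 503 in the image recognition apparatus 500 and the technical effects they bring, refer to the descriptions of steps 201 to 203 in the embodiment corresponding to FIG. 2; details are not repeated here.
In some optional implementations of this embodiment, the apparatus 500 may further include a pushing unit (not shown in the figure). The pushing unit is configured to push, in response to the recognition result indicating that the image to be identified is not a screenshot image, information indicating that the image to be identified is not a screenshot image.
In some optional implementations of this embodiment, the apparatus 500 may further include a target image acquisition unit (not shown in the figure) and a cropping unit (not shown in the figure). The target image acquisition unit is configured to acquire a target image. The cropping unit is configured to crop a preset region of the target image as the image to be identified.
In some optional implementations of this embodiment, the apparatus 500 further includes a frame sequence acquisition unit and a selection unit. The frame sequence acquisition unit is configured to acquire a frame sequence of a target video. The selection unit is configured to select a target frame from the frame sequence of the target video as the image to be identified.
In some optional implementations of this embodiment, the screenshot image recognition model is trained through the following steps: acquiring a training sample set, wherein a training sample includes a sample image and annotation information indicating whether the sample image is a screenshot image; and training with the sample images of the training samples in the training sample set as input and the annotation information corresponding to the input sample images as the expected output, to obtain the screenshot image recognition model.
In some optional implementations of this embodiment, the apparatus 500 may further include a recognition unit (not shown in the figure), a determination unit (not shown in the figure), and a second deletion unit (not shown in the figure). The recognition unit is configured to perform, in response to the recognition result indicating that the image to be identified is not a screenshot image, text recognition on the image to be identified to obtain a text recognition result; the determination unit is configured to determine whether the text recognition result includes preset text; and the second deletion unit is configured to delete the image to be identified in response to determining that the text recognition result includes the preset text.
In this embodiment, the image recognition unit 502 inputs the image to be identified acquired by the to-be-identified image acquisition unit 501 into the pre-trained screenshot image recognition model to recognize the image to be identified. If the image to be identified is a screenshot image, it is deleted by the first deletion unit 503. This achieves recognition of the image to be identified and deletion of screenshot images. Because a screenshot image recognition model is used, image review and recognition are more efficient than with manual review.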
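Referring now to FIG. 6, a schematic structural diagram of a computer system 600 of a server suitable for implementing embodiments of the present application is shown. The server shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609 and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed.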
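It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and so on, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).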
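The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor includes a to-be-identified image acquisition unit, an image recognition unit, and a first deletion unit. In some cases the names of these units do not constitute a limitation on the units themselves; for example, the to-be-identified image acquisition unit may also be described as "a unit that acquires an image to be identified".
In another aspect, the present application also provides a computer-readable medium, which may be included in the server described in the above embodiments, or may exist alone without being assembled into the server. The computer-readable medium carries one or more programs which, when executed by the server, cause the server to: acquire an image to be identified; input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between an image to be identified and a recognition result; and, in response to the recognition result indicating that the image to be identified is a screenshot image, delete the image to be identified.
The above description is only of preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.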

Claims (14)

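  1. An image recognition method, comprising:
    acquiring an image to be identified;
    inputting the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between an image to be identified and a recognition result;
    in response to the recognition result indicating that the image to be identified is a screenshot image, deleting the image to be identified.
  2. The method according to claim 1, wherein the method further comprises:
    in response to the recognition result indicating that the image to be identified is not a screenshot image, pushing information indicating that the image to be identified is not a screenshot image.
  3. The method according to claim 1, wherein, before the acquiring of the image to be identified, the method comprises:
    acquiring a target image;
    cropping a preset region of the target image as the image to be identified.
  4. The method according to claim 1, wherein, before the acquiring of the image to be identified, the method comprises:
    acquiring a frame sequence of a target video;
    selecting a target frame from the frame sequence of the target video as the image to be identified.
  5. The method according to any one of claims 1-4, wherein the screenshot image recognition model is trained through the following steps:
    acquiring a training sample set, wherein a training sample includes a sample image and annotation information indicating whether the sample image is a screenshot image;
    training with the sample images of the training samples in the training sample set as input and the annotation information corresponding to the input sample images as the expected output, to obtain the screenshot image recognition model.
  6. The method according to any one of claims 1-4, wherein the method further comprises:
    in response to the recognition result indicating that the image to be identified is not a screenshot image, performing text recognition on the image to be identified to obtain a text recognition result;
    determining whether the text recognition result includes preset text;
    in response to determining that the text recognition result includes the preset text, deleting the image to be identified.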
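  7. An image recognition apparatus, comprising:
    a to-be-identified image acquisition unit configured to acquire an image to be identified;
    a recognition unit configured to input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between an image to be identified and a recognition result;
    a first deletion unit configured to delete the image to be identified in response to the recognition result indicating that the image to be identified is a screenshot image.
  8. The apparatus according to claim 7, wherein the apparatus further comprises:
    a pushing unit configured to push, in response to the recognition result indicating that the image to be identified is not a screenshot image, information indicating that the image to be identified is not a screenshot image.
  9. The apparatus according to claim 7, wherein the apparatus further comprises:
    a target image acquisition unit configured to acquire a target image;
    a cropping unit configured to crop a preset region of the target image as the image to be identified.
  10. The apparatus according to claim 7, wherein the apparatus further comprises:
    a frame sequence acquisition unit configured to acquire a frame sequence of a target video;
    a selection unit configured to select a target frame from the frame sequence of the target video as the image to be identified.
  11. The apparatus according to any one of claims 7-10, wherein the screenshot image recognition model is trained through the following steps:
    acquiring a training sample set, wherein a training sample includes a sample image and annotation information indicating whether the sample image is a screenshot image;
    training with the sample images of the training samples in the training sample set as input and the annotation information corresponding to the input sample images as the expected output, to obtain the screenshot image recognition model.
  12. The apparatus according to any one of claims 7-10, wherein the apparatus further comprises:
    a recognition unit configured to perform, in response to the recognition result indicating that the image to be identified is not a screenshot image, text recognition on the image to be identified to obtain a text recognition result;
    a determination unit configured to determine whether the text recognition result includes preset text;
    a second deletion unit configured to delete the image to be identified in response to determining that the text recognition result includes the preset text.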
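  13. A server, comprising:
    one or more processors;
    a storage device on which one or more programs are stored;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-6.
  14. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.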
PCT/CN2018/116335 2018-06-27 2018-11-20 图像识别方法和装置 WO2020000879A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810680031.4A CN109002842A (zh) 2018-06-27 2018-06-27 图像识别方法和装置
CN201810680031.4 2018-06-27

Publications (1)

Publication Number Publication Date
WO2020000879A1 true WO2020000879A1 (zh) 2020-01-02

Family

ID=64602070

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116335 WO2020000879A1 (zh) 2018-06-27 2018-11-20 图像识别方法和装置

Country Status (2)

Country Link
CN (1) CN109002842A (zh)
WO (1) WO2020000879A1 (zh)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291644A (zh) * 2020-01-20 2020-06-16 北京百度网讯科技有限公司 用于处理信息的方法和装置
CN111310693A (zh) * 2020-02-26 2020-06-19 腾讯科技(深圳)有限公司 图像中文本的智能标注方法、装置及存储介质
CN111353470A (zh) * 2020-03-13 2020-06-30 北京字节跳动网络技术有限公司 图像的处理方法、装置、可读介质和电子设备
CN111353434A (zh) * 2020-02-28 2020-06-30 北京市商汤科技开发有限公司 信息识别方法及装置、***、电子设备和存储介质
CN111597966A (zh) * 2020-05-13 2020-08-28 北京达佳互联信息技术有限公司 一种表情图像识别方法、装置及***
CN111767918A (zh) * 2020-02-21 2020-10-13 北京沃东天骏信息技术有限公司 一种图片识别方法和装置
CN111797645A (zh) * 2020-07-08 2020-10-20 北京京东振世信息技术有限公司 用于识别条形码的方法和装置
CN111815505A (zh) * 2020-07-14 2020-10-23 北京字节跳动网络技术有限公司 用于处理图像的方法、装置、设备和计算机可读介质
CN111860284A (zh) * 2020-07-15 2020-10-30 上海钧正网络科技有限公司 一种换电柜的安全管理方法、装置、介质及服务器
CN111914822A (zh) * 2020-07-23 2020-11-10 腾讯科技(深圳)有限公司 文本图像标注方法、装置、计算机可读存储介质及设备
CN111950591A (zh) * 2020-07-09 2020-11-17 中国科学院深圳先进技术研究院 模型训练方法、交互关系识别方法、装置及电子设备
CN112287757A (zh) * 2020-09-25 2021-01-29 北京百度网讯科技有限公司 水体识别方法、装置、电子设备及存储介质
CN112541543A (zh) * 2020-12-11 2021-03-23 深圳市优必选科技股份有限公司 图像识别方法、装置、终端设备及存储介质
CN112905843A (zh) * 2021-03-17 2021-06-04 北京文香信息技术有限公司 一种基于视频流的信息处理方法、装置以及存储介质
CN112989986A (zh) * 2021-03-09 2021-06-18 北京京东乾石科技有限公司 用于识别人群行为的方法、装置、设备以及存储介质
CN113221920A (zh) * 2021-05-20 2021-08-06 北京百度网讯科技有限公司 图像识别方法、装置、设备、存储介质以及计算机程序产品
CN113419915A (zh) * 2021-07-21 2021-09-21 北京百度网讯科技有限公司 云终端桌面静止确定方法和装置
CN113538450A (zh) * 2020-04-21 2021-10-22 百度在线网络技术(北京)有限公司 用于生成图像的方法及装置
CN113643136A (zh) * 2021-09-01 2021-11-12 京东科技信息技术有限公司 信息处理方法、***和装置

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368902A (zh) * 2020-02-28 2020-07-03 北京三快在线科技有限公司 一种数据标注的方法及装置
CN113741680A (zh) * 2020-05-27 2021-12-03 北京字节跳动网络技术有限公司 信息交互方法和装置
CN113546398A (zh) * 2021-07-30 2021-10-26 重庆五诶科技有限公司 基于人工智能算法的棋牌游戏方法及***
CN113961526A (zh) * 2021-11-22 2022-01-21 北京达佳互联信息技术有限公司 截屏图片的检测方法和装置
CN114693629A (zh) * 2022-03-25 2022-07-01 北京城市网邻信息技术有限公司 图像识别方法、装置、电子设备及可读介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1761204A (zh) * 2005-11-18 2006-04-19 郑州金惠计算机***工程有限公司 在互联网上堵截色情图像与不良信息的***
CN103605992A (zh) * 2013-11-28 2014-02-26 国家电网公司 一种电力内外网交互中的敏感图像识别方法
CN106446932A (zh) * 2016-08-30 2017-02-22 上海交通大学 基于机器学习与图片识别的可进化违禁图片批量处理方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598902A (zh) * 2015-01-29 2015-05-06 百度在线网络技术(北京)有限公司 一种用于识别截图的方法、装置和浏览器
CN105654057A (zh) * 2015-12-31 2016-06-08 中国建设银行股份有限公司 基于图片内容的图片审核***及图片审核方法
CN107133629B (zh) * 2016-02-29 2020-09-04 百度在线网络技术(北京)有限公司 图片分类方法、装置和移动终端
CN106599937A (zh) * 2016-12-29 2017-04-26 池州职业技术学院 一种不良图片过滤装置
CN108124191B (zh) * 2017-12-22 2019-07-12 北京百度网讯科技有限公司 一种视频审核方法、装置及服务器

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAN ZHONGHAI: "The design and Implementation of Photo-sensitive Recognition System Based on Prohibited Gallery", CHINA MASTER'S THESES FULL-TEXT DATABASE, 15 May 2012 (2012-05-15), pages 33 - 48, ISSN: 1674-0246 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291644B (zh) * 2020-01-20 2023-04-18 北京百度网讯科技有限公司 用于处理信息的方法和装置
CN111291644A (zh) * 2020-01-20 2020-06-16 北京百度网讯科技有限公司 用于处理信息的方法和装置
CN111767918A (zh) * 2020-02-21 2020-10-13 北京沃东天骏信息技术有限公司 一种图片识别方法和装置
CN111310693B (zh) * 2020-02-26 2023-08-29 腾讯科技(深圳)有限公司 图像中文本的智能标注方法、装置及存储介质
CN111310693A (zh) * 2020-02-26 2020-06-19 腾讯科技(深圳)有限公司 图像中文本的智能标注方法、装置及存储介质
CN111353434A (zh) * 2020-02-28 2020-06-30 北京市商汤科技开发有限公司 信息识别方法及装置、***、电子设备和存储介质
CN111353470A (zh) * 2020-03-13 2020-06-30 北京字节跳动网络技术有限公司 图像的处理方法、装置、可读介质和电子设备
CN111353470B (zh) * 2020-03-13 2023-08-01 北京字节跳动网络技术有限公司 图像的处理方法、装置、可读介质和电子设备
US11810333B2 (en) 2020-04-21 2023-11-07 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating image of webpage content
CN113538450A (zh) * 2020-04-21 2021-10-22 百度在线网络技术(北京)有限公司 用于生成图像的方法及装置
CN113538450B (zh) * 2020-04-21 2023-07-21 百度在线网络技术(北京)有限公司 用于生成图像的方法及装置
CN111597966B (zh) * 2020-05-13 2023-10-10 北京达佳互联信息技术有限公司 一种表情图像识别方法、装置及***
CN111597966A (zh) * 2020-05-13 2020-08-28 北京达佳互联信息技术有限公司 一种表情图像识别方法、装置及***
CN111797645A (zh) * 2020-07-08 2020-10-20 北京京东振世信息技术有限公司 用于识别条形码的方法和装置
CN111950591A (zh) * 2020-07-09 2020-11-17 中国科学院深圳先进技术研究院 模型训练方法、交互关系识别方法、装置及电子设备
CN111950591B (zh) * 2020-07-09 2023-09-01 中国科学院深圳先进技术研究院 模型训练方法、交互关系识别方法、装置及电子设备
CN111815505A (zh) * 2020-07-14 2020-10-23 北京字节跳动网络技术有限公司 用于处理图像的方法、装置、设备和计算机可读介质
CN111860284A (zh) * 2020-07-15 2020-10-30 上海钧正网络科技有限公司 一种换电柜的安全管理方法、装置、介质及服务器
CN111914822A (zh) * 2020-07-23 2020-11-10 腾讯科技(深圳)有限公司 文本图像标注方法、装置、计算机可读存储介质及设备
CN111914822B (zh) * 2020-07-23 2023-11-17 腾讯科技(深圳)有限公司 文本图像标注方法、装置、计算机可读存储介质及设备
CN112287757A (zh) * 2020-09-25 2021-01-29 北京百度网讯科技有限公司 水体识别方法、装置、电子设备及存储介质
CN112287757B (zh) * 2020-09-25 2024-04-26 北京百度网讯科技有限公司 水体识别方法、装置、电子设备及存储介质
CN112541543A (zh) * 2020-12-11 2021-03-23 深圳市优必选科技股份有限公司 图像识别方法、装置、终端设备及存储介质
CN112541543B (zh) * 2020-12-11 2023-11-24 深圳市优必选科技股份有限公司 图像识别方法、装置、终端设备及存储介质
CN112989986A (zh) * 2021-03-09 2021-06-18 北京京东乾石科技有限公司 用于识别人群行为的方法、装置、设备以及存储介质
CN112905843A (zh) * 2021-03-17 2021-06-04 北京文香信息技术有限公司 一种基于视频流的信息处理方法、装置以及存储介质
CN113221920A (zh) * 2021-05-20 2021-08-06 北京百度网讯科技有限公司 图像识别方法、装置、设备、存储介质以及计算机程序产品
CN113221920B (zh) * 2021-05-20 2024-01-12 北京百度网讯科技有限公司 图像识别方法、装置、设备、存储介质以及计算机程序产品
CN113419915A (zh) * 2021-07-21 2021-09-21 北京百度网讯科技有限公司 云终端桌面静止确定方法和装置
CN113643136A (zh) * 2021-09-01 2021-11-12 京东科技信息技术有限公司 信息处理方法、***和装置

Also Published As

Publication number Publication date
CN109002842A (zh) 2018-12-14

Similar Documents

Publication Publication Date Title
WO2020000879A1 (zh) 图像识别方法和装置
WO2019242222A1 (zh) 用于生成信息的方法和装置
WO2020006963A1 (zh) 生成图像检测模型的方法和装置
CN108427939B (zh) 模型生成方法和装置
WO2020006961A1 (zh) 用于提取图像的方法和装置
WO2020000876A1 (zh) 用于生成模型的方法和装置
US11436863B2 (en) Method and apparatus for outputting data
WO2019237657A1 (zh) 用于生成模型的方法和装置
CN109740018B (zh) 用于生成视频标签模型的方法和装置
CN109993150B (zh) 用于识别年龄的方法和装置
CN111476871B (zh) 用于生成视频的方法和装置
CN108235116B (zh) 特征传播方法和装置、电子设备和介质
JP7394809B2 (ja) ビデオを処理するための方法、装置、電子機器、媒体及びコンピュータプログラム
CN109034069B (zh) 用于生成信息的方法和装置
US11087140B2 (en) Information generating method and apparatus applied to terminal device
WO2020029608A1 (zh) 用于检测电极片毛刺的方法和装置
CN108235004B (zh) 视频播放性能测试方法、装置和***
CN109583389B (zh) 绘本识别方法及装置
CN110046571B (zh) 用于识别年龄的方法和装置
CN110070076B (zh) 用于选取训练用样本的方法和装置
CN108399401B (zh) 用于检测人脸图像的方法和装置
CN113033677A (zh) 视频分类方法、装置、电子设备和存储介质
CN110008926B (zh) 用于识别年龄的方法和装置
CN109816023B (zh) 用于生成图片标签模型的方法和装置
CN108921138B (zh) 用于生成信息的方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18923991

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/05/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18923991

Country of ref document: EP

Kind code of ref document: A1