CN112802049A

CN112802049A - Method and system for constructing household article detection data set

Info

Publication number: CN112802049A
Application number: CN202110240799.1A
Authority: CN
Inventors: 田国会; 宋成; 张营; 段胜琪; 冯晨锐
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2021-03-04
Filing date: 2021-03-04
Publication date: 2021-05-14
Anticipated expiration: 2041-03-04
Also published as: CN112802049B

Abstract

The invention provides a method and a system for constructing a household article detection data set. Article instance pictures are obtained by web crawling or autonomous acquisition, and background pictures of household scenes are acquired. The background of each article instance picture is removed to obtain the article instance part and its corresponding binary mask, and the background pictures are resized to reduce their resolution. After the binary mask of an article is scaled and rotated, a temporary mask is obtained according to a set proportion of the background picture. Coordinates of the mask in the background picture are then generated, the instance picture is composited into the background picture according to the position and size of the mask, and annotation information is generated. Data sets constructed by this method are more diverse and cheaper to annotate.

Description

Method and system for constructing household article detection data set

Technical Field

The invention belongs to the technical field of computer vision and image processing, and particularly relates to a method and a system for constructing a household article detection data set.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

With the rapid development of new technologies such as 5G, artificial intelligence and the Internet of Things, smart homes have become a major direction of future development. Intelligent service robots are an important terminal in the smart home. Market demand for them is expanding rapidly, and service robot technology has attracted extensive attention. Accurate perception is a prerequisite for a service robot to complete any task.

In recent years, the rapid growth of parallel computing power has made deep neural networks practical. This has driven rapid progress in image processing and a qualitative leap in both the accuracy and the speed of object detection algorithms. Although these algorithms now perform very well, deep learning still requires a very large data set for them to be effective. Household environments contain a great variety of articles, and a detection algorithm based on deep neural networks needs large-scale training data to detect them accurately. Building a household article data set is therefore a key and laborious task. Such data sets are usually labeled manually by research institutions that hire commercial labeling organizations, and a data set of that scale is difficult for only a few people to produce.

At present, most labeling methods rely on collecting pictures manually and annotating them with graphical software. For the large number of images in a data set, producing and processing the data set this way is tedious and time-consuming.

Besides traditional manual labeling, several other approaches exist:

- Threshold segmentation can locate the target by extracting the regions within a specified gray-value range.
- One method scans characters on paper with optical character recognition (OCR), converts them into computer text, and extracts the digits to generate an MNIST-like handwriting data set.
- A further method builds a coal pile detection data set. A first data set is obtained by labeling with different types of bounding boxes, and a detection model is trained on it. The model's accuracy is tested and the model is optimized iteratively. The detection results are saved as annotation information and fused with the first data set to obtain a second data set. Iteration stops once the model trained on the data set reaches the required accuracy, and the final data set is used as the coal pile detection data set.

Although the last method does not rely on manual labeling, its detection target is single and simple. Constructing a household article detection data set is harder. It requires diverse article instances, articles in different poses and at different scales, and pictures of articles placed in different real scenes, and all of these are difficult to obtain. Without them, the data set cannot contain a sufficiently rich variety of articles.

Disclosure of Invention

To solve the above problems, the invention provides a method and a system for constructing a household article detection data set. The invention uses web crawler technology to obtain a rich set of instance pictures, automatically screens out irrelevant pictures by digital image processing, and obtains the instance part and its corresponding mask by background elimination. Data set pictures are then synthesized according to the mask, bounding box information is derived from the mask, and the annotation information for each picture is generated, which keeps the information accurate. The approach yields a wide variety of articles in different poses and at different scales, placed in different real scenes, so the constructed data set is both rich and accurate.

According to some embodiments, the invention adopts the following technical scheme:

A method for constructing a household article detection data set comprises the following steps:

acquiring article instance pictures by web crawling or autonomous acquisition;

acquiring background pictures of household scenes;

processing the article instance pictures to remove their backgrounds, obtaining the article instance part and the binary mask corresponding to it;

resizing the background pictures to reduce their resolution;

scaling and rotating the binary mask of an article, obtaining a temporary mask according to a set proportion of the background picture, generating coordinates of the mask in the background picture, compositing the instance picture into the background picture according to the position and size of the mask, and labeling the information.

As an alternative embodiment, the article in an instance picture is unoccluded and occupies at least 1/2 of the whole picture.

As an alternative embodiment, acquiring article instance pictures by web crawling specifically comprises:

- using Python crawler technology to request pictures of a specific article category from an image search engine and obtain the returned web page;
- parsing the returned web page to obtain the picture URLs;
- downloading the pictures and storing them locally;
- screening out pictures with complex backgrounds by digital image processing.

As an alternative embodiment, acquiring article instance pictures by autonomous acquisition specifically comprises photographing the same article from different angles.

As an alternative embodiment, acquiring background pictures of household scenes specifically comprises extracting background pictures that match household scenes from a public scene recognition data set.

As an alternative embodiment, processing the article instance picture to remove its background specifically comprises:

converting the article instance picture and the corresponding article-free background picture to grayscale and subtracting them to obtain a grayscale difference image, binarizing the difference image, and then applying dilation and erosion to the binary image;

performing edge detection on the processed binary image to obtain a set of edges, calculating the area enclosed by each edge, taking the edge with the second-largest enclosed area, and filling it to obtain the mask of the corresponding article instance.

As an alternative embodiment, the labeled information comprises the instance category label, the name and path of the corresponding synthesized picture, and the coordinates, width and height of the instance picture.

A household article detection data set construction system, comprising:

a picture acquisition module, configured to acquire article instance pictures by web crawling or autonomous acquisition and to acquire background pictures of household scenes;

a picture preprocessing module, configured to process the article instance pictures and remove their backgrounds to obtain the article instance part and its corresponding binary mask, and to resize the background pictures to reduce their resolution;

a picture synthesis module, configured to scale and rotate the binary mask of an article, obtain a temporary mask according to a set proportion of the background picture, generate coordinates of the mask in the background picture, composite the instance picture into the background picture according to the position and size of the mask, and label the information.

As an alternative embodiment, the picture acquisition module comprises a web crawling module, a parser, a downloader and a storage device:

- the web crawling module is configured to use Python crawler technology to request pictures of a specific article category from an image search engine and obtain the returned web page;
- the parser parses the returned web page to obtain the picture URLs;
- the downloader sends picture download requests, and the downloaded pictures are stored;
- pictures with complex backgrounds are then screened out by digital image processing.

As an alternative embodiment, the picture acquisition module comprises a picture acquisition platform consisting of a motorized turntable and a plurality of depth cameras. The depth cameras are arranged at intervals around the outside of the turntable and photograph the article on the turntable from different angles at a set frequency.

Compared with the prior art, the invention has the beneficial effects that:

(1) Compared with the picture acquisition of existing data set construction methods, crawling pictures requires no actual photography, so it is fast and cheap. Collecting pictures with the automated platform only requires a simple setup, which further lowers the cost of picture acquisition.

(2) The article instance part and its corresponding mask are obtained by background elimination without manual intervention. The segmentation quality is good and processing is fast.

(3) The annotation of an article detection data set includes the bounding box of each object, and with manual labeling every box must be drawn by hand. Here, data set pictures are obtained directly by picture synthesis, the bounding box is derived from the position of the mask, and the annotation file is generated automatically, saving labor and time.

(4) When the data set is generated, pictures containing articles at different scales and in different poses can be produced as needed. The data set is therefore more diverse, which helps improve the generalization ability of the detection model.

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention. They illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.

FIG. 1 is a flow chart of detection data set construction in an embodiment of the present invention;

FIG. 2 is a schematic diagram of an automated sample image capture platform according to an embodiment of the invention;

FIG. 3 illustrates the contents of an annotation file in an embodiment of the present invention;

FIG. 4 is a flowchart illustrating batch generation of annotation information for data sets according to an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and embodiments.

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.

Example one

This embodiment provides a method for constructing a household article detection data set based on a web crawler and background elimination. The method comprises the following steps:

Step 1: acquire article instance pictures. Instance pictures are acquired in two ways.

The first way uses a web crawler built on the multithreaded icrawler framework in Python, which crawls Baidu Images, Bing Images and Google Images. The crawler framework is first initialized and configured as follows:

- pictures are stored on a PC running Windows 10;
- pictures of the same category are placed in one folder named after the category;
- the parser is given 2 threads and the downloader 4 threads;
- the crawl keyword is set to "cup" and the maximum number of pictures to 1000.

The crawler script is then run and the pictures are downloaded automatically.
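The crawling step can be sketched as follows. This sketch assumes the `icrawler` package (`pip install icrawler`) and its built-in `BingImageCrawler`; `BaiduImageCrawler` and `GoogleImageCrawler` follow the same pattern. The helper names `category_dir` and `crawl_category` and the `crawled` root folder are hypothetical, so this is not the authors' script. The thread counts, keyword and picture limit mirror the values given above.

```python
import os


def category_dir(root: str, category: str) -> str:
    """One folder per category, named after the category."""
    return os.path.join(root, category)


def crawl_category(category: str, root: str = "crawled", max_num: int = 1000) -> None:
    """Crawl up to `max_num` pictures for one keyword into its category folder."""
    # Imported here so that the path helper above works without icrawler installed.
    from icrawler.builtin import BingImageCrawler

    crawler = BingImageCrawler(
        parser_threads=2,      # parser threads, as in the embodiment
        downloader_threads=4,  # downloader threads, as in the embodiment
        storage={"root_dir": category_dir(root, category)},
    )
    crawler.crawl(keyword=category, max_num=max_num)
```

Calling `crawl_category("cup")` would download into `crawled/cup`, provided network access is available.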

After crawling is complete, the results are screened by a Python script. Each crawled picture is converted to grayscale, the number of pixels at each gray value is counted, and the counts are converted into proportions. The picture is then screened by the proportion of pixels with value 255 (pure white): it is kept if this proportion is greater than γ and deleted otherwise. In this embodiment, γ = 0.65, determined from statistics over one thousand crawled sample pictures.
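The screening rule can be sketched in Python with NumPy as follows. This is an illustration, not the authors' script. Graying uses the weighted formula given later in Step 3. Pillow is assumed only inside the hypothetical `screen_folder` helper, which deletes failing files just as the text describes.

```python
import os

import numpy as np


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Weighted graying (0.299 R + 0.587 G + 0.114 B), rounded to uint8."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb[..., :3].astype(np.float64) @ weights).clip(0, 255).astype(np.uint8)


def white_ratio(gray: np.ndarray) -> float:
    """Proportion of pixels whose gray value is exactly 255."""
    counts = np.bincount(gray.ravel(), minlength=256)
    return counts[255] / gray.size


def keep_picture(rgb: np.ndarray, gamma: float = 0.65) -> bool:
    """Keep pictures dominated by a pure-white (simple) background."""
    return bool(white_ratio(to_gray(rgb)) > gamma)


def screen_folder(folder: str, gamma: float = 0.65) -> None:
    """Delete every picture in `folder` that fails the white-ratio test."""
    from PIL import Image  # Pillow, only needed when reading files from disk

    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        try:
            with Image.open(path) as img:
                rgb = np.asarray(img.convert("RGB"))
        except OSError:
            continue  # skip files Pillow cannot read
        if not keep_picture(rgb, gamma):
            os.remove(path)
```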

The second way uses an automated article instance picture collection platform. In this example the platform consists of a motorized turntable and three depth cameras, each fixed in position and height by a tripod.

- Turntable: a "Royal White" ND-P3006 (standard remote-control version). Its base stands on a white tea table 75 cm high.
- Depth cameras: Intel RealSense D435i cameras placed around the turntable from three directions at 45° intervals, as shown in FIG. 2.
- Camera placement: each camera is 65 cm from the center of the turntable. The camera heights are 80 cm, 110 cm and 150 cm and the depression angles are 10°, 30° and 65°, respectively.
- Capture settings: during collection, the turntable speed is set to 15 s per revolution, and pictures are collected at a resolution of 640 × 480.
- Acquisition equipment: a PC running Windows 10 with a UGREEN docking station. The three cameras are connected to the PC through the docking station, and photographing and picture storage are controlled by a script.

The picture collection script is written in Python. Before writing it, the official open-source SDK (Windows 10 version) is installed and the cameras are connected and updated to the latest firmware. A virtual environment is then created with Anaconda3, and Python 3.7 and pyrealsense2 are installed in it.

The script works as follows:

1. The configuration of the three cameras is initialized, with the color stream set to 640 × 480, format bgr8 and 30 fps.
2. The cameras are started, and the number of samples n and the sampling interval t are set.
3. The script enters a loop: it saves the RGB pictures of the three cameras in turn, waits t seconds, and repeats.
4. The loop stops when the number of samples reaches n.

In this example, n = 15 and t = 1 s.
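The capture loop can be sketched as follows, assuming the `pyrealsense2` Python wrapper and the camera settings listed above. `open_color_camera`, `capture_loop` and `degrees_per_sample` are hypothetical names. The loop accepts plain callables so it can be exercised without hardware, and the caller supplies `save`. With 15 s per revolution and t = 1 s, consecutive samples are 24° apart. n = 15 samples therefore cover one full revolution per camera, giving 45 pictures per article across the three cameras.

```python
import time
from typing import Callable, Sequence

import numpy as np


def degrees_per_sample(period_s: float = 15.0, interval_s: float = 1.0) -> float:
    """Turntable rotation between two consecutive samples."""
    return 360.0 * interval_s / period_s


def open_color_camera(serial: str, width: int = 640, height: int = 480, fps: int = 30):
    """Start one RealSense colour stream (bgr8) and return a zero-argument grab function."""
    import pyrealsense2 as rs  # only needed with real cameras attached

    pipeline, config = rs.pipeline(), rs.config()
    config.enable_device(serial)
    config.enable_stream(rs.stream.color, width, height, rs.format.bgr8, fps)
    pipeline.start(config)

    def grab() -> np.ndarray:
        frame = pipeline.wait_for_frames().get_color_frame()
        return np.asanyarray(frame.get_data()).copy()

    return grab


def capture_loop(grabbers: Sequence[Callable[[], np.ndarray]],
                 save: Callable[[int, int, np.ndarray], None],
                 n: int = 15, t: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Save one picture per camera, wait t seconds, and repeat until n samples."""
    for k in range(n):
        for cam, grab in enumerate(grabbers):
            save(cam, k, grab())
        sleep(t)
```

With real cameras, one would build `grabbers = [open_color_camera(s) for s in serials]` and pass a `save` that writes each frame to disk. For example, `cv2.imwrite` expects the BGR channel order that `bgr8` produces.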

After collection is finished, the article is removed and one more picture is taken of the empty scene without the article.

Step 2: acquire background pictures of household scenes. In this example, the background pictures are extracted from the lightweight indoor scene data set indoorscreen. 150 background pictures are selected from each of five household scenes (living room, kitchen, dining room, bedroom and bathroom), 750 in total.

Step 3: preprocess the pictures. Preprocessing has two parts: article instance picture processing and background picture processing. The article instance pictures are processed as follows:

(1) First, the article instance pictures are sorted by category into folders named after the categories. The corresponding article-free background pictures are placed in a background picture folder under the same parent folder and named according to the rule [bg_][cls_name][000000-.

(2) For each article instance picture, the RGB picture and the corresponding article-free background picture are input into an article instance segmentation program. The program extracts the instance part and its mask by background elimination, and the instance part picture and the corresponding mask are saved according to the naming rule [cls_name][000000-. The program in this example was written in MATLAB R2018a. The program flow is as follows:

(a) First, the instance picture and the article-free background picture are converted to grayscale, giving two gray-level images, which are subtracted to obtain their difference image. The graying formula is:

$$\mathrm{Gray}(i,j) = 0.299\,R(i,j) + 0.587\,G(i,j) + 0.114\,B(i,j) \qquad (1)$$

(b) The difference image is then binarized to obtain its binary image.

(c) Dilation and erosion are then applied to the binary image to obtain a binary image with closed edges.

(d) Edges are detected on the morphologically processed binary image with a Marr-Hildreth edge detector, giving the edge set of the picture.

(e) The area enclosed by each edge is calculated, the edges are sorted by enclosed area, and the edge with the second-largest enclosed area is taken as the outline of the article instance.

(f) The edge is filled to obtain the mask of the article, and the instance segmentation result is obtained from the mask.

(3) Finally, the instance part picture and the mask are cropped and then rotated through a series of angles. In this embodiment the rotation step is 15°, giving 24 rotations in total, so each article instance picture yields 24 instance part pictures and 24 corresponding masks. Both are RGB pictures, and in the mask all three RGB channels of the article region have the value 255.
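The MATLAB program itself is not given. The following NumPy-only sketch covers steps (a) to (f) and part (3) under several stated assumptions. The binarization threshold (30) and the 5 × 5 closing element (r = 2) are hypothetical values. Steps (d) and (e) use Marr-Hildreth edges and the second-largest enclosed area; this sketch swaps in a different technique that keeps the largest connected foreground region and fills its holes. That substitute plays the same role when the article is the dominant change between the two pictures. Rotation uses nearest-neighbour inverse mapping on an enlarged canvas.

```python
from collections import deque

import numpy as np


def to_gray(rgb):
    """Formula (1): Gray = 0.299 R + 0.587 G + 0.114 B, rounded to uint8."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb[..., :3].astype(np.float64) @ weights).clip(0, 255).astype(np.uint8)


def _windows(bw, r, fill):
    """All shifted copies of bw under a (2r+1) x (2r+1) square structuring element."""
    h, w = bw.shape
    p = np.pad(bw, r, mode="constant", constant_values=fill)
    return np.stack([p[dy:dy + h, dx:dx + w]
                     for dy in range(2 * r + 1) for dx in range(2 * r + 1)])


def close(bw, r=2):
    """Step (c): dilation followed by erosion (morphological closing)."""
    dilated = _windows(bw, r, False).any(axis=0)
    return _windows(dilated, r, True).all(axis=0)


def _flood(passable, seeds):
    """4-connected breadth-first flood fill over `passable`, starting from `seeds`."""
    h, w = passable.shape
    seen = np.zeros((h, w), dtype=bool)
    queue = deque()
    for y, x in seeds:
        if passable[y, x] and not seen[y, x]:
            seen[y, x] = True
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and passable[ny, nx] and not seen[ny, nx]:
                seen[ny, nx] = True
                queue.append((ny, nx))
    return seen


def largest_region(bw):
    """Substitute for steps (d)-(e): keep the largest connected foreground region."""
    remaining, best = bw.copy(), np.zeros_like(bw)
    while remaining.any():
        y, x = np.argwhere(remaining)[0]
        region = _flood(remaining, [(y, x)])
        if region.sum() > best.sum():
            best = region
        remaining &= ~region
    return best


def fill_holes(bw):
    """Step (f): every pixel not reachable from the image border is filled in."""
    h, w = bw.shape
    border = [(y, x) for y in range(h) for x in (0, w - 1)]
    border += [(y, x) for y in (0, h - 1) for x in range(w)]
    return ~_flood(~bw, border)


def instance_mask(item_rgb, empty_rgb, thresh=30, r=2):
    """Steps (a)-(f): boolean article mask from an item / empty-scene picture pair."""
    diff = np.abs(to_gray(item_rgb).astype(np.int16) - to_gray(empty_rgb).astype(np.int16))
    return fill_holes(largest_region(close(diff > thresh, r)))


def crop_to_mask(img, mask):
    """Part (3): crop an image to the bounding box of the mask."""
    ys, xs = np.nonzero(mask)
    return img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]


def rotate_nn(img, deg, fill=0):
    """Nearest-neighbour rotation by `deg` degrees on an enlarged canvas."""
    h, w = img.shape[:2]
    c, s = np.cos(np.deg2rad(deg)), np.sin(np.deg2rad(deg))
    nw = int(np.ceil(abs(w * c) + abs(h * s) - 1e-9))
    nh = int(np.ceil(abs(w * s) + abs(h * c) - 1e-9))
    yo, xo = np.mgrid[0:nh, 0:nw]
    xo, yo = xo - (nw - 1) / 2, yo - (nh - 1) / 2
    # Inverse mapping: for every output pixel, look up its source pixel.
    xi = np.rint(c * xo + s * yo + (w - 1) / 2).astype(int)
    yi = np.rint(-s * xo + c * yo + (h - 1) / 2).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.full((nh, nw) + img.shape[2:], fill, dtype=img.dtype)
    out[valid] = img[yi[valid], xi[valid]]
    return out


ANGLES = list(range(0, 360, 15))  # 24 rotations, 15 degrees apart
```

To store the mask in the RGB form described above, one could save `np.repeat(mask[..., None], 3, axis=2).astype(np.uint8) * 255`.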

The background picture processing procedure is detailed as follows:

First, following the Pascal VOC data set, the longest side of each picture is limited to 500 pixels. Pictures whose longest side is smaller than 500 pixels are deleted and the rest are kept. For each remaining picture, the ratio of its longest side to 500 pixels is calculated as the scaling ratio, and the picture is resized by this ratio. The pictures are finally stored by scene type, such as living room or bedroom. For example, a background picture with a resolution of 640 × 480 has a scaling ratio of 1.28 and is resized to 500 × 375.
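The resizing rule can be sketched as below. `voc_like_size` and `resize_background` are hypothetical names, and Pillow's `Image.resize` is an assumed choice of library, since the text does not name one. For a 640 × 480 picture the sketch reproduces the 1.28 ratio and the 500 × 375 size given above.

```python
from typing import Optional, Tuple


def voc_like_size(w: int, h: int, longest: int = 500) -> Optional[Tuple[float, int, int]]:
    """Return (scale, new_w, new_h), or None if the longest side is below `longest`."""
    long_side = max(w, h)
    if long_side < longest:
        return None  # too small: the picture is deleted
    scale = long_side / longest
    return scale, round(w / scale), round(h / scale)


def resize_background(src: str, dst: str, longest: int = 500) -> bool:
    """Resize one background picture with Pillow; return False if it is too small."""
    from PIL import Image  # Pillow, assumed here; the text does not name a library

    with Image.open(src) as img:
        plan = voc_like_size(img.width, img.height, longest)
        if plan is None:
            return False
        _, new_w, new_h = plan
        img.resize((new_w, new_h)).save(dst)
    return True
```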

Step 4: synthesize the data set pictures and generate the annotation information. In this embodiment the data set uses the Pascal VOC format. Each annotation contains the category and the coordinates of the upper-left and lower-right corners of the bounding box, and the annotations for each picture are saved in an XML file. Step 4 proceeds as follows:

First, the object width $w_{obj}$ and height $h_{obj}$ are obtained from the instance picture and mask, and the background width $w_{bg}$ and height $h_{bg}$ from the background picture. To select a background picture, the scenes corresponding to each article category are provided in a txt file, and a scene is selected at random. The target width and height are then calculated according to the proportion r of the article in the background:

In this embodiment, the proportion r takes the values 2, 3, 4 and 5, which generates articles at different scales.

Then the coordinates $(x_{\min}, y_{\min})$ of the upper-left corner of the object box are generated at random, with $x_{\min} \in [0,\ w_{bg} - w_{target}]$ and $y_{\min} \in [0,\ h_{bg} - h_{target}]$. The coordinates $(x_{\max}, y_{\max})$ of the lower-right corner are then calculated:

$$x_{\max} = x_{\min} + w_{target} \qquad (4)$$

$$y_{\max} = y_{\min} + h_{target} \qquad (5)$$
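A sketch of the placement and compositing step follows. The text does not spell out how r maps to $w_{target}$ and $h_{target}$, so `target_size` below is an assumption: it scales the instance to fit within $(w_{bg}/r) \times (h_{bg}/r)$ while keeping its aspect ratio. The corner sampling follows equations (4) and (5). The mask is taken as a boolean array; an RGB mask as described in Step 3 would first be converted with `mask_rgb[..., 0] == 255`. All function names are hypothetical.

```python
import random
from typing import Tuple

import numpy as np


def target_size(w_obj: int, h_obj: int, w_bg: int, h_bg: int, r: int) -> Tuple[int, int]:
    """ASSUMED rule: fit the instance inside (w_bg / r) x (h_bg / r), keeping aspect ratio."""
    s = min(w_bg / (r * w_obj), h_bg / (r * h_obj))
    return max(1, int(w_obj * s)), max(1, int(h_obj * s))


def resize_nn(img: np.ndarray, w: int, h: int) -> np.ndarray:
    """Nearest-neighbour resize to width w and height h."""
    ys = np.arange(h) * img.shape[0] // h
    xs = np.arange(w) * img.shape[1] // w
    return img[ys[:, None], xs]


def place(w_bg: int, h_bg: int, w_t: int, h_t: int, rng: random.Random):
    """Random top-left corner, then equations (4) and (5) for the bottom-right corner."""
    x_min = rng.randint(0, w_bg - w_t)  # randint is inclusive at both ends
    y_min = rng.randint(0, h_bg - h_t)
    return x_min, y_min, x_min + w_t, y_min + h_t


def synthesize(bg: np.ndarray, inst: np.ndarray, mask: np.ndarray, r: int,
               rng: random.Random):
    """Paste the masked instance into a copy of bg; return the picture and its box."""
    h_bg, w_bg = bg.shape[:2]
    h_obj, w_obj = mask.shape
    w_t, h_t = target_size(w_obj, h_obj, w_bg, h_bg, r)
    inst_t, mask_t = resize_nn(inst, w_t, h_t), resize_nn(mask, w_t, h_t)
    x_min, y_min, x_max, y_max = place(w_bg, h_bg, w_t, h_t, rng)
    out = bg.copy()
    region = out[y_min:y_max, x_min:x_max]  # a view into `out`
    region[mask_t] = inst_t[mask_t]
    return out, (x_min, y_min, x_max, y_max)
```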

Finally, the calculated coordinates and the category name are written into an XML (Extensible Markup Language) annotation file, and the synthesized picture and the annotation file are saved under the same name. The annotation file contains:

- the name of the folder holding the data;
- the path of the picture it annotates;
- the picture dimensions;
- the annotation of each object (label, $x_{\min}$, $y_{\min}$, $x_{\max}$, $y_{\max}$), that is, its category name and bounding box coordinates.

The file contents are shown in FIG. 3.
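The following sketch writes one Pascal VOC style annotation with Python's standard `xml.etree.ElementTree`. The element names (`folder`, `filename`, `path`, `size`, `object`, `name`, `bndbox`, `xmin` and so on) follow the usual Pascal VOC layout, and `voc_annotation` is a hypothetical helper name. Only the fields listed in the text are produced.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def voc_annotation(folder: str, filename: str, path: str, width: int, height: int,
                   objects: Iterable[Tuple[str, int, int, int, int]],
                   depth: int = 3) -> ET.ElementTree:
    """Build a Pascal VOC style annotation tree for one synthesized picture."""
    root = ET.Element("annotation")
    for tag, text in (("folder", folder), ("filename", filename), ("path", path)):
        ET.SubElement(root, tag).text = text
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for label, x_min, y_min, x_max, y_max in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", x_min), ("ymin", y_min), ("xmax", x_max), ("ymax", y_max)):
            ET.SubElement(box, tag).text = str(int(value))
    return ET.ElementTree(root)
```

The returned tree can be saved next to the synthesized picture with `tree.write("000001.xml", encoding="utf-8")`.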

It should be noted that in this embodiment the data set is generated in batches; the batch generation flow is shown in FIG. 4. The article instance pictures and masks are stored by category, and the background pictures by scene. The scenes in which each article may appear are given as prior knowledge in a text file, and background pictures are selected according to these scenes. The instances are scaled to different sizes during generation, yielding a data set that contains articles of different sizes.
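A sketch of the batch-planning logic follows. The text does not specify the format of the scene-prior text file, so the one-line-per-category `category: scene scene ...` layout parsed by `load_scene_prior` is hypothetical, as is `plan_batch`. The sketch assumes r is drawn from the values 2, 3, 4 and 5 given in Step 4.

```python
import random
from typing import Dict, List, Sequence, Tuple


def load_scene_prior(text: str) -> Dict[str, List[str]]:
    """Parse hypothetical 'category: scene scene ...' lines into a dict."""
    prior: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        category, scenes = line.split(":", 1)
        prior[category.strip()] = scenes.split()
    return prior


def plan_batch(categories: Sequence[str], prior: Dict[str, List[str]],
               backgrounds: Dict[str, List[str]], n_per_class: int,
               rng: random.Random,
               ratios: Sequence[int] = (2, 3, 4, 5)) -> List[Tuple[str, str, str, int]]:
    """Return (category, scene, background file, r) jobs; scenes come from the prior."""
    jobs = []
    for category in categories:
        for _ in range(n_per_class):
            scene = rng.choice(prior[category])
            background = rng.choice(backgrounds[scene])
            jobs.append((category, scene, background, rng.choice(ratios)))
    return jobs
```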

After the data set pictures and annotation files have been synthesized, they are placed in two separate folders under the same parent folder. The data set is then divided into training, validation and test sets in the proportions 60%, 20% and 20%, with the pictures of each set selected at random. The picture names of each set are stored in a separate txt file.
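The random 60/20/20 split can be sketched as follows. Integer percentages avoid floating-point rounding in the set sizes. The file names `train.txt`, `val.txt` and `test.txt`, and the choice to store names without the extension, are assumptions borrowed from the common Pascal VOC `ImageSets` convention rather than details from the text.

```python
import os
import random
from typing import Dict, Iterable, List, Sequence


def split_dataset(names: Iterable[str], rng: random.Random,
                  percents: Sequence[int] = (60, 20, 20)) -> Dict[str, List[str]]:
    """Randomly split picture names into train/val/test by integer percentages."""
    names = list(names)
    rng.shuffle(names)
    n_train = len(names) * percents[0] // 100
    n_val = len(names) * percents[1] // 100
    return {
        "train": names[:n_train],
        "val": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],  # the remainder goes to the test set
    }


def write_split_files(splits: Dict[str, List[str]], out_dir: str) -> None:
    """Write one txt file per set, one picture name (without extension) per line."""
    os.makedirs(out_dir, exist_ok=True)
    for set_name, names in splits.items():
        with open(os.path.join(out_dir, f"{set_name}.txt"), "w", encoding="utf-8") as f:
            f.write("".join(os.path.splitext(n)[0] + "\n" for n in names))
```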

Step 5: build a YOLOv3 neural network and test the validity of the data set. YOLOv3 is a fully convolutional network (FCN) that outputs the category and bounding box of each detected object, so it can perform object detection tasks. A YOLOv3 network is trained on the constructed detection data set and its performance is evaluated with the mAP (mean Average Precision) metric, which indirectly tests the validity of the detection data set. The training parameters are 50 epochs and a learning rate of 0.0001. The trained detection model reaches an mAP of 0.83 on the test set, indicating that the data set generated by this method is suitable for training and testing neural networks.
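The mAP metric is computed per class and then averaged. A minimal sketch of the all-point interpolated average precision used by the PASCAL VOC toolkit since 2010 follows. It assumes each class's recall and precision points have already been computed from detections sorted by confidence; IoU matching is not shown. The authors' exact evaluation script is not given.

```python
from typing import Dict, Sequence, Tuple

import numpy as np


def voc_ap(recall: Sequence[float], precision: Sequence[float]) -> float:
    """All-point interpolated AP from recall/precision points sorted by confidence."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing (the interpolation envelope).
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0] + 1
    return float(np.sum((mrec[idx] - mrec[idx - 1]) * mpre[idx]))


def mean_ap(per_class: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> float:
    """mAP: the mean of the per-class APs."""
    return float(np.mean([voc_ap(r, p) for r, p in per_class.values()]))
```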

Example two:

A household article detection data set construction system, comprising:

The picture acquisition module may comprise a web crawling module, a parser, a downloader and a storage device:

- the web crawling module is configured to use Python crawler technology to request pictures of a specific article category from an image search engine and obtain the returned web page;
- the parser parses the page to obtain the picture URLs;
- the downloader sends picture download requests, and the downloaded pictures are stored;
- pictures with complex backgrounds are then screened out by digital image processing.

The picture acquisition module may also comprise a picture acquisition platform consisting of a motorized turntable and a plurality of depth cameras. The depth cameras are arranged at intervals around the outside of the turntable and photograph the article on the turntable from different angles at a set frequency.

This embodiment can be applied in many fields, such as article detection for household service robots.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention. Those skilled in the art should understand that modifications or variations made on the basis of the technical solution of the present invention without inventive effort still fall within the protection scope of the present invention.

Claims

1. A method for constructing a household article detection data set, characterized by comprising the following steps:

acquiring a background picture of a household scene;

2. The method for constructing a household article detection data set according to claim 1, wherein, in the article instance picture, the article is unoccluded and occupies at least 1/2 of the whole picture.

3. The method for constructing a household article detection data set according to claim 1, wherein acquiring the article instance picture by web crawling specifically comprises: using Python crawler technology to request pictures of a specific article category from an image search engine and obtain the returned web page; parsing the returned web page to obtain the picture URLs; downloading the pictures and storing them locally; and screening out pictures with complex backgrounds by digital image processing.

4. The method for constructing a household article detection data set according to claim 1, wherein acquiring the article instance picture by autonomous acquisition specifically comprises photographing the same article from different angles.

5. The method for constructing a household article detection data set according to claim 1, wherein acquiring the background picture of the household scene specifically comprises extracting background pictures that match household scenes from a public scene recognition data set.

6. The method for constructing a household article detection data set according to claim 1, wherein removing the background from the article instance picture specifically comprises: converting the article instance picture and the corresponding article-free background picture to grayscale and subtracting them to obtain a grayscale difference image, binarizing the difference image, and then applying dilation and erosion to the binary image;

7. The method for constructing a household article detection data set according to claim 1, wherein, when information is labeled, the labeled information comprises the instance category label, the name and path of the corresponding synthesized picture, and the coordinates, width and height of the instance picture.

8. A household article detection data set construction system, characterized by comprising:

9. The household article detection data set construction system according to claim 8, wherein the picture acquisition module comprises a web crawling module, a parser, a downloader and a storage device; the web crawling module is configured to use Python crawler technology to request pictures of a specific article category from an image search engine and obtain the returned web page; the parser parses the page to obtain the picture URLs; the downloader sends picture download requests; the pictures are stored; and pictures with complex backgrounds are screened out by digital image processing.

10. The household article detection data set construction system according to claim 8, wherein the picture acquisition module comprises a picture acquisition platform consisting of a motorized turntable and a plurality of depth cameras, the depth cameras being arranged at intervals around the outside of the turntable and photographing the article on the turntable from different angles at a set frequency.