AU2019100969A4

AU2019100969A4 - Chinese Food Recognition and Search System

Info

Publication number: AU2019100969A4
Application number: AU2019100969A
Authority: AU
Inventors: Hongming Dai; Jianxiang Dong; Congqing Fan; Yisiyuan Huang; Yang LV; Zhaoyan Wang
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2019-10-03
Anticipated expiration: 2027-08-29

Abstract

An image recognition and search system for Chinese dishes based on a deep learning algorithm. It is able to recognize four popular Chinese dishes (Potato Silk, Baby Cabbage, Mapo Tofu and Fried Beans) and then search local files to display a similar image of the same type. The image recognition system mainly includes three parts: data preprocessing, model training and a graphical user interface. The input images are collected from the Internet by keyword search. During data preprocessing, irrelevant images are removed and each input image is resized to a resolution of 32X32. The collected dataset is then separated into two parts: the training set and the test set. We build our convolutional neural network (CNN), which consists of 4 convolutional layers and 2 fully connected layers. The system extracts features from the input data through the convolutional layers and classifies the image through the fully connected layers. The Chinese food recognition system achieves an average accuracy of 83.25% on the test set. Through the graphical user interface, users can upload a target picture, and the operations and results are displayed in the text field.

Description

Chinese Food Recognition and Search System

FIELD OF INVENTION

The present invention belongs to the field of image identification within image processing and machine learning. More particularly, the invention, a Chinese dish image recognition system, relates to deep learning and neural networks.

BACKGROUND OF THE INVENTION

Food plays a quite important role in everyone’s daily life. In recent years, with the development of technology, mobile phones and computers have become essential tools in our lives. An increasing number of people like to take a photo of the food they cook or eat and then share the picture on social networks such as WeChat or Twitter. In real life, people are curious about dishes that they have never encountered before. However, Chinese cuisine includes a great many types of dishes with different names, and most people can identify only a few of them without any tools. Consequently, helping ordinary people to recognize different dishes has become a real need. In addition, dish recognition and search can be applied in other areas, such as micro POS systems in restaurants and intelligent plates that can give an introduction to the dish.

2019100969 29 Aug 2019

Dish recognition is a classification problem: different types of dishes are given different labels, and an automatic method is required to assign a label to an image. In traditional image recognition, we usually have to extract different features from different objects and regard them as definitions of these objects. When performing recognition, the computer compares the image matrix with the different definitions and finds the closest one. However, recognizing dishes is much more difficult than recognizing ordinary objects, because some dishes, especially Chinese dishes, look quite similar. Consequently, traditional image recognition struggles to distinguish similar dishes accurately.

In recent years, with the development of high-performance computing platforms and big data processing techniques, deep learning has become a powerful method for image recognition. Deep learning builds a neural network model with several hidden layers and trains it on a large amount of data to learn more useful features, thus improving the accuracy of the prediction.

In this invention, we use TensorFlow, a framework that is widely used for deep learning applications. Throughout the process, we collect dish images ourselves using a web crawler that downloads pictures from BaiduPicture automatically, and we also label the pictures ourselves. What distinguishes this invention from other dish recognition processes is that the pictures used for training are all 32X32 in size, whereas others use much larger images, such as ResNet or VGG with 224X224 or bigger [1,2]. This means that this invention does not place high requirements on images, which makes it more accessible and faster than other methods. After data processing, we feed the training dataset into the convolutional neural network in batches. Throughout the training process, the program optimizes the weights and biases in all layers of the neural network to minimize the loss function. By adjusting the parameters of the network, the model can reach its best performance and a high accuracy.

SUMMARY OF THE INVENTION

This invention is a Chinese food recognition system that includes three parts: data preprocessing, model training and a graphical user interface. It is able to recognize 4 popular Chinese dishes (Potato Silk, Baby Cabbage, Mapo Tofu and Fried Beans), can be retrained for other types of food, and then searches the local files to display one similar image of the same type.

1. Data Description

There are many food styles in Chinese history and food culture, such as Sichuan cuisine, Northeastern cuisine, etc. Our Chinese food dataset contains 4 of the most popular Chinese dishes from different cuisines: Potato Silk, Baby Cabbage, Mapo Tofu and Fried Beans. Images of these 4 types of food are gathered from the internet. However, some of them are mislabeled, since they are uploaded by individuals without being rechecked. We need to remove these irrelevant images to ensure consistency. Some food samples are shown in Figure 1(a).

2. Data Preprocessing

Data Clean and Label

After collecting the food images, we first cleaned them and generated the corresponding label for each image. We then removed irrelevant images and images with irregular height or width (too large or too small), which might be distorted after being resized. Finally, we resized all images to the same size of 32X32 (shown in Figure 1(b)).
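The size filter described above can be sketched as follows. This is a minimal illustration rather than the inventors' code: the function name `has_regular_size` and the thresholds `min_side`, `max_side` and `max_aspect` are assumptions, since the source does not state which dimensions count as too large or too small.

```python
TARGET_SIZE = (32, 32)  # resolution stated in the text


def has_regular_size(width, height, min_side=64, max_side=2000, max_aspect=2.0):
    """Return True if an image looks safe to resize to TARGET_SIZE.

    All three thresholds are hypothetical; the source gives no values.
    """
    if min(width, height) <= 0:
        return False
    if min(width, height) < min_side or max(width, height) > max_side:
        return False
    # A strongly elongated image would be distorted when squeezed into a square.
    aspect = max(width, height) / min(width, height)
    return aspect <= max_aspect


# Example: (width, height) pairs for three made-up files.
sizes = {"a.jpg": (640, 480), "b.jpg": (20, 20), "c.jpg": (1200, 300)}
kept = [name for name, (w, h) in sizes.items() if has_regular_size(w, h)]
```

With these illustrative thresholds, only images of moderate size and aspect ratio are passed on to the 32X32 resizing step.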

Data Augmentation

We performed data augmentation to increase the number of samples from about 1100 images per label to 5500 images per label. We achieved this by randomly rotating, shifting and zooming images within a small range using Keras (shown in Figure 1(c)).
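The augmentation step can be illustrated with a small NumPy sketch. The source used Keras for this; the code below swaps in a hand-written nearest-neighbour transform so that the geometry is visible, and the ranges `max_angle`, `max_shift` and `zoom_range` are assumed values, not the ones used by the inventors.

```python
import numpy as np


def affine_nearest(image, angle_deg=0.0, shift=(0.0, 0.0), zoom=1.0):
    """Rotate, shift and zoom an H x W x C image about its centre.

    Uses nearest-neighbour sampling; pixels that fall outside the source
    image are filled by clamping to the nearest border pixel.
    """
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # Output pixel grid, relative to the centre and with the shift undone.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    y = ys - cy - shift[0]
    x = xs - cx - shift[1]
    # Inverse mapping: for each output pixel, locate its source pixel.
    src_y = (cos_t * y + sin_t * x) / zoom + cy
    src_x = (-sin_t * y + cos_t * x) / zoom + cx
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    return image[src_y, src_x]


def augment(image, copies, rng, max_angle=15.0, max_shift=3.0, zoom_range=0.1):
    """Return `copies` randomly transformed versions of `image`."""
    out = []
    for _ in range(copies):
        out.append(affine_nearest(
            image,
            angle_deg=rng.uniform(-max_angle, max_angle),
            shift=rng.uniform(-max_shift, max_shift, size=2),
            zoom=rng.uniform(1.0 - zoom_range, 1.0 + zoom_range),
        ))
    return out


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in image
batch = [image] + augment(image, copies=4, rng=rng)
```

Keeping the original plus four augmented copies gives five images per original, which matches the growth from about 1100 to 5500 images per label.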

3. Model

The schematic diagram of the CNN model in our invention is shown in Figure 2. Randomly cropped patches of size 32X32 from the original images are used as input. The CNN consists of 4 convolutional layers, all with an output depth of 32 and ReLU activation functions; only the second and fourth layers use max pooling. After passing through the convolutional layers, the input is decoded and flattened into a feature vector, which is classified by the following two fully connected layers. The last layer has 4 nodes with sigmoid activation functions, whose outputs are used to calculate the loss and the probabilities for each label. The model is trained with the back-propagation algorithm on a single CPU, using batch gradient descent and the Adam optimizer.
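As a minimal sketch of how the four sigmoid outputs of the last layer can be turned into per-label scores and a predicted class, consider the code below. The raw output values passed to `predict` are made up for illustration, and picking the label with the highest score is an assumed decision rule that the source does not spell out.

```python
import math

LABELS = ["Potato Silk", "Baby Cabbage", "Mapo Tofu", "Fried Beans"]


def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(raw_outputs):
    """Return (predicted label, per-label sigmoid scores)."""
    scores = [sigmoid(z) for z in raw_outputs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best], scores


# Hypothetical pre-activation values of the 4 output nodes.
label, scores = predict([-1.2, 0.3, 2.5, -0.7])
```

Because each sigmoid is computed independently, the four scores do not necessarily sum to 1, unlike a softmax output.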

4. Graphic User Interface

The GUI for this invention is shown in Figure 3. The “Open a Target Image” button loads an image to recognize. Clicking the “Show Recognition Result” button makes the recognition system read the image, output the classification result, and search for and display an image having the same name. The text field on the right displays the operations and results, and the image on the left is a similar image found by the system.
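The local search step could look roughly like the sketch below. The source does not describe how the local files are organised, so the directory layout (one sub-folder per dish name), the function name `find_similar_image` and the accepted file suffixes are all assumptions made for illustration.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed file types


def find_similar_image(label, library_dir, rng=random):
    """Pick one stored image of the predicted class, or None if there is none.

    Assumes a hypothetical layout: library_dir/<label>/<image files>.
    """
    class_dir = Path(library_dir) / label
    if not class_dir.is_dir():
        return None
    candidates = sorted(
        p for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    return rng.choice(candidates) if candidates else None
```

A GUI callback for the “Show Recognition Result” button could call this function with the predicted label and display the returned file.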

DESCRIPTION OF THE DRAWINGS

The appended drawings are only for the purpose of description and explanation but not for limitation, wherein:

1. Fig.1: (a) Samples of original images. (b) Images resized to 32X32. (c) Data augmentation by rotating, shifting and zooming images.

2. Fig.2 is the schematic diagram of the CNN model.

3. Fig.3 shows how the loss changes with training steps.

DESCRIPTION OF PREFERRED EMBODIMENT

Food image recognition is one of the most promising applications of visual object recognition, since it not only helps people find out what they are eating but also helps estimate food calories and analyze people’s eating habits for the sake of health. Meanwhile, the CNN (Convolutional Neural Network) is currently one of the most widely used deep learning methods in machine learning due to its powerful modelling capability on complex and large-scale datasets. We therefore apply a CNN to food image recognition.

Because many food items are indistinguishable in terms of shape or color, and some food characteristics are hard to recognize even by simple examination, it is extremely challenging to identify every food item. Therefore, we consider it a better option to classify and identify food items in general categories in order to approximate their dietary information automatically.

1. Data collection:

In the implementation of the project, we pay close attention to the quality of the images used to train our model, in order to ensure precision. We used a crawler tool to download images of acceptable quality from the web, including Baidu image search and Google image search. We prepared a total of four categories of dishes (dry-cooked string beans, stir-fried tofu in hot sauce, shredded potatoes and baby cabbage in chicken soup), and each dish has about 5,000 images.

Dish	One-hot label
Potato Silk	[1,0,0,0]
Baby Cabbage	[0,1,0,0]
Mapo Tofu	[0,0,1,0]
Fried Beans	[0,0,0,1]

Table 1: The one-hot encoding for labels

2. Data pre-processing

Furthermore, we re-processed the collected images and rotated them at different angles to simulate different camera angles in real life and improve the recognition rate. To explore a new area, we decided to perform image recognition on low-resolution images. To achieve this, we resized all the original images to 32X32 pixels, so that high accuracy can be reached even with a low-quality picture.

After the images are collected from the internet, their categorical labels cannot be used directly in this project: the labels take values from a fixed, limited set of names, whereas many machine learning algorithms operate only on numeric data. We therefore map each label to an integer and then apply one-hot encoding to the integer representation. For example, we first map our dishes as follows: “0” represents Potato Silk, “1” represents Baby Cabbage, “2” represents Mapo Tofu and “3” represents Fried Beans. Then we convert the integer labels into one-hot encoding; the final results are shown in Table 1.
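The two-step label conversion described above (dish name to integer, integer to one-hot vector) can be sketched in a few lines. The helper names `LABEL_TO_INDEX` and `one_hot` are illustrative; only the dish order and the resulting vectors come from the text.

```python
# Dish names in the order given in the text: index 0..3.
LABELS = ["Potato Silk", "Baby Cabbage", "Mapo Tofu", "Fried Beans"]
LABEL_TO_INDEX = {name: index for index, name in enumerate(LABELS)}


def one_hot(name):
    """Convert a dish name to its one-hot vector via the integer label."""
    vector = [0] * len(LABELS)
    vector[LABEL_TO_INDEX[name]] = 1
    return vector


encoded = {name: one_hot(name) for name in LABELS}
```

Each vector has exactly one entry set to 1, at the position of the dish's integer label.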

3. Architecture and training process

Our architecture consists of an input layer, 4 convolutional layers and 2 fully connected layers. The input images are of size 32X32 and form a tensor of shape (256,32,32,3). After passing through the four convolutional layers, including max pooling layers that perform downsampling and reduce the size of the data space, the features of the images are extracted. The images are then classified by passing through the fully connected layers.
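To make the downsampling concrete, the sketch below traces tensor shapes through the four convolutional layers. Several details are assumptions not stated in the source: the leading 256 is read as the batch size, the convolutions are taken to preserve spatial size ('same' padding, stride 1), and each max pooling is taken to be 2x2 with stride 2. The output depth of 32 and pooling after the second and fourth layers follow the model description.

```python
def trace_shapes(batch=256, height=32, width=32, channels=3,
                 depth=32, num_conv=4, pool_after=(2, 4)):
    """Return (name, shape) pairs for the input, each conv layer and the flatten step."""
    shapes = [("input", (batch, height, width, channels))]
    h, w = height, width
    for layer in range(1, num_conv + 1):
        # Assumed 'same' padding with stride 1: spatial size is unchanged.
        name = f"conv{layer}"
        if layer in pool_after:
            # Assumed 2x2 max pooling with stride 2: spatial size is halved.
            h, w = h // 2, w // 2
            name += "+pool"
        shapes.append((name, (batch, h, w, depth)))
    # Flattening turns each image's feature maps into one feature vector.
    shapes.append(("flatten", (batch, h * w * depth)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:12s} {shape}")
```

Under these assumptions each image is reduced from 32X32 to 8X8 feature maps before being flattened and passed to the fully connected layers.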

We employed dropout and L2 regularization to avoid overfitting in the training phase and used our own framework built with TensorFlow to run the experiments. During training, the dropout rate, base learning rate, decay rate and number of iteration steps are the 4 main parameters.
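The source names a base learning rate and a decay rate but does not give the schedule or any values. One common reading is exponential decay, sketched below together with an L2 penalty term; the schedule form, the `decay_steps` parameter and every default value are assumptions for illustration only.

```python
def decayed_learning_rate(step, base_lr=1e-3, decay_rate=0.96, decay_steps=1000):
    """Exponential decay: base_lr * decay_rate ** (step / decay_steps).

    The schedule form and all default values are hypothetical.
    """
    return base_lr * decay_rate ** (step / decay_steps)


def l2_penalty(weights, scale=1e-4):
    """L2 regularization term: scale times the sum of squared weights."""
    return scale * sum(w * w for w in weights)


# Learning rate at a few training steps under the assumed schedule.
schedule = [decayed_learning_rate(s) for s in (0, 1000, 2000)]
```

The penalty returned by `l2_penalty` would be added to the classification loss, so that large weights are discouraged during training.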

In machine learning, accuracy is one of the most important metrics for evaluating models. Other crucial metrics, such as the confusion matrix and the standard deviation, are also measured in our invention. Our model reaches an average accuracy of 83.25% with a standard deviation of 1.793, and the confusion matrix is shown in Table 2.

Table 2: The confusion matrix for the model

	Potato Silk	Baby Cabbage	Mapo Tofu	Fried Beans
Potato Silk	808	171	20	22
Baby Cabbage	180	780	51	44
Mapo Tofu	17	61	880	17
Fried Beans	12	16	52	869
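The figures in the confusion matrix can be summarised with a short calculation. The sketch below assumes that each row holds the samples of one true class and each column one predicted class; the source does not state the orientation, and only the per-class figures depend on it.

```python
LABELS = ["Potato Silk", "Baby Cabbage", "Mapo Tofu", "Fried Beans"]
CONFUSION = [
    [808, 171, 20, 22],
    [180, 780, 51, 44],
    [17, 61, 880, 17],
    [12, 16, 52, 869],
]


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Diagonal entry divided by its row sum (rows assumed to be true classes)."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


accuracy = overall_accuracy(CONFUSION)  # 3337 correct out of 4000
recall = dict(zip(LABELS, per_class_recall(CONFUSION)))
```

For this single matrix the overall accuracy is 3337/4000 = 83.425%. The reported 83.25% is an average, which the source does not break down, so the two figures need not coincide. The largest off-diagonal entries, 171 and 180, are between Potato Silk and Baby Cabbage, which indicates that these two dishes are confused with each other most often.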

Claims

What is claimed is:

1. A convolutional neural network (CNN) framework written in Python and TensorFlow which can be used to construct neural networks.
2. The whole existing recognition system comprising:

a) a trained CNN model able to classify images;

b) a graphic user interface (GUI);

c) Chinese food dataset collected for training.