US20240046332A1 - Providing Local Recommendations based on Images of Consumable Items - Google Patents
- Publication number
- US20240046332A1 (U.S. application Ser. No. 18/382,230)
- Authority
- US
- United States
- Prior art keywords
- images
- image
- consumable food
- restaurants
- food item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06Q30/0631—Item recommendations
- G06F18/22—Matching criteria, e.g. proximity measures
- G06N3/04—Neural networks: architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06Q30/0205—Location or geographical consideration
- G06Q30/0282—Rating or review of business operators or products
- G06Q30/0639—Item locations
- G06Q50/01—Social networking
- G06V10/32—Normalisation of the pattern dimensions
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/82—Image or video recognition using neural networks
- G06V20/68—Food, e.g. fruit or vegetables
- G06N20/00—Machine learning
Definitions
- the present disclosure generally relates to providing machine learning supported image search results to users in accordance with some embodiments.
- the methods and systems disclosed herein provide users with local recommendations based on food images. For example, a user browsing social media may see a social media post with an image of a consumable item, such as a food dish that looks appetizing.
- the social media post may not include the name of the dish so the user cannot rely on a text search.
- the user could provide the image of the food dish to the disclosed recommendation system.
- the recommendation system in this example may return a list of local restaurants that serve similar dishes, or it may return information about local stores that sell ingredients identified in the food dish so that the user may make the dish at home.
- the recommendation system in this example may use one or more trained neural networks to identify the food dish and/or similar food dishes.
- Embodiments of the present disclosure propose a method and apparatus for identifying food content in a photograph and concurrently providing a curated list of restaurants offering the identified food content.
- An image is received from a user, the image including a consumable item.
- a location of the user is determined and a plurality of restaurants within a predetermined distance of the user is identified.
- a plurality of consumable item images associated with the plurality of restaurants is obtained and, using a neural network, one or more images having similar consumable items are identified from the plurality of consumable item images.
- a similarity score is calculated for each identified image.
- the one or more identified images with similar consumable items include the one or more ingredients of the consumable item in the image.
- the restaurant associated with each identified image is identified and ranked based on the identified image's similarity score.
- a list of ranked restaurants is provided to the user.
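The claimed flow can be sketched end to end in a few lines. This is an illustrative sketch only: the function names, restaurant fields, and the 25-mile radius are hypothetical, and the neural-network comparison is stubbed out as a `similarity` callable.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in miles.
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def recommend(user_loc, restaurants, similarity, radius_miles=25):
    # 1. Keep restaurants within a predetermined distance of the user.
    nearby = [r for r in restaurants
              if haversine_miles(*user_loc, r["lat"], r["lon"]) <= radius_miles]
    # 2. Score each nearby restaurant's dish images against the query image;
    #    `similarity` stands in for the neural-network comparison.
    scored = [(max(similarity(img) for img in r["images"]), r["name"])
              for r in nearby]
    # 3. Rank restaurants in descending order of similarity score.
    scored.sort(reverse=True)
    return [name for _, name in scored]
```

A restaurant outside the radius is dropped before scoring, so only local candidates are ever compared against the query image.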
- the one or more restaurants may also be provided to the user through a widget when a user uploads, views, or otherwise interacts with an image containing a consumable item.
- the consumable item is an entrée offered at a restaurant.
- image pre-processing is performed.
- the image is passed through one or more trained convolutional neural networks to obtain an image embedding.
- An embedding is a mapping of a discrete categorical variable to a vector of continuous numbers used to reduce the dimensionality of categorical variables and meaningfully represent categories in the transformed space.
- the obtained image embedding may be used to find nearest neighbors in an embedding space.
- the embedding space is pre-trained, and the obtained image embedding is used to determine the contents of the received image and provide k-nearest neighbor recommendations.
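Finding nearest neighbors in a pre-trained embedding space can be illustrated minimally as follows; the `k_nearest` helper and the cache layout are invented for this sketch and are not the patent's implementation.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(query, cache, k=3):
    """Return the k cached entries closest to the query embedding.
    `cache` maps image ids to (label, embedding) pairs, standing in for
    the precomputed database image cache."""
    ranked = sorted(cache.items(), key=lambda kv: euclidean(query, kv[1][1]))
    return [(img_id, label) for img_id, (label, _) in ranked[:k]]
```

In practice the cache would hold precomputed embeddings for all restaurant dish images, and an approximate-nearest-neighbor index would replace the exhaustive sort.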
- the consumable item includes one or more identifiable ingredients and the one or more contents that are identified are the one or more identifiable ingredients.
- the restaurants are ranked based at least in part on the location of the user and a calculated distance between the location of the user and a location of the restaurant. In some embodiments, the restaurants are ranked based at least in part on each restaurant's hours of service, take-out options, number of reviews, and/or score of reviews. In some embodiments, the restaurants are ranked based at least in part on an analysis of a number of matches of the one or more contents of the consumable item in the image to one or more contents of the one or more similar images. In some embodiments, the method includes providing a list of restaurants, ranking the one or more restaurants in descending order based on the similarity score, and including a name of the consumable item with the list of restaurants.
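One way to blend these ranking signals is a weighted score; the weights and field names below are illustrative assumptions, not values taken from the patent.

```python
def rank_restaurants(candidates, w_sim=0.6, w_dist=0.25, w_rev=0.15):
    """Rank restaurant candidates by a weighted blend of image similarity,
    proximity, and review score. Each candidate is a dict with
    'similarity' in [0, 1], 'distance_miles', and 'rating' in [0, 5].
    The weights are hypothetical."""
    def score(c):
        proximity = 1.0 / (1.0 + c["distance_miles"])  # closer -> higher
        return (w_sim * c["similarity"]
                + w_dist * proximity
                + w_rev * c["rating"] / 5.0)
    return sorted(candidates, key=score, reverse=True)
```

With weights like these, a nearby restaurant can outrank a slightly better-rated one farther away, matching the behavior described later for the ramen example.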
- the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computing device to implement the disclosed subject matter.
- the term “article of manufacture” is intended to encompass a computer program accessible from any computer-readable storage device or media.
- Computer-readable storage media can include, but are not limited to, magnetic storage devices, e.g., hard disk, floppy disk, magnetic strips, optical disk, compact disk (CD), digital versatile disk (DVD), smart cards, flash memory devices, among others.
- computer-readable media that are not storage media may additionally include communication media, such as transmission media for wireless signals and the like.
- FIG. 1 is a diagram of an example computer system providing an image search result to a client device in accordance with some embodiments.
- FIGS. 2 - 3 are diagrams of training databases for identifying consumable content items of an image in accordance with some embodiments.
- FIG. 4 is an example interaction between a user and the computer system for identifying consumable content items of an image and providing recommendations based on the identified content items in accordance with some embodiments.
- FIGS. 5 A- 5 D depict an example user interface for identifying consumable content items of an image and providing similar images and recommendations based on the identified content items in accordance with some embodiments.
- FIGS. 6 A- 6 B depict an example flow diagram of a user interface for viewing the results of an image query.
- FIG. 7 is a block diagram of an example networked system depicting an environment in which embodiments may operate.
- FIG. 8 is a block diagram of a computer system to implement one or more example embodiments.
- disclosed embodiments identify local restaurants and/or stores that provide food similar to an image of food (e.g., entrées, desserts, cocktails).
- Having a computer-implemented system identify the food item is advantageous to provide a curated list of recommended restaurants serving the same or similar food item, providing recipes and local stores for the ingredients for the user to create the food item themselves, providing other similar images of the food for viewing pleasure, providing a history of the food item, and providing similar food items based on a user's dietary restrictions. Additionally and/or alternatively, having an automated system that can recommend restaurants in geographic proximity relieves the user of the burden to separately identify the food item, determine its components, and piece the information together to find places that serve the item and are reasonable for them to visit, among other benefits.
- a system and method are used to receive an image of food and search for similar images in a database of food-based images in an efficient manner.
- Searching for images in a database for food content is advantageous as the search does not require a name for the food content and can leverage unstructured image data directly from restaurants and/or other databases to perform the image search rather than relying on text from menus and/or user reviews to recommend restaurants.
- a useful technical solution of identifying individual ingredients visible in an image provides granularity and flexibility in solving the technical problem of identifying an unfamiliar food item.
- the inventors have provided a solution that includes identifying ingredients of the food content, finding restaurants serving the food content, providing the name of the food content, and more, of which the details are discussed herein.
- Examples disclosed herein are directed to methods, apparatus comprising several modules and apparatus comprising one or more processors and memory embodied with executable instructions for providing curated recommendations based on an image content search.
- a client device is equipped with a search engine that can receive, process, and deliver information about the image content as well as corresponding similar images and restaurant information.
- a “search engine” may refer to the entire device and software components for searching data that includes images and presenting identified image information to the user.
- the search engines disclosed herein may utilize one or more types of artificial intelligence including but not limited to machine learning, neural networks, deep learning, computer vision, and natural language processing.
- FIG. 1 is a block diagram of consumable item identification engine server 202 providing curated image and restaurant recommendations to a client device 100 using the recommendation generation module 220 as shown in FIG. 8 .
- the client device 100 provides image data that was provided by the user, and user data, such as the user's location or other data entered by the user or obtained from the user profile, to the consumable item identification engine server 202 . Additionally and/or alternatively, the consumable item identification engine may run on client device 100 .
- Image processing module 210 may include various processing techniques to analyze the image in an efficient manner.
- the image processing module 210 may process the image
- the content identification module 212 may analyze the image to identify the contents of the image
- the restaurant module 214 may analyze the identified contents of the image to select one or more restaurants serving the food item matching the identified contents of the image
- the recipe module 216 may provide corresponding recipes for the identified contents of the image.
- the user provided image, the determined processed image, the determined identified content, the determined restaurant, and the determined recipe, or any combination thereof may be provided to the recommendation generation module 220 in order to generate one or more recommendations to be provided to the user.
- server 202 sequentially processes the image through the recommendation generation module 220 .
- the recommendation generation module 220 may also consider the determined identified content, the determined restaurant, the determined recipe as determined by the content identification module 212 , the restaurant module 214 , and recipe module 216 , respectively.
- the recommendation generation module 220 further processes the recommendation through location module 218 .
- the location module 218 determines a location of the client device and/or the user of the client device to provide personalized (e.g., curated) recommendations specific to the user, such as based on proximity of the user to the determined restaurant.
- the search result is eventually provided back to the client device 100 for presentation to the user, and the procedure may be repeated for multiple search requests.
- FIG. 2 and FIG. 3 illustrate example image classification models 200 and 300 in accordance with some embodiments.
- FIG. 2 is shown with trained models based on ingredients identified in a dish, such as mussels and oysters.
- FIG. 3 is shown with trained models based on various dishes with Ramen illustrated as an example dish.
- image embeddings for all restaurant dish images in a database image cache are precomputed via one or more neural networks trained on one or more training sets of food images.
- the embeddings are low-dimensional, learned continuous vector representations of discrete variables. Neural network embeddings are useful because they can reduce the dimensionality of categorical variables and meaningfully represent categories in the transformed space.
- the uploaded image 250 is pre-processed (e.g., via channel-wise normalization, image resizing) and passed through one or more convolutional neural networks such as model A 252 and/or model B 254 to yield high-dimensional image embeddings 256 and 258 .
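The pre-processing step (resizing plus channel-wise normalization) can be sketched in pure Python on an image represented as rows of (r, g, b) tuples; a real system would use an image library, and these helper names are hypothetical.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an image given as rows of (r, g, b) pixels."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)] for i in range(out_h)]

def normalize_channels(img):
    """Channel-wise normalization: zero mean, unit variance per channel.
    A constant channel (zero variance) is left at zero rather than divided by 0."""
    pixels = [px for row in img for px in row]
    n = len(pixels)
    means, stds = [], []
    for c in range(3):
        vals = [px[c] for px in pixels]
        mean = sum(vals) / n
        std = (sum((v - mean) ** 2 for v in vals) / n) ** 0.5 or 1.0
        means.append(mean)
        stds.append(std)
    return [[tuple((px[c] - means[c]) / stds[c] for c in range(3)) for px in row]
            for row in img]
```

Resizing to the network's expected input shape and normalizing each channel is the standard preparation before a convolutional network computes an embedding.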
- model A 252 and model B 254 (referred to as 352 and 354 , respectively in FIG. 3 ) may be pre-trained on training sets of food-based images and combined to output an accurate classification of the received input. It is understood that there may be additional models as appropriate to obtain the final embedding.
- Example training sets include ImageNet, Recipe1m, Food101, and Menu-Match.
- Example neural networks include ResNet50, ResNeXt50, DenseNet, and VGG, which may serve as model B 254 .
- One such example neural network included approximately 25 million parameters and is a convolutional neural network that is 50 layers deep.
- the neural networks used include an image input layer followed by a convolution layer with a 7*7 kernel and 64 different kernels, all with a stride of 2, and max pooling, also with a stride of 2. The next convolution block contains a 1*1, 64 kernel, followed by a 3*3, 64 kernel, and finally a 1*1, 256 kernel. These three layers are repeated 3 times in total, resulting in 9 layers. Next, a 1*1, 128 kernel, followed by a 3*3, 128 kernel, and finally a 1*1, 512 kernel; this block is repeated 4 times, for 12 layers.
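The stage counts described above can be checked with simple arithmetic. The later stages, not enumerated in the text, are assumed here to follow the standard ResNet-50 pattern of (3, 4, 6, 3) bottleneck-block repeats:

```python
# ResNet-50 layer tally: one stem convolution, four stages of 3-layer
# bottleneck blocks repeated (3, 4, 6, 3) times, and one final fully
# connected layer. The description above covers the stem and the first
# two stages; the remaining (6, 3) stages follow the same pattern.
stem = 1
bottleneck_layers = 3
stage_repeats = (3, 4, 6, 3)
fully_connected = 1

stage_layer_counts = [bottleneck_layers * r for r in stage_repeats]
total = stem + sum(stage_layer_counts) + fully_connected
```

The first two entries of `stage_layer_counts` reproduce the 9 and 12 layers stated in the text, and the tally sums to the 50 layers claimed for the example network.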
- Recipe1m dataset may be obtained by scraping public networks, such as the web for dish names and associated images.
- a web-scrape identified 1.4 million images, 2106 classes/dishes.
- the resulting data was noisy from both an image perspective and a label perspective.
- a cleaned dataset resulted in 327,000 images with 517 classes.
- model A 252 may be trained using the FoodIE and Recipe1m training datasets with shared body weights. Label spaces are preserved, and the two datasets have different label granularities.
- Recipe1m has more detail.
- Model A 252 may include an additional dataset such as a noisy FoodIE training set that has predicted pseudo labels generated by a baseline trained neural network. The use of an additional dataset provides additional training samples with less work required to label such additional samples. The resulting combined training set included approximately 598,000 images with a test set size of about 93,000 images.
- model A 252 includes a final layer 260 that performs a probability generation.
- the probability generation may be a SoftMax function or a log-odds based function, that is, a function that represents probability as values from 0 to 1.
- the probability generation may be used to generate a predicted class 262 based on the highest probability.
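A minimal sketch of this final probability-generation step, using a SoftMax over raw class scores (the function and label names are illustrative):

```python
import math

def softmax(logits):
    """Map raw scores to probabilities in [0, 1] that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_class(logits, labels):
    """Pick the label with the highest SoftMax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

The predicted class is simply the label whose probability is largest, which is then used to look up restaurants serving similar dishes.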
- the predicted class 262 may be used to generate a list of restaurants where dishes similar to the uploaded image may be found.
- model B 254 embeddings 258 in one embodiment may be combined with embeddings of final layer 260 to form joint embedding 264 representative of a lower layer or layers of two separate models.
- the embeddings 256 and 258 may be simply concatenated to form joint embedding 264 .
- the embedding dimensionality of joint embedding 264 is reduced using dimensionality reduction techniques such as principal component analysis (PCA) resulting in a smaller joint embedding 268 .
- the smaller joint embedding 268 may comprise 256 bits and provides a greatly reduced dataset for a following nearest neighbor analysis as indicated at 270 .
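The PCA reduction can be illustrated on toy 2-D data. This closed-form 2x2 version, projecting onto the first principal component, is only a stand-in for reducing a high-dimensional joint embedding to a compact representation; all names are hypothetical.

```python
import math

def pca_reduce_2d(points):
    """Project 2-D points onto their first principal component (a toy
    analogue of PCA-reducing a high-dimensional joint embedding)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Covariance matrix [[a, b], [b, c]] of the centered data.
    a = sum(x * x for x, _ in centered) / n
    b = sum(x * y for x, y in centered) / n
    c = sum(y * y for _, y in centered) / n
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Corresponding unit eigenvector (axis of maximum variance).
    vx, vy = (b, lam - a) if b else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Scalar coordinate of each point along the principal axis.
    return [x * vx + y * vy for x, y in centered]
```

In the described system the same idea reduces the concatenated joint embedding to a small vector, making the subsequent nearest-neighbor search far cheaper.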
- k nearest neighbors in the database image cache to the uploaded image are found at 270 in the common embedding space.
- the contents of the received image are classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors.
- different weights are assigned to the contributions of the neighbors to allow the nearer neighbors to contribute more to the average than the distant neighbors.
- the nearest neighbors can be found in a variety of ways including using approximate solutions. As shown in set 270 , the k-nearest neighbors are selected. For example, the five nearest neighbors to the uploaded image may be found.
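The distance-weighted vote described above can be sketched as follows; the `weighted_knn_label` name and the 1/(distance + eps) weighting scheme are illustrative assumptions.

```python
from collections import defaultdict

def weighted_knn_label(neighbors):
    """Classify by a distance-weighted vote over the k nearest neighbors.
    `neighbors` is a list of (label, distance) pairs; each neighbor
    contributes 1/(distance + eps), so nearer neighbors count for more
    than distant ones."""
    eps = 1e-9  # avoid division by zero for an exact match
    votes = defaultdict(float)
    for label, distance in neighbors:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)
```

With this weighting, a single very close neighbor can outvote several distant ones, which is exactly the behavior the weighted variant is meant to provide over a plain majority vote.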
- the k-nearest neighbors in a test set in the trained joint embedding space may be identified.
- a label that is applied is based on a majority vote over the k neighbors.
- a final score is generated over all labeled n images in the test set.
- corresponding metadata may be retrieved and used to rank the identified restaurants in order of decreasing image similarity.
- the calculation and usage of similarity scores are described with respect to FIGS. 6 A- 6 B .
- the output comprises a list of restaurants with addresses that have been identified as serving a Ramen-like dish based on the input image 350 .
- the models 352 and 354 include embeddings 356 and 358 .
- Model 352 also includes probability level 360 (similar to the final layer 260 that performs a probability generation).
- Embedding 356 and 358 are combined to form joint embeddings 364 that are reduced by PCA to form reduced embeddings 366 that are further used to identify nearest neighbors to generate the list of restaurants 368 .
- the recommendation may include a name of the food dish.
- the included dish name may be based on the user's location (e.g., “chips” in England or “fries” in the United States).
- recommendation may include a generic name for the food dish (e.g., “ramen”) and/or the name of the food dish used by the restaurant (e.g., “beef ramen special”).
- FIG. 4 depicts an example situation of a plurality of electronic devices sending and receiving images in accordance with some embodiments.
- a first user device 100 - 1 may take a flash photograph 402 of a food item 404 .
- the user of the first user device 100 - 1 may elect to post the photograph of food item 404 on the user's social media platform 406 via network 104 .
- a second user device 100 - 2 may see the photograph of food item 404 and be interested in finding out more about the depicted food item.
- the second user device 100 - 2 may submit an image search query to network 104 to identify the contents of the image.
- the user may also specify the user's location and a request for restaurants within a radius of the user's location.
- a provided recommendation 408 is curated to include restaurants within a reasonable distance from the user (e.g., 25 miles). In some embodiments, the recommendation is curated to include restaurants based on one or more criteria (e.g., hours of operation, take-out availability, availability of the entrée itself). In some embodiments, the recommendation is curated to include recipes to make the identified food item. The recommendation can include a simplified version, a traditional version, a chef's version, or a combination of the above. Further, the recommendation can include, along with the recipes, a grocery list of ingredients to prepare the food item. In some embodiments, the recommendation can include a brief or extended history about the origins of the food item.
- the example provided recommendation 408 is an example representation thereof and may be provided to user device 100 - 2 in various configurations including recipes, grocery lists, and historical information.
- a user may find a peacefully presented entrée item image on the user's social media platform.
- the user has never seen the entrée before and would like to identify it so that the user may try it for him/herself.
- the user presents the image to a search engine (e.g., Microsoft Bing) and asks the search engine to identify the entrée and nearby restaurants that offer the entrée for takeout.
- the search engine (e.g., consumable item identification engine server 202 ) provides a curated list to the user with famous, renowned, and/or Michelin-rated restaurants serving the entrée.
- the search engine outputs a name of the food item (e.g., via content identification module 212 ), a recipe to prepare the food item (e.g., via recipe module 216 ), and an ingredients list to prepare the food item from the recipe.
- the food item is a drink and the search engine outputs a name of the drink, a recipe to prepare the drink, and an ingredients list to prepare the drink from the recipe.
- the search engine outputs one or more local restaurants (e.g., via restaurant module 214 ) serving the food item as well as reviews of the specific food item from the one or more local restaurants. In some embodiments, the search engine outputs reviews of the one or more local restaurants provided in the output to the user.
- FIGS. 5 A- 5 D depict an example web browser and associated user interface for searching images in accordance with some embodiments.
- the example web browser 500 includes a user interface 502 and a search bar 504 within the user interface.
- the search bar 504 may be used to drop-in, upload, or otherwise place an image or image file for searching. It is understood that the methods, systems, and their implementations are not limited to web browsers but may further include applications, widgets, and other suitable means.
- an updated user interface 512 is displayed.
- the updated user interface 512 includes an identification of the food content in the image 514 “Ramen Noodles (Japanese)” and one or more restaurant recommendations 516 , 518 .
- the one or more restaurant recommendations may be provided in a ranked list based on location, reviews, price, availability of the food content, health score, hours of operation, take-out availability, online-ordering availability, and various other suitable criteria.
- “Iza Ramen” is shown at the top of the list of restaurant recommendations. The restaurant is 1.2 miles away from the user's current location and has a four-star rating.
- the rating is on the food content itself. In some embodiments, the rating is on the restaurant.
- “Orenchi Ramen” is shown second on the list of restaurant recommendations. Although “Orenchi Ramen” has a higher rating than “Iza Ramen”, the recommendation lists “Iza Ramen” before “Orenchi Ramen” based on the distance of the restaurant to the user.
- the restaurant recommendation may provide “Orenchi Ramen” at the top of the list based on a user profile (e.g., user profiles 226 , FIG. 8 ) indicating the user is more likely to travel a farther distance to purchase items at a higher ranked establishment.
- the recommendation takes into consideration a user's profile in determining how and what to provide to the user in the recommendation.
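As a rough sketch of how such a profile-sensitive ranking might work, the snippet below scores each candidate by a weighted combination of proximity and rating. The field names, weights, and data are illustrative assumptions, not the patented implementation.

```python
# Hypothetical sketch: rank restaurants by distance and rating, with a
# user-profile weight that can favor higher-rated establishments over
# nearer ones (all names and values are illustrative).

def rank_restaurants(restaurants, travel_weight=0.5):
    """Return restaurants sorted best-first.

    travel_weight near 0 favors proximity; near 1 favors rating,
    emulating a profile indicating willingness to travel farther.
    """
    def score(r):
        proximity = 1.0 / (1.0 + r["miles"])   # closer -> higher score
        rating = r["stars"] / 5.0              # normalize to 0..1
        return (1 - travel_weight) * proximity + travel_weight * rating

    return sorted(restaurants, key=score, reverse=True)

candidates = [
    {"name": "Iza Ramen", "miles": 1.2, "stars": 4.0},
    {"name": "Orenchi Ramen", "miles": 5.0, "stars": 4.5},
]

# Default weighting favors the nearby restaurant...
print([r["name"] for r in rank_restaurants(candidates)])
# ...while a travel-tolerant profile promotes the higher-rated one.
print([r["name"] for r in rank_restaurants(candidates, travel_weight=0.9)])
```

A production system would fold in additional criteria (hours, take-out availability, health score) as further weighted terms.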
- the updated user interface 522 includes a list of recipes to create the identified food content of the image 514 .
- the recommendation provides similar food content and corresponding recipes 524 and 526 .
- the provided recipes may be provided based on a ranking of reviews, complexity of the recipe, the number of ingredients, compatibility with the user's dietary preferences, and more.
- FIG. 5 D shows another updated user interface 532 in another view of example web browser 500.
- the updated user interface 532 includes a recipe to create the identified food content as well as a view of recommended ingredients to be purchased at a local grocery store.
- the recommended ingredients may include one or more identified items within the image such as corn, seaweed, or noodles.
- the recommended ingredients may also be identified as being available at specific specialty stores. For example, to make a ramen recipe, miso paste, seaweed, bean sprouts, and Japanese rice noodles are required and sold at a local Japanese grocery store. The recommendation may provide directions to the local Japanese grocery store for easy procurement of the recipe ingredients.
- FIGS. 6 A- 6 B depict an example process 600 for identifying search image contents according to some embodiments.
- Process 600 executes on a computing device such as consumable item identification engine server 202 .
- Process 600 proceeds at 602 where a computing device (e.g., server system) receives an image associated with a consumable item from a restaurant.
- the received image is sent by a user 102 of an electronic device (e.g., client device 100 ).
- a user uploads an image to a search engine (e.g., Microsoft Bing) to request identification of the image contents.
- the consumable item is 604 an entrée offered at a restaurant.
- the consumable item is a beverage, dessert, or appetizer offered at a restaurant.
- the computing device identifies 606 one or more contents (e.g., ingredients) of the consumable item in the image. In some embodiments, the computing device determines 608 a location of the user. In some embodiments, the computing device identifies 610 a plurality of restaurants within a predetermined distance of the user and further obtains 612 a plurality of consumable item images associated with the plurality of restaurants.
- the computing device, after receiving the image from the user, performs 614 image pre-processing (e.g., at image processing module 210 ).
- Image pre-processing can include image resizing, de-skewing, channel-wise normalization, cropping, and other suitable processes.
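The pre-processing steps named above can be illustrated on a raw pixel array. The snippet below sketches center-cropping and channel-wise normalization; resizing and de-skewing would typically be handled by an image library, and all values here are illustrative.

```python
import numpy as np

# Illustrative sketch of two of the named pre-processing steps:
# center-cropping to a square and channel-wise normalization.

def preprocess(image):
    """Center-crop to a square, then apply channel-wise normalization."""
    h, w, _ = image.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    x = image[top:top + side, left:left + side, :].astype(np.float32)
    # Channel-wise normalization: zero mean, unit variance per channel.
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True) + 1e-6
    return (x - mean) / std

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
out = preprocess(img)
print(out.shape)   # (480, 480, 3)
```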
- the computing device uses 618 a neural network to determine one or more similar images from a database, the one or more similar images including the one or more contents of the consumable item in the image.
- the consumable item includes 620 one or more identifiable ingredients.
- the one or more contents that are identified in the image are the one or more identifiable ingredients.
- an image of miso soup is uploaded by the user.
- tofu cubes, fish cakes, and green onions are visible.
- the visible ingredients are identified as being part of the miso soup and a list of the ingredients (e.g., tofu cubes, fish cakes, and green onion) is provided to the user.
- a user may indicate that the user is interested in a specific ingredient and wishes to receive results that include the specific ingredient.
- a user may be searching for miso soup that specifically includes fish cake and the search results are filtered and ranked according to the results. The results that include the fish cake are ranked higher than results that do not and are presented to the user.
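A minimal sketch of this ingredient-aware filtering and ranking, assuming each result carries an ingredient set and a rating (both illustrative):

```python
# Sketch of the ingredient-aware ranking described above: results that
# contain a user-specified ingredient rank ahead of those that do not,
# with rating as a tiebreaker (data is illustrative).

def rank_by_ingredient(results, wanted):
    return sorted(
        results,
        key=lambda r: (wanted in r["ingredients"], r["rating"]),
        reverse=True,
    )

results = [
    {"dish": "miso soup", "ingredients": {"tofu", "green onion"}, "rating": 4.6},
    {"dish": "miso soup", "ingredients": {"tofu", "fish cake"}, "rating": 4.2},
]

# The fish-cake result is listed first despite its lower rating.
for r in rank_by_ingredient(results, "fish cake"):
    print(r["dish"], sorted(r["ingredients"]))
```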
- the computing device obtains 621 an image embedding.
- the image embedding is reduced using a dimensionality reduction technique such as principal component analysis (PCA).
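One way to sketch the PCA reduction mentioned here is via the singular value decomposition. The snippet below is an illustrative implementation (a library such as scikit-learn would normally be used), with dimensions chosen arbitrarily.

```python
import numpy as np

# Minimal PCA sketch (via SVD) for reducing image-embedding
# dimensionality, as mentioned above; the sizes are illustrative.

def pca_reduce(embeddings, n_components):
    """Project rows of `embeddings` onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 512))   # 100 embeddings of dimension 512
reduced = pca_reduce(emb, 32)
print(reduced.shape)                # (100, 32)
```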
- the computing device determines 622 a restaurant associated with each of the one or more similar images.
- miso soup is identified as being the food content in the image.
- the computing device may determine one or more images from a database also containing miso soup and a corresponding restaurant from which the images were taken (e.g., determined based on image metadata).
- the computing device calculates 624 a similarity score indicating at least a similarity between a consumable dish offered by each of one or more restaurants and the identified one or more contents of the image.
- the restaurants are ranked 626 based at least in part on the location of the user and a distance from the location of the user to the restaurant.
- the restaurants are ranked 628 based at least in part on each restaurant's hours of service, take-out options, number of reviews, or score of reviews (e.g., ratings).
- the restaurants are ranked 630 (e.g., via learning module 222 ) based at least in part on an analysis of a number of matches of the one or more contents of the consumable item in the image to one or more contents of the one or more similar images.
- the computing system receives an image of miso soup and identifies three similar images to the received image.
- the three similar images are of miso soup (the correct and closest match), spicy miso soup, and bone broth.
- a restaurant serving the food content is identified.
- Restaurant A is identified as serving miso soup.
- Restaurant B is identified as serving spicy miso soup.
- Restaurant C is identified as serving bone broth.
- a similarity score is calculated to quantify a similarity between each identified restaurant's soup and the content of the received image (miso soup).
- the search results aim to identify restaurants local to the user that serve the food content in the image search.
- the similarity score for restaurant A may be higher than the similarity scores for restaurant B and C based on the distance between each restaurant to the user.
- the similarity scores for restaurants A, B, and C may also be defined by each restaurant's hours of service, pricing, take-out options, health scores, reviews, and/or ratings.
- the similarity scores for restaurants A, B, and C may also be defined by further analysis of a number of matches of the one or more contents (e.g., ingredients) identified in the received image.
- the computing device ranks 632 the one or more restaurants as a function of the similarity score.
- the ranking may be calculated as an average, a weighted average or other suitable mathematical calculation to organize the one or more restaurants in a list.
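The weighted-average ranking described above might be sketched as follows, reusing the restaurant A/B/C example. The factor values and weights are illustrative assumptions, not figures from the disclosure.

```python
# Sketch of ranking restaurants by a weighted average of the factors the
# text enumerates (content similarity, proximity, rating); the weights
# and per-restaurant scores are illustrative and would be tuned.

WEIGHTS = {"similarity": 0.5, "proximity": 0.3, "rating": 0.2}

def overall_score(factors):
    return sum(WEIGHTS[k] * factors[k] for k in WEIGHTS)

restaurants = {
    "A": {"similarity": 1.00, "proximity": 0.90, "rating": 0.80},  # miso soup
    "B": {"similarity": 0.85, "proximity": 0.60, "rating": 0.90},  # spicy miso
    "C": {"similarity": 0.40, "proximity": 0.70, "rating": 0.70},  # bone broth
}

ranked = sorted(restaurants,
                key=lambda name: overall_score(restaurants[name]),
                reverse=True)
print(ranked)   # closest content match ranks first here: ['A', 'B', 'C']
```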
- the computing device provides 634 a ranked list of restaurants associated with the one or more similar images based at least on the similarity score and a distance of the restaurant to the location of the user.
- the computing system ranks 636 the one or more restaurants in descending order based on the similarity score and includes a name of the consumable item with the ranked list of restaurants.
- FIG. 7 depicts a block diagram that illustrates an example client device 100 for collecting and providing user and environment data.
- the client device 100 has at least one processor 106 , a transceiver 108 , one or more presentation components 110 , one or more input/output (I/O) ports 112 , one or more I/O components 114 , and a memory 124 .
- the client device 100 may take the form of a mobile computing device or any other portable device, such as, a mobile telephone, laptop, tablet, computing pad, notebook, gaming device, portable media player, etc.
- the client device 100 may also include less portable devices such as desktop personal computers, kiosks, tabletop devices, industrial control devices, etc.
- Other examples may incorporate the client device 100 as part of a multi-device system in which two separate physical devices share or otherwise provide access to the illustrated components of the computing device 100 .
- the processor 106 may include any quantity of processing units and is programmed to execute computer-executable instructions for implementing aspects of the disclosure.
- the instructions may be performed by the processor or by multiple processors within the computing device or performed by a processor external to the computing device.
- the processor 106 is programmed to execute methods such as the one illustrated in accompanying FIGS. 8 A- 8 B .
- some examples may make the processor 106 programmed to present an experience in a user interface (“UI”), e.g., the UI shown in FIGS. 5 A- 5 D .
- the processor 106 may represent an implementation of techniques to perform the operations described herein. For example, the operations may be performed by an analog client device 100 and/or a digital client device 100 .
- the transceiver 108 is an antenna capable of transmitting and receiving radio frequency (“RF”) signals.
- One skilled in the art will appreciate and understand that various antenna and corresponding chipsets may be used to provide communicative capabilities between the client device 100 and other remote devices. Examples are not limited to RF signaling, however, as various other communication modalities may alternatively be used.
- the presentation components 110 visibly or audibly present information on the computing device 100 .
- presentation components 110 include, without limitation, computer monitors, televisions, projectors, touch screens, phone displays, tablet displays, wearable device screens, speakers, vibrating devices, and any other devices configured to display, verbally communicate, or otherwise indicate image search results to a user.
- the client device 100 may be a smart phone or a mobile tablet that includes speakers capable of playing audible search results to the user.
- the client device 100 is a computer in a car that audibly presents search responses through a car speaker system, visually presents search responses on display screens in the car, e.g., situated in the car's dashboard, within headrests, on a drop-down screen, or the like, or a combination thereof.
- Other examples may present the disclosed search responses through various other display or audio presentation components 110 .
- I/O ports 112 allow the client device 100 to be logically coupled to other devices and I/O components 114 , some of which may be built into client device 100 while others may be external.
- I/O components 114 include a microphone 116 , one or more sensors 118 , a camera 120 , and a touch device 122 .
- the microphone 116 captures speech from the user 102 and/or speech in the user's environment.
- the sensors 118 may include any number of sensors on or in a mobile computing device, electronic toy, gaming console, wearable device, television, vehicle, or other computing device 100 .
- the sensors 118 may include an accelerometer, magnetometer, pressure sensor, photometer, thermometer, global positioning system (“GPS”) chip or circuitry, bar scanner, biometric scanner for scanning fingerprint, palm print, blood, eye, or the like, gyroscope, near-field communication (“NFC”) receiver, or any other sensor configured to capture data from the user 102 or the environment.
- the camera 120 captures images or video of the user 102 .
- the touch device 122 may include a touchpad, track pad, touch screen, or other touch-capturing device.
- the illustrated I/O components 114 are but one example of I/O components that may be included on the client device 100 .
- I/O components 114 may include additional or alternative I/O components 114 , e.g., a sound card, a vibrating device, a scanner, a printer, a wireless communication module, or any other component for capturing information related to the user or the user's environment.
- the memory 124 includes any quantity of memory associated with or accessible by the computing device 100 .
- the memory 124 may be internal to the client device 100 , as shown in FIG. 1 , external to the client device 100 , not shown in FIG. 1 , or both.
- Examples of memory 124 may include, without limitation, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disks (DVDs) or other optical or holographic media, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, memory wired into an analog computing device, or any other medium for encoding desired information and for access by the client device 100 .
- Memory 124 may also take the form of volatile and/or nonvolatile memory; may be removable, non-removable, or a combination thereof; and may include various hardware devices, e.g., solid-state memory, hard drives, optical-disc drives, etc. Additionally, or alternatively, the memory 124 may be distributed across multiple client devices 100 , e.g., in a virtualized environment in which instruction processing is carried out on multiple client devices 100 .
- the memory 124 stores, among other data, various device applications that, when executed by the processor 106 , operate to perform functionality on the computing device 100 .
- Examples of applications may include search applications, instant messaging applications, electronic-mail application programs, web browsers, calendar application programs, address book application programs, messaging programs, media applications, location-based services, search programs, and the like.
- the applications may communicate with counterpart applications or services such as web services accessible via the network 104 .
- the applications may include client-operating applications that correspond to server-side applications executing on remote servers or computing devices in the cloud.
- instructions stored in memory 124 comprise a communications interface application 126 , a user interface application 128 , and a consumable item identification application 130 .
- the communications interface application 126 includes computer-executable instructions for operating a network interface card and/or a driver for operating the network interface card. Communication between the client device 100 and other devices may occur using any protocol or mechanism over a wired or wireless connection, or across the network 104 .
- the communications interface application 126 is operable with RF and short-range communication technologies using electronic tags, such as NFC tags, Bluetooth® brand tags, or the like.
- the user interface application 128 includes a graphics application for displaying data to the user and receiving data from the user.
- the user interface application 128 may also include computer-executable instructions for operating the graphics card to display search results and corresponding images or speech on or through the presentation components 110 .
- the user interface application 128 may also interact with the various sensors 118 and camera 120 to both capture and present information through the presentation components 110 .
- FIG. 8 is a block diagram of a networking environment 800 for providing image search results and recommendations on client device(s) 100 .
- the networking environment 800 may include multiple client devices 100 , or just one as shown in FIG. 8 , a consumable item identification engine server 202 , and a database 224 communicating over a network 104 .
- user and environment data are communicated by the client devices 100 over the network 104 to the consumable item identification engine server 202 , and the consumable item identification engine server 202 generates tailored image search results that are provided back to the client devices 100 for presentation to their respective users 102 .
- the networking environment 800 shown in FIG. 8 is merely an example of one suitable computing system environment and is not intended to suggest any limitation as to the scope of use or functionality of examples disclosed herein. Neither should the illustrated networking environment 800 be interpreted as having any dependency or requirement related to any single component, module, index, or combination thereof.
- the network 104 may include any computer network, for example the Internet, a private network, local area network (LAN), wide area network (WAN), or the like.
- the network 104 may include various network interfaces, adapters, modems, and other networking devices for communicatively connecting the client devices 100 , the consumable item identification engine server 202 , and the database 224 .
- the network 104 may also include configurations for point-to-point connections.
- Computer networks are well known to one skilled in the art, and therefore do not need to be discussed in detail herein.
- the client devices 100 may be any type of client device discussed above in reference to FIG. 1 .
- Each client device 100 may capture user and/or environment data from their respective users and communicate the captured user and environment data over the network 104 to the consumable item identification engine server 202 and/or the database 224 .
- each device may be equipped with a communications interface application 126 , as discussed above in reference to FIG. 1 .
- the consumable item identification engine server 202 may be capable of providing intelligent image search results including restaurant, recipe, and ingredient recommendations to many client devices 100 capable of communicating their respectively captured user and environment data over the network 104 .
- the consumable item identification engine server 202 may control image searches on many client devices 100 .
- the client device 100 may be equipped with various software applications and presentation components 110 for receiving image-based search queries and presenting recommended restaurants to their respective users.
- a mobile phone may present a search bar or search screen for image search.
- the search bar/screen can receive a user selected image of various image file types.
- the image is processed through various systems of identifying content in images.
- the search service may then provide an ordered list of restaurants that provide (e.g., serve) the same or similar food items identified in the user selected image.
- the illustrated client devices and the aforesaid presentation mechanisms are not an exhaustive list covering all examples. Many different variations of client devices 100 and presentation techniques may be used to deliver the image search result responses to users.
- the consumable item identification engine server 202 represents a server or collection of servers configured to execute different web-service computer-executable instructions.
- the consumable item identification engine server 202 includes a processor 204 to process executable instructions, a transceiver 206 to communicate over the network 104 , and a memory 208 embodied with at least the following executable instructions: an image processing module 210 , a recommendation generation module 220 , and a learning module 222 .
- the memory 208 may include instructions for content identification module 212 , a restaurant module 214 , a recipe module 216 , and a location module 218 .
- consumable item identification engine server 202 is illustrated as a single box, one skilled in the art will appreciate that the consumable item identification engine server 202 may, in fact, be scalable.
- the consumable item identification engine server 202 may include multiple servers operating various portions of software that collectively generate image search results on the client devices 100 .
- the database 224 provides backend storage of Web, user, and environment data that may be accessed over the network 104 by the consumable item identification engine server 202 or the client devices 100 and used by the consumable item identification engine server 202 to generate search results.
- the Web, user, and environment data stored in the database includes, for example but without limitation, user profiles 226 , search context 228 and so on. Additionally, though not shown for the sake of clarity, the servers of the database 224 may include their own processors, transceivers, and memory.
- networking environment 800 depicts the database 224 as a collection of separate devices from the consumable item identification engine server 202 ; however, examples may actually store the discussed Web, user, and environment data shown in the database 224 on the consumable item identification engine server 202 .
- a “user profile” refers to an electronically stored collection of information related to the user. Such information may include the user's name, age, gender, height, weight, demographics, current location, residency, citizenship, family, friends, schooling, occupation, hobbies, skills, interests, Web searches, health information, birthday, anniversary, celebrated holidays, moods, user's condition, and any other personalized information associated with the user.
- the user profile includes static profile elements, e.g., name, birthplace, etc., and dynamic profile elements that change over time, e.g., residency, age, condition, etc.
- the user profile may be built through probing questions to the user or through analyzing the user's behavior on one or more client devices.
- user profiles 226 may include static and/or dynamic data parameters for individual users. Examples of user profile data include, without limitation, a user's age, gender, race, name, location, interests, Web search history, social media connections and interactions, purchase history, routine behavior, jobs, or virtually any unique data points specific to the user.
- the user profiles 226 may be expanded to encompass various other aspects of a user's life.
- a user employs client device 100 to perform an image search.
- the client device sends a request to a server system (e.g., consumable item identification engine server 202 ), rather than performing the search itself.
- a user sees a photograph of a food item on social media. However, the photograph does not have any accompanying information about what it is, where it was from, what ingredients are included, etc. The user is curious and wants to find out more about the food in the photograph and sends the image as a search query to a search service provider (e.g., database such as Microsoft Bing).
- Various pre-processing techniques may be applied to the image including cropping, de-skewing, normalizing, etc. before an image embedding is created.
- An embedding is a mapping of a discrete categorical variable to a vector of continuous numbers.
- embeddings are low- or high-dimensional, learned continuous vector representations of discrete variables; the dimensionality of the vector representations typically ranges between 20 and 4000 but may fall outside of that range as appropriate.
- Neural network embeddings are useful because they can reduce the dimensionality of categorical variables and meaningfully represent categories in the transformed space and can be used to find nearest neighbors in the embedding space. The nearest neighbors can be used to make recommendations based on user interests or cluster categories.
- the image embedding is then analyzed through one or more convolutional neural networks to obtain k-nearest neighbor images in a trained embedding space.
- the k-nearest neighbor images are then analyzed by reviewing each image's label to create a curated set of neighbor images with matching (or close to matching) labels.
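The embedding-space retrieval described in the preceding paragraphs can be sketched as a cosine-similarity k-nearest-neighbor search over stored embeddings, followed by the label-based curation step. The data, labels, and dimensionality below are illustrative.

```python
import numpy as np

# Sketch of the retrieval step described above: find the k nearest
# neighbors of a query embedding by cosine similarity; their labels can
# then be compared to curate matching images (data is illustrative).

def k_nearest(query, embeddings, labels, k=3):
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q                        # cosine similarity to each row
    idx = np.argsort(sims)[::-1][:k]    # indices of the top-k matches
    return [(labels[i], float(sims[i])) for i in idx]

rng = np.random.default_rng(1)
db = rng.normal(size=(50, 128))               # stored image embeddings
labels = [f"dish_{i}" for i in range(50)]
# A slightly perturbed copy of entry 7 should retrieve entry 7 first.
neighbors = k_nearest(db[7] + 0.01 * rng.normal(size=128), db, labels)
print(neighbors[0][0])   # 'dish_7'
```

Curating the result is then a matter of keeping neighbors whose labels match (or nearly match) the query's predicted label.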
- the user is curious about the food in the photograph and wants to order the food at a local restaurant.
- the server obtains the user's location and adds the local restaurant location information into the search criteria.
- the returned list of images is from restaurants within a certain distance of the user.
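A distance filter of this kind might be sketched with the haversine formula; the coordinates, names, and 25-mile radius below are illustrative assumptions.

```python
import math

# Sketch of the distance filter described above: keep only restaurants
# within a given radius of the user, using great-circle distance.

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

user = (37.7749, -122.4194)                 # San Francisco (illustrative)
restaurants = {
    "Iza Ramen": (37.7715, -122.4313),      # nearby
    "Orenchi Ramen": (37.3541, -121.9552),  # Santa Clara, far away
}
nearby = {name: round(miles_between(*user, *loc), 1)
          for name, loc in restaurants.items()
          if miles_between(*user, *loc) <= 25}
print(nearby)
```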
- the functions or algorithms described herein may be implemented in software in one embodiment.
- the software may consist of computer-executable instructions stored on computer-readable media or a computer-readable storage device, such as one or more non-transitory memories or other types of hardware-based storage devices, either local or networked.
- The software may be organized into modules, which may be software, hardware, firmware, or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the embodiments described are merely examples.
- the software may be executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a computer system, such as a personal computer, server or other computer system, turning such computer system into a specifically programmed machine.
- the functionality can be configured to perform an operation using, for instance, software, hardware, firmware, or the like.
- the phrase “configured to” can refer to a logic circuit structure of a hardware element that is to implement the associated functionality.
- the phrase “configured to” can also refer to a logic circuit structure of a hardware element that is to implement the coding design of associated functionality of firmware or software.
- the term “module” refers to a structural element that can be implemented using any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any combination of hardware, software, and firmware.
- logic encompasses any functionality for performing a task. For instance, each operation illustrated in the flowcharts corresponds to logic for performing that operation.
- An operation can be performed using software, hardware, firmware, or the like.
- the terms “component,” “system,” and the like may refer to computer-related entities: hardware, software in execution, firmware, or a combination thereof.
- a component may be a process running on a processor, an object, an executable, a program, a function, a subroutine, a computer, or a combination of software and hardware.
- processor may refer to a hardware component, such as a processing unit of a computer system.
- the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computing device to implement the disclosed subject matter.
- the term “article of manufacture” is intended to encompass a computer program accessible from any computer-readable storage device or media.
- Computer-readable storage media can include, but are not limited to, magnetic storage devices, e.g., hard disk, floppy disk, magnetic strips, optical disk, compact disk (CD), digital versatile disk (DVD), smart cards, flash memory devices, among others.
- computer-readable media (i.e., not storage media) may additionally include communication media, such as transmission media for wireless signals and the like.
Abstract
The present disclosure provides a method and apparatus for determining a food item from a photograph and a corresponding restaurant serving the food item. An image associated with a consumable item is received from a user. One or more ingredients of the consumable item in the image are identified, along with a location of the user, and a neural network is used to determine one or more similar images from a database. A restaurant associated with each of the one or more similar images is determined, along with a similarity score indicating a similarity between the restaurant and the identified content of the image. The one or more restaurants and/or associated similar food items are ranked based on the similarity score, and a list of ranked restaurants is provided to the user.
Description
- This application is a continuation application of U.S. patent application Ser. No. 17/102,009, filed Nov. 23, 2020, which application is incorporated herein by reference in its entirety.
- The present disclosure generally relates to providing machine learning supported image search results to users in accordance with some embodiments.
- Sharing photos and videos in the digital age has become more than commonplace. Sending and receiving photos of cats, memes, recipes, outfits, and more has become part of people's daily lives. As photo and video sharing becomes more prevalent, an abundance of photographs that can be searched and found traverses the internet. However, it may become cumbersome to visually identify objects found in images.
- The methods and systems disclosed herein provide users with local recommendations based on food images. For example, a user browsing social media may see a social media post with an image of a consumable item, such as a food dish that looks appetizing. The social media post may not include the name of the dish, so the user cannot rely on a text search. In this example, the user could provide the image of the food dish to the disclosed recommendation system. The recommendation system in this example may return a list of local restaurants that serve similar dishes, or it may return information about local stores that sell ingredients identified in the food dish so that the user may make the dish at home. The recommendation system in this example may use one or more trained neural networks to identify the food dish and/or similar food dishes.
- Embodiments of the present disclosure propose a method and apparatus for identifying food content in a photograph and concurrently providing a curated list of restaurants offering the identified food content. An image is received from a user, the image including a consumable item. A location of the user is determined and a plurality of restaurants within a predetermined distance of the user is identified. A plurality of consumable item images associated with the plurality of restaurants is obtained and, using a neural network, one or more images having similar consumable items are identified from the plurality of consumable item images. A similarity score is calculated for each identified image. The one or more identified images with similar consumable items include the one or more ingredients of the consumable item in the image. The restaurant associated with each identified image is identified and ranked based on the identified image's similarity score. A list of ranked restaurants is provided to the user. The one or more restaurants may also be provided to the user through a widget when a user uploads, views, or otherwise interacts with an image containing food, even if the user was not specifically searching for restaurants.
- In some embodiments, the consumable item is an entrée offered at a restaurant. In some embodiments, image pre-processing is performed. In some embodiments, the image is passed through one or more trained convolutional neural networks to obtain an image embedding. An embedding is a mapping of a discrete categorical variable to a vector of continuous numbers used to reduce the dimensionality of categorical variables and meaningfully represent categories in the transformed space.
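The notion of an embedding described above can be illustrated with a toy lookup table. The dish names and vector values below are invented for illustration only; in a real system the vectors are learned by a trained network rather than written by hand:

```python
# Hypothetical illustration: an embedding maps each discrete category
# (here, a dish name) to a vector of continuous numbers. These values
# are made up; a trained network would learn them.
embedding_table = {
    "ramen":     [0.91, 0.12, 0.33],
    "miso soup": [0.88, 0.15, 0.29],
    "tiramisu":  [0.05, 0.97, 0.61],
}

def embed(dish):
    """Return the continuous vector representing a discrete dish label."""
    return embedding_table[dish]

print(embed("ramen"))  # [0.91, 0.12, 0.33]
```

Note how the two similar dishes ("ramen" and "miso soup") sit close together in the vector space while the dessert sits far away; that proximity is what makes the nearest-neighbor search described below meaningful.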
- The obtained image embedding may be used to find nearest neighbors in an embedding space. In some embodiments, the embedding space is pre-trained, and the obtained image embedding is used to determine the contents of the received image and provide k-nearest neighbor recommendations.
- In some embodiments, the consumable item includes one or more identifiable ingredients and the one or more contents that are identified are the one or more identifiable ingredients. In some embodiments, the restaurants are ranked based at least in part on the location of the user and a calculated distance between the location of the user and a location of the restaurant. In some embodiments, the restaurants are ranked based at least in part on each restaurant's hours of service, take-out options, number of reviews, and/or score of reviews. In some embodiments, the restaurants are ranked based at least in part on an analysis of the number of matches of the one or more contents of the consumable item in the image to one or more contents of the one or more similar images. In some embodiments, the method includes providing a list of restaurants, ranking the one or more restaurants in descending order based on the similarity score, and including a name of the consumable item with the list of restaurants.
- Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computing device to implement the disclosed subject matter. The term, “article of manufacture,” as used herein is intended to encompass a computer program accessible from any computer-readable storage device or media. Computer-readable storage media can include, but are not limited to, magnetic storage devices, e.g., hard disk, floppy disk, magnetic strips, optical disk, compact disk (CD), digital versatile disk (DVD), smart cards, flash memory devices, among others. In contrast, computer-readable media, i.e., not storage media, may additionally include communication media such as transmission media for wireless signals and the like.
- This Summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
- The disclosed aspects will hereinafter be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.
-
FIG. 1 is a diagram of an example computer system providing an image search result to a client device in accordance with some embodiments. -
FIGS. 2-3 are diagrams of training databases for identifying consumable content items of an image in accordance with some embodiments. -
FIG. 4 is an example interaction between a user and the computer system for identifying consumable content items of an image and providing recommendations based on the identified content items in accordance with some embodiments. -
FIGS. 5A-5D depict an example user interface for identifying consumable content items of an image and providing similar images and recommendations based on the identified content items in accordance with some embodiments. -
FIGS. 6A-6B depict an example flow diagram for a user interface for viewing the results of an image query. -
FIG. 7 is a block diagram of an example networked system depicting an environment in which embodiments may operate. -
FIG. 8 is a block diagram of a computer system to implement one or more example embodiments. - A particular challenge exists in creating systems to identify different food items from images (e.g., entrées, desserts, cocktails), especially food items from different ethnic cultures, and to also identify local restaurants and/or stores that provide food similar to an image of food. Further, it may be difficult to ascertain specific ingredients within a food item, or the name of the food item, when the image obstructs the view of some ingredients, or to ascertain them at all a priori for uncommon dishes. It may also be difficult to ascertain how the food item is to be cooked/prepared based on the image.
- Having a computer-implemented system identify the food item is advantageous for providing a curated list of recommended restaurants serving the same or similar food item, providing recipes and local stores for the ingredients so the user can create the food item themselves, providing other similar images of the food for viewing pleasure, providing a history of the food item, and providing similar food items based on a user's dietary restrictions. Additionally or alternatively, having an automated system that can recommend restaurants in geographic proximity relieves the user of the burden to separately identify the food item, determine its components, and piece the information together to find places that serve the item and are reasonable for them to visit, among other benefits.
- A system and method are used to receive an image of food and search for similar images in a database of food-based images in an efficient manner. Searching for images in a database for food content is advantageous as the search does not require a name for the food content and can leverage unstructured image data directly from restaurants and/or other databases to perform the image search rather than relying on text from menus and/or user reviews to recommend restaurants. Further, a useful technical solution of identifying individual ingredients visible in an image provides granularity and flexibility in solving the technical problem of identifying an unfamiliar food item. The inventors have provided a solution that includes identifying ingredients of the food content, finding restaurants serving the food content, providing the name of the food content, and more, of which the details are discussed herein.
- It is understood that this solution is not solely reserved for identifying entrées in a photograph and may have other uses such as identifying beverages, desserts, appetizers, and other consumable items.
- The inventive subject matter will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
- Examples disclosed herein are directed to methods, apparatus comprising several modules, and apparatus comprising one or more processors and memory embodied with executable instructions for providing curated recommendations based on an image content search. Using the disclosed examples, a client device is equipped with a search engine that can receive, process, and deliver information about the image content as well as corresponding similar images and restaurant information. A “search engine” may refer to the entire set of device and software components for searching data that includes images and presenting identified image information to the user. The search engines disclosed herein may utilize one or more types of artificial intelligence including but not limited to machine learning, neural networks, deep learning, computer vision, and natural language processing.
- Having generally provided an overview of some of the disclosed examples, attention is drawn to the accompanying drawings to further illustrate some additional details. The illustrated configurations and operational sequences are provided to aid the reader in understanding some aspects of the disclosed examples. The accompanying figures are not meant to limit all examples, and thus some examples may include different components, devices, or sequences of operation while not departing from the scope of the disclosed examples discussed herein. In other words, some examples may be embodied or may function in different ways than those shown.
- It should be noted that the above one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the drawings set forth in detail certain illustrative features of the one or more aspects. These features are only indicative of the various ways in which the principles of various aspects may be employed, and this disclosure is intended to include all such aspects and their equivalents.
-
FIG. 1 is a block diagram of consumable item identification engine server 202 providing curated image and restaurant recommendations to a client device 100 using the response generation module 220 as shown in FIG. 8. The client device 100 provides image data that was provided by the user, and user data, such as the user's location or other data entered by the user or obtained from the user profile, to the consumable item identification engine server 202. Additionally or alternatively, the consumable item identification engine may run on client device 100. - In some examples, the image and user data input flows to
image processing module 210. Image processing module 210 may include various processing techniques to analyze the image in an efficient manner. In some embodiments, the image processing module 210 may process the image, the content identification module 212 may analyze the image to identify the contents of the image, the restaurant module 214 may analyze the identified contents of the image to select one or more restaurants serving the food item matching the identified contents of the image, and the recipe module 216 may provide corresponding recipes for the identified contents of the image. The user provided image, the determined processed image, the determined identified content, the determined restaurant, and the determined recipe, or any combination thereof, may be provided to the recommendation generation module 220 in order to generate one or more recommendations to be provided to the user. - In some examples, to generate image search results,
server 202 sequentially processes the image through the recommendation generation module 220. The recommendation generation module 220 may also consider the determined identified content, the determined restaurant, and the determined recipe as determined by the content identification module 212, the restaurant module 214, and recipe module 216, respectively. The recommendation generation module 220 further processes the recommendation through location module 218. The location module 218 determines a location of the client device and/or the user of the client device to provide personalized (e.g., curated) recommendations specific to the user, such as based on proximity of the user to the determined restaurant. The search result is eventually provided back to the client device 100 for presentation to the user, and the procedure may be repeated for multiple search requests. - FIG. 2 and
FIG. 3 illustrate example image classification models. FIG. 2 is shown with trained models based on ingredients identified in a dish, such as mussels and oysters. FIG. 3 is shown with trained models based on various dishes, with ramen illustrated as an example dish. - Prior to receiving an image query, image embeddings for all restaurant dish images in a database image cache are precomputed via one or more neural networks trained on one or more training sets of food images. The embeddings are low-dimensional, learned continuous vector representations of discrete variables. Neural network embeddings are useful because they can reduce the dimensionality of categorical variables and meaningfully represent categories in the transformed space.
- After receiving an image query uploaded from a user, the uploaded
image 250 is pre-processed (e.g., via channel-wise normalization, image resizing) and passed through one or more convolutional neural networks such as model A 252 and/or model B 254 to yield high-dimensional image embeddings 256 and 258. Both model A 252 and model B 254 (referred to as 352 and 354, respectively, in FIG. 3) may be pre-trained on training sets of food-based images and combined to output an accurate classification of the received input. It is understood that there may be additional models as appropriate to obtain the final embedding. Example training sets include ImageNet, Recipe1m, Food101, and Menu-Match. Example neural networks include ResNet50, ResNeXt50, DenseNet, and VGG, which may serve as model B 254. One such example neural network included approximately 25 million parameters and is a convolutional neural network that is 50 layers deep. - In one example, the neural networks used include an image input layer that comprises a convolution layer with a kernel size of 7*7 and 64 different kernels, all with a stride of size 2. Max pooling is then applied, also with a stride size of 2. In the next convolution there is a 1*1, 64 kernel, followed by a 3*3, 64 kernel, and at last a 1*1, 256 kernel. These three layers are repeated 3 times in total, resulting in 9 layers. Next comes a kernel of 1*1, 128 followed by a kernel of 3*3, 128 and at last a kernel of 1*1, 512. This is repeated 4 times for 12 layers.
- These layers are followed by a kernel of 1*1, 256 and two more kernels with 3*3, 256 and 1*1, 1024, and this is repeated 6 times, giving a total of 18 layers. These layers are followed by a 1*1, 512 kernel with two more of 3*3, 512 and 1*1, 2048, and this is repeated 3 times, giving a total of 9 layers. After that, an average pool is performed, and the network is ended with a fully connected layer containing 1000 nodes and, at the end, a SoftMax function as an additional layer.
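The layer counts described above can be tallied to confirm the stated depth of 50 weighted layers:

```python
# Tally of the stages described above: one 7*7 input convolution,
# bottleneck blocks of 3 convolutions repeated 3, 4, 6, and 3 times,
# and a final fully connected layer: 1 + (3+4+6+3)*3 + 1 = 50.
stage_repeats = [3, 4, 6, 3]     # repetitions per bottleneck stage
layers_per_block = 3             # the 1*1, 3*3, 1*1 kernels in each block
total = 1 + sum(r * layers_per_block for r in stage_repeats) + 1
print(total)  # 50
```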
- The training sets are labeled with both ingredients and dish names. The Recipe1m dataset may be obtained by scraping public networks, such as the web, for dish names and associated images. In one embodiment, a web-scrape identified 1.4 million images and 2106 classes/dishes. However, the resulting data was noisy from both an image perspective and a label perspective. A cleaned dataset resulted in 327,000 images with 517 classes.
- In one example,
model A 252 may be trained using FoodIE and Recipe1m training datasets with shared body weights. Label spaces are preserved, and the two datasets have different label granularities. In one embodiment, Recipe1m has more detail. Model A 252 may include an additional dataset such as a noisy FoodIE training set that has predicted pseudo labels generated by a baseline trained neural network. The use of an additional dataset provides additional training samples with less work required to label such additional samples. The resulting combined training set included approximately 598,000 images with a test set size of about 93,000 images. - In one embodiment,
model A 252 includes a final layer 260 that performs a probability generation. The probability generation may be a SoftMax function or a log-odds function, that is, a function that represents probability as values from 0 to 1. The probability generation may be used to generate a predicted class 262 based on the highest probability. In one embodiment, the predicted class 262 may be used to generate a list of restaurants where dishes similar to the uploaded image may be found. - In a further embodiment, rather than using a
single model A 252, model B 254 embeddings 258 in one embodiment may be combined with embeddings of final layer 260 to form joint embedding 264, representative of a lower layer or layers of two separate models. In one embodiment, the embeddings 405 and 258 may be simply concatenated to form joint embedding 264. Optionally, the embedding dimensionality of joint embedding 264 is reduced using dimensionality reduction techniques such as principal component analysis (PCA), resulting in a smaller joint embedding 268. The smaller joint embedding 268 may comprise 256 bits and provides a greatly reduced dataset for a following nearest neighbor analysis as indicated at 268. - After the uploaded image is pre-processed and passed through the neural networks, in some embodiments, the k nearest neighbors to the uploaded image in the database image cache are found at 270 in the common embedding space. The contents of the received image are classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors. In some embodiments, different weights are assigned to the contributions of the neighbors to allow the nearer neighbors to contribute more to the average than the distant neighbors. The nearest neighbors can be found in a variety of ways, including using approximate solutions. As shown in set 270, the k-nearest neighbors are selected. For example, the five nearest neighbors to the uploaded image may be found. As illustrated, the k-nearest neighbors in a test set in the trained joint embedding space may be identified. A label that is applied is based on a majority vote over the k neighbors. At 272, a final score is generated over all n labeled images in the test set. - Further, corresponding metadata may be retrieved and used to rank the identified restaurants in order of decreasing image similarity. The calculation and usage of similarity scores are described with respect to
FIGS. 6A-6B . - In
FIG. 3 , the output comprises a list of restaurants with addresses that have been identified as serving a ramen-like dish based on the input image 350. Note that models 352 and 354 yield embeddings 356 and 358, and model 352 also includes probability level 360 (similar to the final layer 260 that performs a probability generation). Embeddings 356 and 358 are combined to form joint embeddings 364 that are reduced by PCA to form reduced embeddings 366, which are further used to identify nearest neighbors to generate the list of restaurants 368. As shown in FIG. 3 , the recommendation may include a name of the food dish. In some embodiments, the included dish name may be based on the user's location (e.g., “chips” in England or “fries” in the United States). In some embodiments, the recommendation may include a generic name for the food dish (e.g., “ramen”) and/or the name of the food dish used by the restaurant (e.g., “beef ramen special”). -
FIG. 4 depicts an example situation of a plurality of electronic devices sending and receiving images in accordance with some embodiments. A first user device 100-1 may take a flash photograph 402 of a food item 404. The user of the first user device 100-1 may elect to post the photograph of food item 404 on the user's social media platform 406 via network 104. A second user device 100-2 may see the photograph of food item 404 and be interested in finding out more about the depicted food item. The second user device 100-2 may submit an image search query to network 104 to identify the contents of the image. In some embodiments, the user may also specify the user's location and a request for restaurants within a radius of the user's location. In some embodiments, without explicit definition from the user, a provided recommendation 408 is curated to include restaurants within a reasonable distance from the user (e.g., 25 miles). In some embodiments, the recommendation is curated to include restaurants based on one or more criteria (e.g., hours of operation, take-out availability, availability of the entrée itself). In some embodiments, the recommendation is curated to include recipes to make the identified food item. The recommendation can include a simplified version, a traditional version, a chef's version, or a combination of the above. Further, the recommendation can include, along with the recipes, a grocery list of ingredients to prepare the food item. In some embodiments, the recommendation can include a brief or extended history about the origins of the food item. The example provided recommendation 408 is an example representation thereof and may be provided to user device 100-2 in various configurations including recipes, grocery lists, and historical information. - In another example, a user may find a delightfully presented entrée item image on the user's social media platform.
The user has never seen the entrée before and would like to identify it so that the user may try it for him/herself. The user presents the image to a search engine (e.g., Microsoft Bing) and asks the search engine to identify the entrée and nearby restaurants that offer the entrée for takeout. The search engine (e.g., consumable item identification engine server 202) determines what the entrée in the image is, searches for restaurants offering the entrée, and provides a curated list (e.g., via recommendation generation module 220) to the user with restaurants serving the entrée, ranked based on reviews, distance, hours of operation, takeout availability, and more. In some embodiments, the search engine provides a curated list to the user with famous, renowned, and/or Michelin-rated restaurants serving the entrée.
- In some embodiments, the search engine outputs a name of the food item (e.g., via content identification module 212), a recipe to prepare the food item (e.g., via recipe module 216), and an ingredients list to prepare the food item from the recipe. In some embodiments, the food item is a drink and the search engine outputs a name of the drink, a recipe to prepare the drink, and an ingredients list to prepare the drink from the recipe. In some embodiments, the search engine outputs one or more local restaurants (e.g., via restaurant module 214) serving the food item as well as reviews of the specific food item from the one or more local restaurants. In some embodiments, the search engine outputs reviews of the one or more local restaurants provided in the output to the user.
-
FIGS. 5A-5D depict an example web browser and associated user interface for searching images in accordance with some embodiments. The example web browser 500 includes a user interface 502 and a search bar 504 within the user interface. The search bar 504 may be used to drop in, upload, or otherwise place an image or image file for searching. It is understood that the methods, systems, and their implementations are not limited to web browsers but may further include applications, widgets, and other suitable means. - As shown in
FIG. 5B , once a search query on the image is initiated on the web browser 500, an updated user interface 512 is displayed. The updated user interface 512 includes an identification of the food content in the image 514, “Ramen Noodles (Japanese),” and one or more restaurant recommendations (FIG. 8) indicating the user is more likely to travel a farther distance to purchase items at a higher ranked establishment. In some embodiments, the recommendation takes into consideration a user's profile in determining how and what to provide to the user in the recommendation. - As shown in
FIG. 5C , another updated user interface 522 is shown in another view of the example web browser 500. The updated user interface 522 includes a list of recipes to create the identified food content of the image 514. In some embodiments, the recommendation provides similar food content and corresponding recipes. -
FIG. 5D shows another updated user interface 532 in another view of the example web browser 500. The updated user interface 532 includes a recipe to create the identified food content as well as a view of recommended ingredients to be purchased at a local grocery store. The recommended ingredients may include one or more identified items within the image, such as corn, seaweed, or noodles.
-
FIGS. 6A-6B depict an example process 600 for identifying search image contents according to some embodiments. Process 600 executes on a computing device such as consumable item identification engine server 202. Process 600 proceeds at 602, where a computing device (e.g., server system) receives an image associated with a consumable item from a restaurant. In some embodiments, the received image is sent by a user 102 of an electronic device (e.g., client device 100). For example, a user uploads an image to a search engine (e.g., Microsoft Bing) to request identification of the image contents. In some embodiments, the consumable item is 604 an entrée offered at a restaurant. In some embodiments, the consumable item is a beverage, dessert, or appetizer offered at a restaurant.
- In some embodiments, after receiving the image from the user, the computing device performs 614 image pre-processing (e.g., at image processing module 210). Image pre-processing can include image resizing, de-skewing, channel-wise normalization, cropping, and other suitable processes.
- In some embodiments, the computing devices uses 618 a neural network to determine one or more similar images from a database, the one or more similar images include the one or more contents of the consumable item in the image. In some embodiments, the consumable item includes 620 one or more identifiable ingredients. The one or more contents that's are identified in the image are the one or more identifiable ingredients. For example, an image of miso soup is uploaded by the user. In the miso soup, tofu cubes, fish cakes, and green onions are visible. The visible ingredients are identified as being part of the miso soup and a list of the ingredients (e.g., tofu cubes, fish cakes, and green onion) is provided to the user. In another example, a user may indicate that the user is interested in a specific ingredient and wishes to receive results that include the specific ingredient. A user may be searching for miso soup that specifically includes fish cake and the search results are filtered and ranked according to the results. The results that include the fish cake are ranked higher than results that do not and are presented to the user.
- In some embodiments the computing device obtains 621 an image embedding. In some embodiments, the image embedding is reduced using a dimensionality reduction technique such as principal component analysis (PCA).
- In some embodiments, the computing device determines 622 a restaurant associated with each of the one or more similar images. Continuing in the example discussed above, miso soup is identified as being the food content in the image. The computing device may determine one or more images from a database also containing miso soup and a corresponding restaurant from which the images were taken (e.g., determined based on image metadata).
- The
method 600 continues in FIG. 6B. In some embodiments, the computing device calculates 624 a similarity score indicating at least a similarity between one or more restaurants offering a consumable dish associated with the identified one or more contents of the image. In some embodiments, the restaurants are ranked 626 based at least in part on the location of the user and a distance from the location of the user to the restaurant. In some embodiments, the restaurants are ranked 628 based at least in part on each restaurant's hours of service, take-out options, number of reviews, or score of reviews (e.g., ratings). In some embodiments, the restaurants are ranked 630 (e.g., via learning module 222) based at least in part on an analysis of a number of matches of the one or more contents of the consumable item in the image to one or more contents of the one or more similar images. - For example, the computing system receives an image of miso soup and identifies three similar images to the received image. The three similar images are of miso soup (the correct and closest match), spicy miso soup, and bone broth. For each of the similar images, a restaurant serving the food content is identified. Restaurant A is identified as serving miso soup, restaurant B is identified as serving spicy miso soup, and restaurant C is identified as serving bone broth. Next, a similarity score is calculated to quantify a similarity between each soup and identified restaurant and the content of the received image (miso soup). The search results aim to identify restaurants local to the user that serve the food content in the image search. Thus, the similarity score for restaurant A may be higher than the similarity scores for restaurants B and C based on the distance between each restaurant and the user. The similarity scores for restaurants A, B, and C may also be defined by each restaurant's hours of service, pricing, take-out options, health scores, reviews, and/or ratings. 
The similarity scores for restaurants A, B, and C may also be defined by further analysis of a number of matches of the one or more contents (e.g., ingredients) identified in the received image to the one or more contents of each similar image.
- In some embodiments, the computing device ranks 632 the one or more restaurants as a function of the similarity score. The ranking may be calculated as an average, a weighted average, or another suitable mathematical calculation to organize the one or more restaurants in a list. In some embodiments, the computing device provides 634 a ranked list of restaurants associated with the one or more similar images based at least on the similarity score and a distance of the restaurant to the location of the user. In some embodiments, the computing system ranks 636 the one or more restaurants in descending order based on the similarity score and includes a name of the consumable item with the ranked list of restaurants.
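The weighted-average ranking described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the weight values, field names, and the 10 km distance normalization are assumptions chosen for demonstration.

```python
# Hypothetical sketch of ranking restaurants by a weighted average of aspects:
# similarity score, distance to the user, and review rating.

def rank_restaurants(candidates, w_similarity=0.6, w_distance=0.3, w_rating=0.1):
    """Order candidate restaurants by a weighted average of ranking aspects.

    Each candidate is a dict with 'name', 'similarity' (0-1, higher is better),
    'distance_km', and 'rating' (0-5). All field names are illustrative.
    """
    def score(c):
        # Normalize distance so nearer restaurants score higher (cap at 10 km).
        distance_score = max(0.0, 1.0 - c["distance_km"] / 10.0)
        rating_score = c["rating"] / 5.0
        return (w_similarity * c["similarity"]
                + w_distance * distance_score
                + w_rating * rating_score)

    # Descending order: best-matching, nearest, best-rated restaurants first.
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"name": "Restaurant A", "similarity": 0.95, "distance_km": 1.2, "rating": 4.2},
    {"name": "Restaurant B", "similarity": 0.80, "distance_km": 0.5, "rating": 4.8},
    {"name": "Restaurant C", "similarity": 0.40, "distance_km": 3.0, "rating": 4.9},
]
ranked = rank_restaurants(candidates)
```

With these example weights, the strong content match dominates, so Restaurant A (miso soup) outranks the nearer but less similar alternatives, mirroring the miso soup example above.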
- FIG. 7 depicts a block diagram that illustrates an example client device 100 for collecting and providing user and environment data. In some examples, the client device 100 has at least one processor 106, a transceiver 108, one or more presentation components 110, one or more input/output (I/O) ports 112, one or more I/O components 114, and a memory 124. - The
client device 100 may take the form of a mobile computing device or any other portable device, such as a mobile telephone, laptop, tablet, computing pad, notebook, gaming device, portable media player, etc. The client device 100 may also include less portable devices such as desktop personal computers, kiosks, tabletop devices, industrial control devices, etc. Other examples may incorporate the client device 100 as part of a multi-device system in which two separate physical devices share or otherwise provide access to the illustrated components of the computing device 100. - The
processor 106 may include any quantity of processing units and is programmed to execute computer-executable instructions for implementing aspects of the disclosure. The instructions may be performed by the processor or by multiple processors within the computing device or performed by a processor external to the computing device. In some examples, the processor 106 is programmed to execute methods such as the one illustrated in accompanying FIGS. 8A-8B. - Additionally, or alternatively, some examples may make the
processor 106 programmed to present an experience in a user interface (“UI”), e.g., the UI shown in FIGS. 5A-5D. Moreover, in some examples, the processor 106 may represent an implementation of techniques to perform the operations described herein. For example, the operations may be performed by an analog client device 100 and/or a digital client device 100. - The
transceiver 108 is an antenna capable of transmitting and receiving radio frequency (“RF”) signals. One skilled in the art will appreciate and understand that various antennas and corresponding chipsets may be used to provide communicative capabilities between the client device 100 and other remote devices. Examples are not limited to RF signaling, however, as various other communication modalities may alternatively be used. - The
presentation components 110 visibly or audibly present information on the computing device 100. Examples of presentation components 110 include, without limitation, computer monitors, televisions, projectors, touch screens, phone displays, tablet displays, wearable device screens, speakers, vibrating devices, and any other devices configured to display, verbally communicate, or otherwise indicate image search results to a user. In some examples, the client device 100 may be a smart phone or a mobile tablet that includes speakers capable of playing audible search results to the user. In other examples, the client device 100 is a computer in a car that audibly presents search responses through a car speaker system, visually presents search responses on display screens in the car, e.g., situated in the car's dashboard, within headrests, on a drop-down screen, or the like, or a combination thereof. Other examples may present the disclosed search responses through various other display or audio presentation components 110. - I/
O ports 112 allow the client device 100 to be logically coupled to other devices and I/O components 114, some of which may be built into the client device 100 while others may be external. Specific to the examples discussed herein, I/O components 114 include a microphone 116, one or more sensors 118, a camera 120, and a touch device 122. The microphone 116 captures speech from the user 102 and/or speech of or by the user 102. The sensors 118 may include any number of sensors on or in a mobile computing device, electronic toy, gaming console, wearable device, television, vehicle, or other computing device 100. Additionally, the sensors 118 may include an accelerometer, magnetometer, pressure sensor, photometer, thermometer, global positioning system (“GPS”) chip or circuitry, bar scanner, biometric scanner for scanning fingerprint, palm print, blood, eye, or the like, gyroscope, near-field communication (“NFC”) receiver, or any other sensor configured to capture data from the user 102 or the environment. The camera 120 captures images or video of the user 102. The touch device 122 may include a touchpad, track pad, touch screen, or other touch-capturing device. The illustrated I/O components 114 are but one example of I/O components that may be included on the client device 100. Other examples may include additional or alternative I/O components 114, e.g., a sound card, a vibrating device, a scanner, a printer, a wireless communication module, or any other component for capturing information related to the user or the user's environment. - The
memory 124 includes any quantity of memory associated with or accessible by the computing device 100. The memory 124 may be internal to the client device 100, as shown in FIG. 1, external to the client device 100 (not shown in FIG. 1), or both. Examples of memory 124 may include, without limitation, random access memory (RAM), read only memory (ROM), electronically erasable programmable read only memory (EEPROM), flash memory or other memory technologies, CDROM, digital versatile disks (DVDs) or other optical or holographic media, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, memory wired into an analog computing device, or any other medium for encoding desired information and for access by the client device 100. The terms computer-readable medium, machine readable medium, and storage device do not include carrier waves to the extent carrier waves are deemed too transitory. Memory 124 may also take the form of volatile and/or nonvolatile memory; may be removable, non-removable, or a combination thereof; and may include various hardware devices, e.g., solid-state memory, hard drives, optical-disc drives, etc. Additionally, or alternatively, the memory 124 may be distributed across multiple client devices 100, e.g., in a virtualized environment in which instruction processing is carried out on multiple client devices 100. The memory 124 stores, among other data, various device applications that, when executed by the processor 106, operate to perform functionality on the computing device 100. Examples of applications may include search applications, instant messaging applications, electronic-mail application programs, web browsers, calendar application programs, address book application programs, messaging programs, media applications, location-based services, search programs, and the like. The applications may communicate with counterpart applications or services such as web services accessible via the network 104.
For example, the applications may include client-operating applications that correspond to server-side applications executing on remote servers or computing devices in the cloud. - Specifically, instructions stored in
memory 124 comprise a communications interface application 126, a user interface application 128, and a consumable item identification application 130. In some examples, the communications interface application 126 includes computer-executable instructions for operating a network interface card and/or a driver for operating the network interface card. Communication between the client device 100 and other devices may occur using any protocol or mechanism over a wired or wireless connection, or across the network 104. In some examples, the communications interface application 126 is operable with RF and short-range communication technologies using electronic tags, such as NFC tags, Bluetooth® brand tags, or the like. - In some examples, the user interface application 128 includes a graphics application for displaying data to the user and receiving data from the user. The user interface application 128 may also include computer-executable instructions for operating the graphics card to display search results and corresponding images or speech on or through the
presentation components 110. The user interface application 128 may also interact with the various sensors 118 and camera 120 to both capture and present information through the presentation components 110. -
FIG. 8 is a block diagram of a networking environment 800 for providing image search results and recommendations on client device(s) 100. The networking environment 800 may include multiple client devices 100, or just one as shown in FIG. 2, a consumable item identification engine server 202, and a database 224 communicating over a network 104. In some examples, user and environment data are communicated by the client devices 100 over the network 104 to the consumable item identification engine server 202, and the consumable item identification engine server 202 generates tailored image search results that are provided back to the client devices 100 for presentation to their respective users 102. - The
networking environment 800 shown in FIG. 8 is merely an example of one suitable computing system environment and is not intended to suggest any limitation as to the scope of use or functionality of examples disclosed herein. Neither should the illustrated networking environment 800 be interpreted as having any dependency or requirement related to any single component, module, index, or combination thereof. - The
network 104 may include any computer network, for example the Internet, a private network, a local area network (LAN), a wide area network (WAN), or the like. The network 104 may include various network interfaces, adapters, modems, and other networking devices for communicatively connecting the client devices 100, the consumable item identification engine server 202, and the database 224. The network 104 may also include configurations for point-to-point connections. Computer networks are well known to one skilled in the art, and therefore do not need to be discussed in detail herein. - The
client devices 100 may be any type of client device discussed above in reference to FIG. 1. Each client device 100 may capture user and/or environment data from their respective users and communicate the captured user and environment data over the network 104 to the consumable item identification engine server 202 and/or the database 224. To do so, each device may be equipped with a communications interface application 126, as discussed above in reference to FIG. 1. In response, the consumable item identification engine server 202 may be capable of providing intelligent image search results, including restaurant, recipe, and ingredient recommendations, to many client devices 100 capable of communicating their respectively captured user and environment data over the network 104. Put another way, the consumable item identification engine server 202 may control image searches on many client devices 100. - The
client device 100 may be equipped with various software applications and presentation components 110 for receiving image-based search queries and presenting recommended restaurants to their respective users. For example, a mobile phone may present a search bar or search screen for image search. The search bar/screen can receive a user-selected image of various image file types. The image is processed through various systems of identifying content in images. The search service may then provide an ordered list of restaurants that provide (e.g., serve) the same or similar food items identified in the user-selected image. The illustrated client devices and the aforesaid presentation mechanisms are not an exhaustive list covering all examples. Many different variations of client devices 100 and presentation techniques may be used to deliver the image search result responses to users. - The consumable item
identification engine server 202 represents a server or collection of servers configured to execute different web-service computer-executable instructions. The consumable item identification engine server 202 includes a processor 204 to process executable instructions, a transceiver 206 to communicate over the network 104, and a memory 208 embodied with at least the following executable instructions: an image processing module 210, a recommendation generation module 220, and a learning module 222. The memory 208 may include instructions for a content identification module 212, a restaurant module 214, a recipe module 216, and a location module 218. While the consumable item identification engine server 202 is illustrated as a single box, one skilled in the art will appreciate that the consumable item identification engine server 202 may, in fact, be scalable. For example, the consumable item identification engine server 202 may include multiple servers operating various portions of software that collectively generate image search results on the client devices 100. - The
database 224 provides backend storage of Web, user, and environment data that may be accessed over the network 104 by the consumable item identification engine server 202 or the client devices 100 and used by the consumable item identification engine server 202 to generate search results. The Web, user, and environment data stored in the database includes, for example but without limitation, user profiles 226, search context 228, and so on. Additionally, though not shown for the sake of clarity, the servers of the database 224 may include their own processors, transceivers, and memory. Also, the networking environment 800 depicts the database 224 as a collection of separate devices from the consumable item identification engine server 202; however, examples may actually store the discussed Web, user, and environment data shown in the database 224 on the consumable item identification engine server 202. - Herein, a “user profile” refers to an electronically stored collection of information related to the user. Such information may include the user's name, age, gender, height, weight, demographics, current location, residency, citizenship, family, friends, schooling, occupation, hobbies, skills, interests, Web searches, health information, birthday, anniversary, celebrated holidays, moods, user's condition, and any other personalized information associated with the user. The user profile includes static profile elements, e.g., name, birthplace, etc., and dynamic profile elements that change over time, e.g., residency, age, condition, etc. The user profile may be built through probing questions to the user or through analyzing the user's behavior on one or more client devices.
- Additionally, user profiles 226 may include static and/or dynamic data parameters for individual users. Examples of user profile data include, without limitation, a user's age, gender, race, name, location, interests, Web search history, social media connections and interactions, purchase history, routine behavior, jobs, or virtually any unique data points specific to the user. The user profiles 226 may be expanded to encompass various other aspects of a user's life.
- During operation, a user employs the client device 100 to perform an image search. In some embodiments, the client device sends a request to a server system (e.g., the consumable item identification engine server 202), rather than performing the search itself. For example, a user sees a photograph of a food item on social media. However, the photograph does not have any accompanying information about what it is, where it was from, what ingredients are included, etc. The user is curious, wants to find out more about the food in the photograph, and sends the image as a search query to a search service provider (e.g., a search engine such as Microsoft Bing). Various pre-processing techniques may be applied to the image, including cropping, de-skewing, normalizing, etc., before an image embedding is created. - An embedding is a mapping of a discrete categorical variable to a vector of continuous numbers. In the context of neural networks, embeddings are low- or high-dimensional, learned continuous vector representations of discrete variables; the dimensionality typically ranges between 20 and 4000, but may fall outside of that range as appropriate. Neural network embeddings are useful because they can reduce the dimensionality of categorical variables, meaningfully represent categories in the transformed space, and be used to find nearest neighbors in the embedding space. The nearest neighbors can be used to make recommendations based on user interests or to cluster categories.
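The embedding and nearest-neighbor idea above can be illustrated with a toy example. The vectors and dish names below are made up for demonstration; a real system would learn the embedding table with a neural network rather than hand-code it.

```python
import numpy as np

# Toy embedding table: each dish label maps to a learned continuous vector.
# These 3-dimensional vectors are illustrative assumptions, not learned values.
embeddings = {
    "miso soup":       np.array([0.9, 0.1, 0.0]),
    "spicy miso soup": np.array([0.8, 0.3, 0.1]),
    "bone broth":      np.array([0.5, 0.0, 0.6]),
    "caesar salad":    np.array([0.0, 0.9, 0.2]),
}

def nearest_neighbors(query_vec, table, k=2):
    """Return the k labels closest to query_vec by cosine similarity."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Sort all labels by similarity to the query, most similar first.
    ranked = sorted(table, key=lambda name: cosine(query_vec, table[name]),
                    reverse=True)
    return ranked[:k]

# A query embedding for an image of miso soup lands near related dishes.
query = embeddings["miso soup"]
neighbors = nearest_neighbors(query, embeddings, k=2)
```

Because similar dishes sit close together in the embedding space, the nearest neighbors of the miso soup query are miso soup itself and spicy miso soup, which is what makes embedding-space lookups useful for recommendations.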
- The image embedding is then analyzed through one or more convolutional neural networks to obtain k-nearest neighbor images in a trained embedding space. The k-nearest neighbor images are then analyzed by reviewing each image's label to create a curated set of neighbor images with matching (or close to matching) labels. In some embodiments, the user is curious about the food in the photograph and wants to order the food at a local restaurant. In such scenarios, the server obtains the user's location and adds the local restaurant location information into the search criteria. The returned list of images is from restaurants within a certain distance of the user.
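The curation step above, filtering k-nearest-neighbor hits by label match and then by restaurant proximity, might be sketched as follows. This assumes the neighbors have already been retrieved from an embedding index; the field names are illustrative, not from the patent.

```python
# Hedged sketch: curate kNN image hits by label match and restaurant distance.

def curate_neighbors(neighbors, query_label, max_distance_km):
    """Filter nearest-neighbor image hits by label and restaurant proximity.

    Each hit is a dict with 'label', 'restaurant', and 'restaurant_km'
    (distance from the user to the restaurant). Names are assumptions.
    """
    curated = []
    for hit in neighbors:
        if hit["label"] != query_label:
            continue                      # drop images with non-matching labels
        if hit["restaurant_km"] > max_distance_km:
            continue                      # drop restaurants too far from the user
        curated.append(hit)
    # Preserve the nearest-first order produced by the embedding index.
    return curated

hits = [
    {"label": "miso soup",  "restaurant": "A", "restaurant_km": 1.2},
    {"label": "bone broth", "restaurant": "C", "restaurant_km": 0.8},
    {"label": "miso soup",  "restaurant": "D", "restaurant_km": 12.0},
]
curated = curate_neighbors(hits, "miso soup", max_distance_km=5.0)
```

Here only the miso soup image from nearby restaurant A survives: the bone broth hit fails the label check and restaurant D fails the distance check.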
- 1. A method includes receiving an image from a user, the image including a consumable item, determining a location of the user, identifying a plurality of restaurants within a predetermined distance of the user, obtaining a plurality of consumable item images associated with the plurality of restaurants, using a neural network, identifying, from the plurality of consumable item images, one or more images with similar consumable items, including calculating a similarity score for each identified image, wherein the one or more images with similar consumable items include one or more ingredients of the consumable item in the image, identifying the restaurant associated with each of the identified images, ranking the restaurants based on the similarity score, and providing a list of ranked restaurants to the user.
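The "identifying a plurality of restaurants within a predetermined distance of the user" step could use a great-circle distance check. The patent does not specify a formula; the haversine helper and example coordinates below are assumptions for illustration.

```python
import math

# Illustrative sketch: filter restaurants to those within a radius of the user.

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometers between two points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def restaurants_within(user_lat, user_lon, restaurants, radius_km):
    """Keep only restaurants whose coordinates fall within radius_km of the user."""
    return [r for r in restaurants
            if haversine_km(user_lat, user_lon, r["lat"], r["lon"]) <= radius_km]

# Hypothetical data: one restaurant ~2 km away, one ~230 km away.
restaurants = [
    {"name": "Near Cafe", "lat": 47.62, "lon": -122.35},
    {"name": "Far Diner", "lat": 45.52, "lon": -122.68},
]
nearby = restaurants_within(47.61, -122.33, restaurants, radius_km=5.0)
```

Only the nearby restaurant passes the 5 km cutoff, giving the candidate set over which the similarity scoring and ranking of the later steps would run.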
- 2. The method of example 1, wherein the consumable item is an entrée offered at a restaurant.
- 3. The method of any of examples 1-2, further including performing image pre-processing including cropping, de-skewing, and resizing.
- 4. The method of any of examples 1-3, further including passing the image through one or more convolutional neural networks.
- 5. The method of any of examples 1-4, where the consumable item includes one or more identifiable ingredients and the one or more ingredients that are identified are the one or more identifiable ingredients.
- 6. The method of any of examples 1-5, where the restaurants are ranked at least in part on the location of the user and a distance from the location of the user to the restaurant.
- 7. The method of any of examples 1-6, where the restaurants are ranked at least in part on each restaurant's hours of service, take-out options, number of reviews, and/or score of review.
- 8. The method of any of examples 1-7, where the restaurants are ranked based at least in part on an analysis of a number of matches of the one or more ingredients of the consumable item in the image to one or more ingredients of the one or more similar images.
- 9. The method of any of examples 1-8, where providing a list of restaurants further includes ranking the one or more restaurants in descending order based on the similarity score and including a name of the consumable item with the list of restaurants.
- 10. A system, comprising a processor and a memory device coupled to the processor and having a program stored thereon for execution by the processor to perform operations comprising receiving an image from a user, the image including a consumable item. The operations further include determining a location of the user, identifying a plurality of restaurants within a predetermined distance of the user, and obtaining a plurality of consumable item images associated with the plurality of restaurants. The operations further include using a neural network for identifying, from the plurality of consumable item images, one or more images with similar consumable items including calculating a similarity score for each identified image. The one or more similar images include the one or more ingredients of the consumable item in the image. The operations further include identifying the restaurant associated with each of the identified images, ranking the one or more restaurants based on the similarity score, and providing a list of ranked restaurants to the user.
- 11. The system of example 10, where the consumable item is an entrée offered at the restaurant.
- 12. The system of any of examples 10-11, further including performing image pre-processing to obtain an image embedding.
- 13. The system of any of examples 10-12, where performing image pre-processing comprises passing the image through one or more convolutional neural networks.
- 14. The system of any of examples 10-13, where the consumable item includes one or more identifiable ingredients, and the one or more ingredients that are identified are the one or more identifiable ingredients.
- 15. The system of any of examples 10-14, where the restaurants are ranked at least in part on the location of the user and a distance from the location of the user to the restaurant.
- 16. The system of any of examples 10-15, where the restaurants are ranked based at least in part on each restaurant's hours of service, take-out options, number of reviews, and/or score of review.
- 17. The system of any of examples 10-16 where the restaurants are ranked at least in part on an analysis of a number of matches of the one or more contents of the consumable item in the image to one or more contents of the one or more similar images.
- 18. The system of any of examples 10-17, where providing a list of restaurants further comprises ranking the one or more restaurants in descending order based on the similarity score and including a name of the consumable item with the list of restaurants.
- 19. A computer-readable storage medium stores one or more programs, the one or more programs including instructions which, when executed by one or more processors of an electronic device, cause the electronic device to perform operations to perform a method. The operations include receiving an image from a user, the image including a consumable item, determining a location of the user, identifying a plurality of restaurants within a predetermined distance of the user, obtaining a plurality of consumable item images associated with the plurality of restaurants, and, using a neural network, identifying one or more images with similar consumable items from the plurality of consumable item images and calculating a similarity score for each identified image. The one or more similar consumable items include the one or more ingredients of the consumable item in the image. The operations further include identifying a restaurant associated with each of the identified images, ranking the one or more restaurants based on the similarity score, and providing a list of ranked restaurants to the user.
- 20. The computer readable storage medium of example 19, where providing the ranked list of restaurants further comprises providing a list of the one or more identified contents of the consumable item in the image.
- Although a few embodiments have been described in detail above, other modifications are possible. For example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Other embodiments may be within the scope of the following claims.
- In the description herein, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the present invention. The following description of example embodiments is, therefore, not to be taken in a limited sense, and the scope of the present invention is defined by the appended claims.
- The functions or algorithms described herein may be implemented in software in one embodiment. The software may consist of computer executable instructions stored on computer readable media or computer readable storage device such as one or more non-transitory memories or other type of hardware-based storage devices, either local or networked. Further, such functions correspond to modules, which may be software, hardware, firmware or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the embodiments described are merely examples. The software may be executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a computer system, such as a personal computer, server or other computer system, turning such computer system into a specifically programmed machine.
- The functionality can be configured to perform an operation using, for instance, software, hardware, firmware, or the like. For example, the phrase “configured to” can refer to a logic circuit structure of a hardware element that is to implement the associated functionality. The phrase “configured to” can also refer to a logic circuit structure of a hardware element that is to implement the coding design of associated functionality of firmware or software. The term “module” refers to a structural element that can be implemented using any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any combination of hardware, software, and firmware. The term, “logic” encompasses any functionality for performing a task. For instance, each operation illustrated in the flowcharts corresponds to logic for performing that operation. An operation can be performed using, software, hardware, firmware, or the like. The terms, “component,” “system,” and the like may refer to computer-related entities, hardware, and software in execution, firmware, or combination thereof. A component may be a process running on a processor, an object, an executable, a program, a function, a subroutine, a computer, or a combination of software and hardware. The term, “processor,” may refer to a hardware component, such as a processing unit of a computer system.
- Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computing device to implement the disclosed subject matter. The term, “article of manufacture,” as used herein is intended to encompass a computer program accessible from any computer-readable storage device or media. Computer-readable storage media can include, but are not limited to, magnetic storage devices, e.g., hard disk, floppy disk, magnetic strips, optical disk, compact disk (CD), digital versatile disk (DVD), smart cards, flash memory devices, among others. In contrast, computer-readable media, i.e., not storage media, may additionally include communication media such as transmission media for wireless signals and the like.
Claims (20)
1. A method of ranking restaurants at a computing device, comprising:
receiving, at the computing device, an image from a client device coupled to the computing device over a network, the image including a consumable food item;
pre-processing the received image to obtain an image embedding of the received image and determine the contents of the received image, including the consumable food item;
determining a location of the client device;
identifying a plurality of restaurants within a predetermined distance of the client device;
obtaining a plurality of consumable food item images associated with the plurality of restaurants;
using a neural network at the computing device, identifying, from the plurality of consumable food item images, one or more images with similar consumable food items to the consumable food item in the image embedding, wherein identifying the one or more images includes calculating a similarity score for each of the identified one or more images;
identifying the restaurant associated with each of the identified one or more images and the name of the consumable food item in each of the identified one or more images at each respective identified restaurant;
ranking the identified restaurants based on aspects for each respective identified restaurant including the calculated similarity score for each of the identified one or more images and at least one of a distance from the location of the client device to the respective identified restaurant or a score of reviews of the respective identified restaurants, wherein weights of the different aspects of the ranking are determined based at least in part on information from a user profile associated with the client device; and
causing a list of ranked restaurants, and the names of the consumable food items at the ranked restaurants, to be displayed at a user interface of the client device.
2. The method of claim 1, wherein pre-processing the received image includes passing the received image through one or more neural networks to obtain the image embedding of the received image.
3. The method of claim 1, wherein the consumable food item is an entrée offered at a restaurant, and
wherein the aspects of the ranking further include hours of service, take-out options, and number of reviews.
4. The method of claim 1, wherein pre-processing the received image further includes cropping, de-skewing, and resizing the received image,
wherein the image embedding comprises a mapping of a discrete categorical variable to a vector of continuous numbers to reduce dimensionality of categorical variables, and
wherein identifying the one or more images with similar consumable food items to the consumable food item in the image embedding comprises identifying one or more nearest neighbors in an embedding space and providing k-nearest neighbor recommendations.
5. The method of claim 1, wherein the neural network at the computing device is trained to recognize ingredients,
wherein identifying, from the plurality of consumable food item images, one or more images with similar consumable food items includes identifying one or more images with similar ingredients of the consumable food item in the image, and
wherein the consumable food item includes one or more identifiable ingredients, and the one or more ingredients that are identified are the one or more identifiable ingredients.
6. The method of claim 1 , wherein the neural network at the computing device is trained to recognize ingredients,
wherein identifying, from the plurality of consumable food item images, one or more images with similar consumable food items includes identifying one or more images with similar ingredients of the consumable food item in the image, and
wherein ranking the identified restaurants is further based at least in part on an analysis of a number of matches of the one or more ingredients of the consumable food item in the image to one or more ingredients of the one or more similar images.
7. The method of claim 1 , wherein providing a list of restaurants further comprises ranking the one or more restaurants in descending order based on the aspects and including the name of the consumable food item with the list of ranked restaurants.
8. The method of claim 1, wherein a weight of the score of reviews is greater than a weight of the distance from the location of the client device to the respective identified restaurant, based on a parameter from the user profile of a user associated with the client device.
9. The method of claim 1 , wherein the predetermined distance is determined at least in part by information from the user profile associated with the client device.
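The nearest-neighbor search recited in claims 1 and 4 (finding images whose embeddings are close to the query image's embedding, with a per-image similarity score) can be sketched in a few lines. This is an illustrative toy, not the patented implementation: the image names, three-dimensional embedding vectors, and the choice of cosine similarity are all assumptions; real embeddings would come from a neural network as the claims describe.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def k_nearest(query, catalog, k=2):
    """catalog maps image id -> embedding; returns (image_id, score)
    pairs sorted by decreasing similarity. The score plays the role of
    the claims' per-image similarity score."""
    scored = [(img, cosine(query, emb)) for img, emb in catalog.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

# Hypothetical dish-image embeddings for restaurants near the device.
catalog = {
    "miso_soup.jpg":  [0.9, 0.1, 0.0],
    "ramen.jpg":      [0.7, 0.3, 0.1],
    "cheesecake.jpg": [0.0, 0.2, 0.9],
}
# Embedding of the user's uploaded photo (assumed).
neighbors = k_nearest([0.85, 0.15, 0.05], catalog, k=2)
```

In practice the catalog would hold high-dimensional embeddings and an approximate nearest-neighbor index rather than a linear scan, but the k-nearest-neighbor recommendation idea of claim 4 is the same.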
10. A system, comprising:
a processor; and
a memory device coupled to the processor and having a program stored thereon for execution by the processor to perform operations comprising:
receiving an image from a client device, the image including a consumable food item;
pre-processing the received image to obtain an image embedding of the received image and determine the contents of the received image, including the consumable food item;
determining a location of the client device;
identifying a plurality of restaurants within a predetermined distance of the client device;
obtaining a plurality of consumable food item images associated with the plurality of restaurants;
using a neural network, identifying, from the plurality of consumable food item images, one or more images with similar consumable food items to the consumable food item in the image embedding, wherein identifying the one or more images includes calculating a similarity score for each of the identified one or more images;
identifying the restaurant associated with each of the identified one or more images and the name of the consumable food item in each of the identified one or more images at each respective identified restaurant;
ranking the identified restaurants based on aspects for each respective identified restaurant, the aspects including the calculated similarity score for each of the identified one or more images and at least one of a distance from the location of the client device to the respective identified restaurant or a score of reviews of the respective identified restaurant, wherein weights of the different aspects of the ranking are determined based at least in part on information from a user profile associated with the client device; and
causing a list of ranked restaurants, together with the names of the consumable food items at the ranked restaurants, to be displayed at a user interface of the client device.
11. The system of claim 10 , wherein pre-processing the received image includes passing the received image through one or more neural networks to obtain the image embedding of the received image.
12. The system of claim 10 , wherein the consumable food item is an entrée offered at a restaurant, and
wherein the aspects of the ranking further include hours of service, take-out options, and number of reviews.
13. The system of claim 10 , wherein pre-processing the received image further includes cropping, de-skewing, and resizing the received image,
wherein the image embedding comprises a mapping of a discrete categorical variable to a vector of continuous numbers to reduce dimensionality of categorical variables, and
wherein identifying the one or more images with similar consumable food items to the consumable food item in the image embedding comprises identifying one or more nearest neighbors in an embedding space and providing k-nearest neighbor recommendations.
14. The system of claim 10 , wherein the neural network is trained to recognize ingredients,
wherein identifying, from the plurality of consumable food item images, one or more images with similar consumable food items includes identifying one or more images with similar ingredients of the consumable food item in the image, and
wherein the consumable food item includes one or more identifiable ingredients, and the one or more ingredients that are identified are the one or more identifiable ingredients.
15. The system of claim 10 , wherein the neural network is trained to recognize ingredients,
wherein identifying, from the plurality of consumable food item images, one or more images with similar consumable food items includes identifying one or more images with similar ingredients of the consumable food item in the image, and
wherein ranking the identified restaurants is further based at least in part on an analysis of a number of matches of the one or more ingredients of the consumable food item in the image to one or more ingredients of the one or more similar images.
16. The system of claim 10 , wherein providing a list of restaurants further comprises ranking the one or more restaurants in descending order based on the aspects and including the name of the consumable food item with the list of ranked restaurants.
17. The system of claim 10, wherein a weight of the score of reviews is greater than a weight of the distance from the location of the client device to the respective identified restaurant, based on a parameter from the user profile of a user associated with the client device.
18. The system of claim 10 , wherein the predetermined distance is determined at least in part by information from the user profile associated with the client device.
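Claims 5-6 and 14-15 score candidates by how many recognized ingredients they share with the query dish. A minimal sketch of that match-counting step follows; the ingredient lists and image names are invented for illustration, and a real system would obtain them from an ingredient-recognition neural network as the claims recite.

```python
def ingredient_matches(query_ingredients, candidate_ingredients):
    # Number of query ingredients also recognized in the candidate image.
    return len(set(query_ingredients) & set(candidate_ingredients))

# Hypothetical ingredients recognized in the user's photo.
query = ["tofu", "seaweed", "scallion", "miso"]

# Hypothetical ingredients recognized in nearby restaurants' dish images.
candidates = {
    "miso_soup.jpg": ["tofu", "miso", "scallion", "dashi"],
    "ramen.jpg": ["noodles", "scallion", "pork"],
}

scores = {img: ingredient_matches(query, ings)
          for img, ings in candidates.items()}
```

Per claims 6 and 15, this match count would then enter the restaurant ranking as one additional aspect alongside visual similarity, distance, and reviews.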
19. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by one or more processors of an electronic device, cause the electronic device to perform operations comprising:
receiving an image from a client device, the image including a consumable food item;
pre-processing the received image to obtain an image embedding of the received image and determine the contents of the received image, including the consumable food item;
determining a location of the client device;
identifying a plurality of restaurants within a predetermined distance of the client device;
obtaining a plurality of consumable food item images associated with the plurality of restaurants;
using a neural network, identifying, from the plurality of consumable food item images, one or more images with similar consumable food items to the consumable food item in the image embedding, wherein identifying the one or more images includes calculating a similarity score for each of the identified one or more images;
identifying the restaurant associated with each of the identified one or more images and the name of the consumable food item in each of the identified one or more images at each respective identified restaurant;
ranking the identified restaurants based on aspects for each respective identified restaurant, the aspects including the calculated similarity score for each of the identified one or more images and at least one of a distance from the location of the client device to the respective identified restaurant or a score of reviews of the respective identified restaurant, wherein weights of the different aspects of the ranking are determined based at least in part on information from a user profile associated with the client device; and
causing a list of ranked restaurants, together with the names of the consumable food items at the ranked restaurants, to be displayed at a user interface of the client device.
20. The computer readable storage medium of claim 19 , wherein pre-processing the received image includes passing the received image through one or more neural networks to obtain the image embedding of the received image and to determine the contents of the received image, including ingredients of the consumable food item,
wherein identifying, from the plurality of consumable food item images, one or more images with similar consumable food items includes identifying one or more images with similar ingredients of the consumable food item in the image, and
wherein the restaurants are ranked based at least in part on an analysis of a number of matches of the one or more identified ingredients of the consumable food item in the image to one or more ingredients of the one or more similar images.
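The weighted multi-aspect ranking that every independent claim recites (similarity score, distance, review score, with weights drawn from a user profile) can be sketched as a simple weighted sum. Everything here is an assumption for illustration: the aspect names, the weight values, the linear combination, and the sample restaurants are not from the patent, which leaves the weighting scheme unspecified.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    restaurant: str
    dish_name: str
    similarity: float    # similarity score of the matched image, in [0, 1]
    distance_km: float   # distance from the client device
    review_score: float  # average review score, in [0, 5]

def rank(candidates, weights):
    """Rank candidates by a weighted sum of aspects; the weights stand in
    for the user-profile-derived weights of the claims (e.g. a profile
    where reviews matter more than distance)."""
    def score(c):
        return (weights["similarity"] * c.similarity
                + weights["reviews"] * c.review_score / 5.0
                - weights["distance"] * c.distance_km)
    return sorted(candidates, key=score, reverse=True)

ranked = rank(
    [Candidate("Soup Shack", "miso soup", 0.92, 1.2, 4.5),
     Candidate("Noodle Bar", "ramen", 0.80, 0.3, 4.9)],
    weights={"similarity": 0.6, "reviews": 0.3, "distance": 0.1},
)
```

With these sample weights the closer, better-reviewed restaurant outranks the slightly better visual match, which is the kind of profile-dependent trade-off claims 8 and 17 describe.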
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/382,230 US20240046332A1 (en) | 2020-11-23 | 2023-10-20 | Providing Local Recommendations based on Images of Consumable Items |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/102,009 US11830056B2 (en) | 2020-11-23 | 2020-11-23 | Providing local recommendations based on images of consumable items |
US18/382,230 US20240046332A1 (en) | 2020-11-23 | 2023-10-20 | Providing Local Recommendations based on Images of Consumable Items |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/102,009 Continuation US11830056B2 (en) | 2020-11-23 | 2020-11-23 | Providing local recommendations based on images of consumable items |
Publications (1)
Publication Number | Publication Date |
---|---|
US20240046332A1 true US20240046332A1 (en) | 2024-02-08 |
Family
ID=81658352
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/102,009 Active 2040-12-16 US11830056B2 (en) | 2020-11-23 | 2020-11-23 | Providing local recommendations based on images of consumable items |
US18/382,230 Pending US20240046332A1 (en) | 2020-11-23 | 2023-10-20 | Providing Local Recommendations based on Images of Consumable Items |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/102,009 Active 2040-12-16 US11830056B2 (en) | 2020-11-23 | 2020-11-23 | Providing local recommendations based on images of consumable items |
Country Status (1)
Country | Link |
---|---|
US (2) | US11830056B2 (en) |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05342191A (en) * | 1992-06-08 | 1993-12-24 | Mitsubishi Electric Corp | System for predicting and analyzing economic time sequential data |
US6728695B1 (en) * | 2000-05-26 | 2004-04-27 | Burning Glass Technologies, Llc | Method and apparatus for making predictions about entities represented in documents |
US8457366B2 (en) * | 2008-12-12 | 2013-06-04 | At&T Intellectual Property I, L.P. | System and method for matching faces |
US8897560B2 (en) * | 2012-09-24 | 2014-11-25 | Eastman Kodak Company | Determining the estimated clutter of digital images |
US20140104385A1 (en) * | 2012-10-16 | 2014-04-17 | Sony Network Entertainment International Llc | Method and apparatus for determining information associated with a food product |
US9916520B2 (en) * | 2014-09-03 | 2018-03-13 | Sri International | Automated food recognition and nutritional estimation with a personal mobile electronic device |
US10832302B2 (en) * | 2015-10-30 | 2020-11-10 | Forq, Inc. | Method for digital recipe library and food image recognition services |
CN111295669A (en) * | 2017-06-16 | 2020-06-16 | 马克波尔公司 | Image processing system |
US11727458B2 (en) * | 2018-11-29 | 2023-08-15 | Cut And Dry Inc. | Produce comparison system |
US11275959B2 (en) * | 2020-07-07 | 2022-03-15 | Assa Abloy Ab | Systems and methods for enrollment in a multispectral stereo facial recognition system |
- 2020-11-23: US 17/102,009 filed; granted as US11830056B2 (status: active)
- 2023-10-20: US 18/382,230 filed; published as US20240046332A1 (status: pending)
Also Published As
Publication number | Publication date |
---|---|
US11830056B2 (en) | 2023-11-28 |
US20220164853A1 (en) | 2022-05-26 |
Similar Documents
Publication | Title
---|---
US11669557B2 (en) | Iterative image search algorithm informed by continuous human-machine input feedback
US20230042931A1 (en) | Menu Personalization
US20240029131A1 (en) | Digital recipe library and network with food image recognition services
US9940661B2 (en) | Method and apparatus for recommending a food item
US20170039283A1 (en) | Searching Based on the Persona of Another
US20190102395A1 (en) | Personalizing search and/or recommendation results based on member activity models
JP2023162232A (en) | Intelligent systems and methods for visual search queries
KR102353420B1 (en) | Crowd assisted query system
US11830056B2 (en) | Providing local recommendations based on images of consumable items
US11663683B2 (en) | System and method for image-based food item, search, design, and culinary fulfillment
US11500926B2 (en) | Cascaded multi-tier visual search system
US11740768B2 (en) | Methods and systems for providing alimentary elements
US20230400962A1 (en) | Methods and systems for providing alimentary elements
LU102575B1 (en) | Providing recent event information for web search queries
Aizawa | FoodLog: Multimedia Food Recording Tools for Diverse Applications
Legal Events
Code | Title | Description
---|---|---
STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION
AS | Assignment | Owner: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON. Assignment of assignors interest; assignors: GONG, JULIA X.; PATEL, JYOTKUMAR; SONG, YALE; and others; signing dates from 2020-12-16 to 2021-04-27; reel/frame: 065527/0552
STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED