Snap, Eat, RepEat: a Food Recognition Engine for Dietary Logging

Michele Merler, Hui Wu, Rosario Uceda-Sosa, Quoc-Bao Nguyen, John R. Smith
IBM TJ Watson Research Center
mimerler@us.ibm.com
2nd International Workshop on Multimedia Assisted Dietary Management @ACM MM 2016

Motivation

• Exercise, sleep and nutrition monitoring is essential for optimizing athletic performance
• Manual logging is slow and inaccurate; reducing this friction makes nutrition monitoring fast and easy
• Visual food recognition greatly simplifies logging of meals using context and content
• Provides accurate tracking of diet and planning of nutritional intake for achieving goals

[Figure: overview diagram. Labels: Exercise, Sleep, Nutrition; History, Logging, Planning; Unknown Food Photo; Match & Nutrition Info]

Context:
• Geo-Location
• Time of day
• Restaurant name
• Historical meals

Content:
• Photo
• Text
• Interaction

Food matching:
• Fast, accurate
• Multi-modal
• Scalable

Food database:
• Food photos
• Restaurants
• Nutrition info
• Menus
• User data

Nutrition logging:
• At Home
• Meals away

Visual Recognition – Food vs Not-Food

Model: GoogLeNet pretrained on ImageNet and finetuned on a dataset with 2.M training and 660K test images.

[Figure: Food vs Not-Food classifier ROC curve on the UNI-CT test set]

Performance | One-Class SVM [Farinella et al. MaDiMa15] | Binary Ensemble SVM | Binary Fine-Tuned GoogLeNet
--- | --- | --- | ---
Food889 True Positive Rate | 0.6543 | 0.8685 | 0.9711
Flickr Food True Positive Rate | 0.4300 | 0.6744 | 0.9417
Flickr No-Food True Negative Rate | 0.9444 | 0.9589 | 0.9817
Overall Accuracy | 0.9202 | 0.9513 | 0.9808
660K Test Set | - | 0.8877 | 0.9895

System Architecture

[Figure: system architecture diagram. Client side: Snap Meal Photos, Visual Interface, Nutrition Logging, Dietary Assistant. Server side, reached through a REST API: Food Visual Recognition and Analysis, Contextual Data (location, menu), Food Semantic Hierarchy, Visual Models (Restaurant 1 ... Restaurant N, Wild), Nutrition Database, Food Images Database. Outputs: Recognized food category, Nutritional info. Two recognition modes: 1. In Context (restaurant, menu; "Not enough pics, training data"), 2. In-the-wild ("Just pics")]

Food Recognition in Context

Food Visual Recognition
• K-NN: based on fc7 features from AlexNet
• AlexNet: finetuned on the restaurant chains training set
• GoogLeNet: finetuned on the restaurant chains training set
• GoogLeNetFood: two finetuning steps, first on a subset of the Food vs Not-Food dataset, then on the restaurant chains training set

Restaurant | # Classes | # Images
--- | --- | ---
Applebee's | 50 | 405
Au Bon Pain | 43 | 146
Denny's | 56 | 325
Olive Garden | 55 | 457
Panera Bread | 79 | 2,267
TGI Fridays | 54 | 432

[Figure: bar chart of top-1 accuracy (y-axis, 0 to 1) per restaurant chain (x-axis, with number of images per item): AppleBees (8.1), AuBonPain (3.4), Dennys (5.8), OliveGarden (8.3), PaneraBread (28.7), TGIFridays (8)]

Food Recognition “in the wild”

Dataset collection pipeline:
• Web and Social Media Crawling
• Unnecessary images removal: duplicates, empty images, small images
• Filter and rank by classifier (Food vs. Not-Food)
• Crowdsourced human verifications

[Figure: example crawl results for the query “bacon”, split into Food and Not-Food]

Created Largest Visual Food Recognition Dataset

Dataset | Number of Classes | Number of Images/Class | Number of Images | Food Ontology
--- | --- | --- | --- | ---
UEC Food 256 [22] | 256 | 89 | 31,651 | None
Geolocalized [40] | 3,852 | 30 | 117,504 | None
Food-101 [7] | 101 | 1000 | 101,100 | None
ETHZ Food 101 [37] | 101 | 1000 | 101,100 | None
Food 500 | 508 | 290 | 148,408 | Yes
Food 3,000 (ongoing) | 3000 | 500 | 1.5M | Yes

Model: GoogLeNet pretrained on ImageNet and finetuned on the given dataset.

Dataset | Accuracy (top 1)
--- | ---
Food 101 [Martinel ICCV15] | 79
Food 101 (ours) | 69.64
Food 500 (ours) | 40.37

Most Confused Categories
• Creole rice vs Jambalaya
• Roast beef vs Pastrami
• Beef vindaloo vs Rogan josh
• Peanut butter vs Fudge
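
The restaurant-chain accuracy chart labels each chain with its number of images per item. As a minimal sketch, assuming that figure is simply the chain's image count divided by its class count, the labels can be recomputed from the restaurant table. The names `chains`, `images_per_item`, and `ratios` are illustrative and do not come from the poster.

```python
# Class and image counts copied from the restaurant-chain table on the poster.
chains = {
    "Applebee's": (50, 405),
    "Au Bon Pain": (43, 146),
    "Denny's": (56, 325),
    "Olive Garden": (55, 457),
    "Panera Bread": (79, 2267),
    "TGI Fridays": (54, 432),
}


def images_per_item(num_classes: int, num_images: int) -> float:
    """Average number of images per menu item, rounded to one decimal.

    Assumption: "images per item" means total images / number of classes.
    """
    return round(num_images / num_classes, 1)


ratios = {name: images_per_item(c, n) for name, (c, n) in chains.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} images per item")
```

Under this assumption the computed values match the chart labels, and they make the data imbalance explicit: Panera Bread has roughly three to eight times as many images per item as any other chain.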