Unknown object detection by training with phrase and region pairs using GLIP

--

Grounded Language-Image Pre-training
arXiv paper abstract https://arxiv.org/abs/2112.03857
arXiv PDF paper https://arxiv.org/pdf/2112.03857.pdf
GitHub https://github.com/microsoft/GLIP
Demo https://colab.research.google.com/drive/12x7v-_miN7-SRiziK3Cx4ffJzstBJNqb?usp=sharing

… presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations.

GLIP unifies object detection and phrase grounding for pre-training.
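This unification can be sketched numerically: instead of predicting a fixed set of class logits, each detected region is scored against the token features of the text prompt, and the alignment scores act as the classification output. A minimal illustration (random features stand in for the real image and text encoders):

```python
import numpy as np

# Illustrative sketch of detection-as-grounding, not the GLIP implementation:
# regions are classified by their alignment with prompt tokens.
rng = np.random.default_rng(0)

num_regions, num_tokens, dim = 4, 6, 8
region_feats = rng.standard_normal((num_regions, dim))  # from the image encoder
token_feats = rng.standard_normal((num_tokens, dim))    # from the text encoder

# Alignment scores replace per-class logits: one score per (region, token) pair.
scores = region_feats @ token_feats.T   # shape (num_regions, num_tokens)

# Each region is assigned the prompt token it aligns with best.
best_token = scores.argmax(axis=1)
print(scores.shape, best_token.shape)
```

Because the "classifier" is just the prompt's token features, swapping in a new prompt changes the label space without retraining, which is what lets the same model serve both detection and grounding.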

… GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model … GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.
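The self-training step can be pictured as a simple filtering loop (all names here are hypothetical, for illustration only): a teacher grounding model predicts boxes for noun phrases in web image-text pairs, and only confident predictions are kept as pseudo ground truth for further training.

```python
# Hypothetical sketch of pseudo-labeling for self-training:
# keep only (phrase, box) predictions above a confidence threshold.

def pseudo_label(predictions, threshold=0.5):
    """Return (phrase, box) pairs whose alignment score passes the
    threshold; these become training targets for the student model."""
    return [(p["phrase"], p["box"]) for p in predictions
            if p["score"] >= threshold]

# Example teacher predictions for one web image-caption pair.
preds = [
    {"phrase": "a dog", "box": (10, 20, 80, 90), "score": 0.92},
    {"phrase": "a frisbee", "box": (40, 5, 60, 25), "score": 0.31},
]
print(pseudo_label(preds))  # only the confident "a dog" box survives
```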

… pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs.

The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks.

1) When directly evaluated on COCO and LVIS (without seeing any images in COCO during pre-training), GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many supervised baselines …
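Zero-shot transfer works by turning the target dataset's category names into a text prompt, so a dataset like COCO can be evaluated without any retraining. A minimal sketch of how such a prompt might be assembled (GLIP-style models join category names with ". " separators):

```python
# Minimal sketch: build a grounding prompt from detection class names,
# so each name's token span acts as that category's classifier.

def make_detection_prompt(class_names):
    """Join category names into one text prompt for zero-shot detection."""
    return ". ".join(class_names) + "."

# A few COCO categories as an example.
coco_subset = ["person", "bicycle", "car", "traffic light"]
print(make_detection_prompt(coco_subset))  # person. bicycle. car. traffic light.
```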

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Andrew Palmer on Unsplash

--

AI News Clips by Morris Lee: News to help your R&D


A computer vision consultant in artificial intelligence and related high-tech technologies for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.
