Unknown object detection by training with phrase-region pairs using GLIP
Grounded Language-Image Pre-training
arXiv paper abstract https://arxiv.org/abs/2112.03857
arXiv PDF paper https://arxiv.org/pdf/2112.03857.pdf
GitHub https://github.com/microsoft/GLIP
Demo https://colab.research.google.com/drive/12x7v-_miN7-SRiziK3Cx4ffJzstBJNqb?usp=sharing
… presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations.
GLIP unifies object detection and phrase grounding for pre-training.
… the unification allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model … GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.
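To make the unification concrete, here is a minimal sketch of the detection-as-grounding idea: a detector's category list is turned into a single text caption, so detecting a class becomes grounding a phrase in that caption. The period-separated prompt format follows the paper's detection prompts; the build_detection_prompt helper is an illustrative assumption, not part of the GLIP codebase.

def build_detection_prompt(class_names):
    # Join category names into one caption; GLIP then scores each
    # candidate region against each phrase (class name) in the caption.
    return ". ".join(class_names) + "."

prompt = build_detection_prompt(["person", "bicycle", "car", "dog"])
print(prompt)  # person. bicycle. car. dog.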
… pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs.
The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks.
When directly evaluated on COCO and LVIS (without seeing any images in COCO during pre-training), GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many supervised baselines …
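For readers who want to try the zero-shot behavior themselves, the sketch below follows the GitHub repo's Colab demo: class names are passed as a text caption and GLIP returns grounded boxes. The config and weight file paths are assumptions; check the repo's model zoo for the exact files.

# Minimal zero-shot inference sketch, adapted from the GLIP Colab demo.
# Assumes the GLIP repo (its maskrcnn_benchmark fork) is installed.
import cv2
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.engine.predictor_glip import GLIPDemo

# Assumed paths; pick a matching config/checkpoint pair from the repo.
config_file = "configs/pretrain/glip_Swin_T_O365_GoldG.yaml"
weight_file = "MODEL/glip_tiny_model_o365_goldg_cc_sbu.pth"

cfg.local_rank = 0
cfg.num_gpus = 1
cfg.merge_from_file(config_file)
cfg.merge_from_list(["MODEL.WEIGHT", weight_file, "MODEL.DEVICE", "cuda"])

glip_demo = GLIPDemo(cfg, min_image_size=800, confidence_threshold=0.7,
                     show_mask_heatmaps=False)

image = cv2.imread("example.jpg")   # any test image (BGR)
caption = "person. bicycle. car."   # class names as a grounding caption
result, top_preds = glip_demo.run_on_web_image(image, caption, 0.5)
cv2.imwrite("result.jpg", result)   # image with predicted boxes drawn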
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b