Put partial 3D point clouds into a standard orientation with self-supervised ConDor

ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes
arXiv paper abstract https://arxiv.org/abs/2201.07788
arXiv PDF paper https://arxiv.org/pdf/2201.07788.pdf
Project page https://ivl.cs.brown.edu/ConDor

Progress in 3D object understanding has relied on manually canonicalized shape datasets that contain instances with consistent position and…

Improve pedestrian detection by using a general object detector with Cascade R-CNN

Pedestrian Detection: Domain Generalization, CNNs, Transformers and Beyond
arXiv paper abstract https://arxiv.org/abs/2201.03176v1
arXiv PDF paper https://arxiv.org/pdf/2201.03176v1.pdf
GitHub https://github.com/hasanirtiza/Pedestron

… current pedestrian detectors poorly handle even small domain shifts in cross-dataset evaluation.

… attribute the limited generalization to two main…

Enhance dim images better and more simply using imperfectly aligned images with CIDN

Enhancing Low-Light Images in Real World via Cross-Image Disentanglement
arXiv paper abstract https://arxiv.org/abs/2201.03145v1
arXiv PDF paper https://arxiv.org/pdf/2201.03145v1.pdf

Images captured in the low-light … suffer from low visibility and … artifacts, e.g., real noise.

Existing supervised enlightening algorithms require…

Better depth and motion from thermal images by improving self-supervised learning

Maximizing Self-supervision from Thermal Image for Effective Self-supervised Learning of Depth and Ego-motion
arXiv paper abstract https://arxiv.org/abs/2201.04387v1
arXiv PDF paper https://arxiv.org/pdf/2201.04387v1.pdf

… self-supervised learning of depth and ego-motion from thermal images shows strong robustness and reliability under challenging scenarios.

Better identify the events and participants in an image with CLIP-Event

CLIP-Event: Connecting Text and Images with Event Structures
arXiv paper abstract https://arxiv.org/abs/2201.05078
arXiv PDF paper https://arxiv.org/pdf/2201.05078.pdf

… vision-language pretraining models primarily focus on understanding objects in images or entities in text, they often ignore the alignment at the level of events and their argument structures.

… propose a contrastive learning framework to enforce vision-language pretraining models to comprehend events and associated argument (participant) roles.

… take advantage of text information extraction technologies to obtain event structural knowledge, and utilize multiple prompt functions to contrast difficult negative descriptions by manipulating event structures.

… zero-shot CLIP-Event outperforms the state-of-the-art supervised model in argument extraction on Multimedia Event Extraction …
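
The contrastive idea described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the event dictionary layout, the helper names (swap_roles, describe, contrastive_loss), the role-rotation rule for building a hard negative, and the temperature value are all assumptions made for illustration. The sketch builds a "difficult negative" description by reassigning participants to the wrong roles, then scores an image embedding against the correct description and the negatives with a standard InfoNCE-style loss.

```python
import math

def swap_roles(event):
    """Hypothetical negative generator: rotate entities across argument roles."""
    roles = list(event["arguments"])
    entities = [event["arguments"][r] for r in roles]
    rotated = entities[1:] + entities[:1]  # each role receives the next role's entity
    return {"type": event["type"], "arguments": dict(zip(roles, rotated))}

def describe(event):
    """Hypothetical prompt function: turn an event structure into a text description."""
    args = ", ".join(f"{role}: {entity}" for role, entity in event["arguments"].items())
    return f"An image of {event['type']}. {args}."

def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(image_vec, positive_vec, negative_vecs, temperature=0.1):
    """InfoNCE-style loss: low when the image is closest to the positive text."""
    logits = [cosine(image_vec, positive_vec) / temperature]
    logits += [cosine(image_vec, n) / temperature for n in negative_vecs]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[0]

# Toy event structure (assumed format, not taken from the paper)
event = {"type": "arrest", "arguments": {"agent": "police", "detainee": "protester"}}
positive_text = describe(event)
negative_text = describe(swap_roles(event))

# Toy 2-D embeddings standing in for encoder outputs
loss_aligned = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_confused = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In a real pipeline the toy vectors would be replaced by image and text encoder outputs, and the loss would be averaged over a batch; the sketch only shows why a role-swapped description is a harder negative than an unrelated caption, since it shares every word with the positive.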

Detect known and unknown objects, where unknowns are later labeled and learned without forgetting

Revisiting Open World Object Detection
arXiv paper abstract https://arxiv.org/abs/2201.00471v2
arXiv PDF paper https://arxiv.org/pdf/2201.00471v2.pdf
GitHub https://github.com/re-owod/re-owod

Open World Object Detection (OWOD), simulating the real dynamic world where knowledge grows continuously, attempts to detect both known and unknown…

Get eye gaze direction using a low-cost camera with an edge device

Resolving Camera Position for a Practical Application of Gaze Estimation on Edge Devices
arXiv paper abstract https://arxiv.org/abs/2201.02946v1
arXiv PDF paper https://arxiv.org/pdf/2201.02946v1.pdf
GitHub https://github.com/linh-gist/gazeestimationtx2

Most gaze estimation research only works under a setup condition in which a camera perfectly captures eye gaze.

