Real-time face distance and iris tracking on mobile phones without a depth sensor
MediaPipe Iris: Real-time Iris Tracking & Depth Estimation
Google AI Blog https://ai.googleblog.com/2020/08/mediapipe-iris-real-time-iris-tracking.html
MediaPipe on the Web https://developers.googleblog.com/2020/01/mediapipe-on-web.html
arXiv paper abstract https://arxiv.org/abs/2006.11341
arXiv PDF paper https://arxiv.org/pdf/2006.11341.pdf
A wide range of real-world applications … rely on estimating eye position by tracking the iris.
… show that it is possible to determine the metric distance from the camera to the user — without the use of a dedicated depth sensor.
… announce the release of MediaPipe Iris … able to track landmarks involving the iris, pupil and the eye…
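The Google AI Blog post linked above attributes the sensor-free distance estimate to one observation. The horizontal iris diameter is roughly constant across people, at about 11.7 mm. Combined with a pinhole camera model, the distance is the focal length times the real size, divided by the size in pixels. The sketch below is minimal and makes two assumptions. First, the focal length in pixels is known. Second, two pixel points mark the left and right edges of the iris; these landmark inputs are hypothetical.

```python
import math

# Assumed average horizontal iris diameter in millimetres (figure reported in
# the MediaPipe Iris blog post; individual eyes vary by roughly +/-0.5 mm).
IRIS_DIAMETER_MM = 11.7


def iris_diameter_px(left_edge: tuple[float, float],
                     right_edge: tuple[float, float]) -> float:
    """Pixel distance between two (hypothetical) iris edge landmarks."""
    return math.dist(left_edge, right_edge)


def distance_to_camera_mm(focal_length_px: float, diameter_px: float,
                          diameter_mm: float = IRIS_DIAMETER_MM) -> float:
    """Pinhole model: distance = focal_length * real_size / image_size."""
    if focal_length_px <= 0 or diameter_px <= 0:
        raise ValueError("focal length and iris size must be positive")
    return focal_length_px * diameter_mm / diameter_px


if __name__ == "__main__":
    d_px = iris_diameter_px((310.0, 240.0), (340.0, 240.0))  # iris 30 px wide
    print(f"{distance_to_camera_mm(1000.0, d_px):.0f} mm")  # 390 mm
```

Halving the iris width in pixels doubles the estimated distance. That is why accurate iris landmarks matter more than anything else in this estimate.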
Better 3D pose estimates in video by dynamically learning joint relationships
Graph Convolution Network (GCN) … for 3D human pose estimation in videos. … built on the fixed human-joint affinity …
may reduce adaptation capacity of GCN to tackle complex spatio-temporal pose variations
… propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity, and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos.
… discover spatial/temporal human-joint affinity for each video exemplar, depending on spatial distance/temporal…
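The contrast with a fixed skeleton graph can be shown with a graph-convolution layer whose adjacency is recomputed from the current joint features. This is a generic illustration, not DG-Net's formulation. Here the affinity is a row-wise softmax over negative squared feature distances, so joints that are closer in feature space get larger weights. The `temperature` parameter and the weight matrix are hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dynamic_affinity(features: Matrix, temperature: float = 1.0) -> Matrix:
    """Row-normalised affinity computed from the input, not from a fixed skeleton."""
    affinity = []
    for fi in features:
        logits = [-sum((p - q) ** 2 for p, q in zip(fi, fj)) / temperature
                  for fj in features]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        affinity.append([e / total for e in exps])
    return affinity


def dynamic_gcn_layer(features: Matrix, weight: Matrix,
                      temperature: float = 1.0) -> Matrix:
    """One graph convolution, ReLU(A H W), where A depends on H itself."""
    a = dynamic_affinity(features, temperature)
    out = matmul(matmul(a, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


if __name__ == "__main__":
    joints = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]  # 3 joints, 2-D features each
    # Joint 0 puts almost all of its weight on itself and joint 1.
    print(dynamic_affinity(joints)[0])
    print(dynamic_gcn_layer(joints, [[1.0, 0.0], [0.0, 1.0]]))
```

Because the affinity is computed per input, two clips with different motion produce different graphs. A fixed skeleton adjacency cannot do this.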
Unsupervised learning of image classes from dynamic video stream
Real world learning scenarios involve a nonstationary distribution of classes … demand learning on-the-fly from few or no class labels.
… propose an unsupervised model that simultaneously performs online visual representation learning and few-shot learning of new categories without relying on any class labels.
… model … determines when to form a new class prototype. … formulate … online Gaussian mixture model
… includes a contrastive loss that encourages different views of the same image…
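Deciding when to form a new class prototype can be illustrated with a much simpler stand-in than the paper's online Gaussian mixture model. Each incoming embedding joins its nearest prototype, updating that prototype's running mean. If every prototype is farther away than a distance threshold, the embedding seeds a new one instead. The hard assignment and the `new_class_threshold` value are assumptions made for illustration only.

```python
import math


class OnlinePrototypes:
    """Hard-assignment online clustering; a simplified stand-in for an online GMM."""

    def __init__(self, new_class_threshold: float):
        self.threshold = new_class_threshold
        self.means: list[list[float]] = []
        self.counts: list[int] = []

    def observe(self, x: list[float]) -> int:
        """Assign x to a prototype, creating one if needed, and return its index."""
        if self.means:
            dists = [math.dist(x, m) for m in self.means]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] <= self.threshold:
                # Incremental mean update: m <- m + (x - m) / n
                self.counts[k] += 1
                n = self.counts[k]
                self.means[k] = [m + (xi - m) / n for m, xi in zip(self.means[k], x)]
                return k
        # No prototype is close enough: x becomes the first example of a new class.
        self.means.append(list(x))
        self.counts.append(1)
        return len(self.means) - 1


if __name__ == "__main__":
    model = OnlinePrototypes(new_class_threshold=1.0)
    stream = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [0.1, -0.1], [5.2, 4.9]]
    print([model.observe(x) for x in stream])  # [0, 0, 1, 0, 1]
```

A probabilistic mixture replaces the fixed threshold with per-component likelihoods. The control flow stays the same: assign to an existing prototype, or open a new one.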
Real-time 3D hand reconstruction from a single monocular image
3D hand-mesh reconstruction from RGB images facilitates many applications, including augmented reality (AR).
However, this requires not only real-time speed and accurate hand pose and shape but also plausible mesh-image alignment.
… decoupling the hand-mesh reconstruction task into three stages:
a joint stage to predict hand joints and segmentation;
a mesh stage to predict a rough hand mesh; and
a refine stage to fine-tune it with an offset mesh for mesh-image alignment.
… can promote…
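The decoupled three-stage design is essentially a data flow in which each stage consumes the previous stage's output. In this minimal sketch, the stage functions are hypothetical toy stand-ins for trained networks. Only two things mirror the description above: the order of the stages, and the final refine step that adds a per-vertex offset mesh to the rough mesh.

```python
from typing import Callable

Vertex = tuple[float, float, float]
Mesh = list[Vertex]


def add_offsets(rough: Mesh, offsets: Mesh) -> Mesh:
    """Refine step: each final vertex is the rough vertex plus its predicted offset."""
    if len(rough) != len(offsets):
        raise ValueError("offset mesh must have one offset per vertex")
    return [(x + dx, y + dy, z + dz) for (x, y, z), (dx, dy, dz) in zip(rough, offsets)]


def reconstruct_hand(image, joint_stage: Callable, mesh_stage: Callable,
                     refine_stage: Callable) -> Mesh:
    """Run the three decoupled stages in order; each stage is a hypothetical model."""
    joints, segmentation = joint_stage(image)                # stage 1: joints + hand mask
    rough_mesh = mesh_stage(image, joints, segmentation)     # stage 2: rough mesh
    offsets = refine_stage(image, rough_mesh, segmentation)  # stage 3: offset mesh
    return add_offsets(rough_mesh, offsets)


# Toy stand-ins for trained networks, used only to show the data flow.
def toy_joint_stage(image):
    return [(0.0, 0.0, 0.0)], [[1]]


def toy_mesh_stage(image, joints, segmentation):
    return [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]


def toy_refine_stage(image, rough_mesh, segmentation):
    return [(0.5, 0.0, 0.0), (0.0, -0.5, 0.0)]


if __name__ == "__main__":
    print(reconstruct_hand(None, toy_joint_stage, toy_mesh_stage, toy_refine_stage))
    # [(1.5, 2.0, 3.0), (4.0, 4.5, 6.0)]
```

Keeping the stages separate means each can be trained or swapped on its own. The refine stage only has to learn small corrections for mesh-image alignment, not the whole mesh.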
Get depth, regions, and layout from a panoramic image quickly and accurately with horizontal features
HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features
arXiv paper abstract https://arxiv.org/abs/2011.11498
arXiv PDF paper https://arxiv.org/pdf/2011.11498.pdf
YouTube (5 min) https://www.youtube.com/watch?v=xXtRaRKmMpA
We present HoHoNet, a versatile and efficient framework for holistic understanding of an indoor 360-degree panorama using a Latent Horizontal Feature (LHFeat).
The compact LHFeat flattens the features along the vertical direction and has shown success in modeling per-column modality for room layout reconstruction.
… allowing per-pixel dense prediction from LHFeat.
HoHoNet is fast: It runs at 52 FPS and 110 FPS with…
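A per-column feature can be sketched with plain lists. A C×H×W feature map is flattened along the vertical axis into W vectors, one per image column. A per-column decoder then maps each vector back to a column of per-pixel outputs. This illustration makes two simplifying assumptions: the flattening is a plain reshape, and the decoder is a hypothetical linear map. HoHoNet's own learned layers for producing and decoding LHFeat are not reproduced here.

```python
Tensor3 = list[list[list[float]]]  # feature map indexed [channel][row][column]


def flatten_vertical(feat: Tensor3) -> list[list[float]]:
    """Collapse the vertical axis: column w becomes one vector of length C*H."""
    channels, height, width = len(feat), len(feat[0]), len(feat[0][0])
    return [[feat[c][h][w] for c in range(channels) for h in range(height)]
            for w in range(width)]


def dense_from_columns(columns: list[list[float]],
                       weight: list[list[float]]) -> list[list[float]]:
    """Hypothetical per-column linear decoder.

    `weight` has shape out_height x (C*H). Each column vector is mapped to
    out_height values, and the result is transposed into an out_height x W map.
    """
    per_column = [[sum(wi * x for wi, x in zip(row, col)) for row in weight]
                  for col in columns]
    return [list(r) for r in zip(*per_column)]


if __name__ == "__main__":
    feat = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],      # channel 0 (H=2, W=3)
            [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]]  # channel 1
    lh = flatten_vertical(feat)
    print(lh[0])                                   # [1.0, 4.0, 7.0, 10.0]
    print(dense_from_columns(lh, [[1.0] * 4]))     # [[22.0, 26.0, 30.0]]
```

The representation stores W vectors instead of an H×W grid. Most processing therefore runs along a single horizontal axis, which fits the speed claim above.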
Use satellite images to get the 3D structure of buildings and roofs
Automated LoD-2 Model Reconstruction from Very-HighResolution Satellite-derived Digital Surface Model and Orthophoto
arXiv paper abstract https://arxiv.org/abs/2109.03876
arXiv PDF paper https://arxiv.org/pdf/2109.03876.pdf
… reconstructs LoD-2 building models following a “decomposition-optimization-fitting” paradigm.
… starts … through a deep learning-based detector and vectorizes individual segments into polygons
… decomposes the complex and irregularly shaped building polygons to tightly combined elementary building rectangles
… introduced OpenStreetMap (OSM) and Graph-Cut (GC) labeling to further refine the orientation of 2D building rectangles.
… takes building-specific parameters such as hip lines … to optimize the flexibility for…
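Refining the orientation of building rectangles presupposes some estimate of a footprint's main direction. The paper uses OpenStreetMap and Graph-Cut labeling for this. The sketch below swaps in a much simpler heuristic, which assumes building edges are mostly mutually perpendicular. It takes a length-weighted circular mean of edge angles modulo 90°. Multiplying each angle by 4 lets the two sides of a rectangle vote for the same direction.

```python
import math


def dominant_orientation_deg(polygon: list[tuple[float, float]]) -> float:
    """Length-weighted dominant edge direction modulo 90 degrees, in (-45, 45]."""
    sx = sy = 0.0
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        length = math.hypot(x1 - x0, y1 - y0)
        theta = math.atan2(y1 - y0, x1 - x0)
        # Angles 90 degrees apart coincide after multiplying by 4.
        sx += length * math.cos(4 * theta)
        sy += length * math.sin(4 * theta)
    if sx == 0.0 and sy == 0.0:
        raise ValueError("no dominant orientation for this polygon")
    return math.degrees(math.atan2(sy, sx) / 4)


if __name__ == "__main__":
    # An axis-aligned L-shaped footprint has dominant orientation 0 degrees.
    l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
    print(round(dominant_orientation_deg(l_shape), 6))
```

Rotating a footprint by this angle aligns its main edges with the axes. That alignment makes the footprint easier to split into axis-aligned elementary rectangles.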
Image classification without batch normalization that trains faster and matches or beats normalized models
High-Performance Large-Scale Image Recognition Without Normalization
arXiv paper abstract https://arxiv.org/abs/2102.06171
arXiv PDF paper https://arxiv.org/pdf/2102.06171.pdf
Papers With Code https://paperswithcode.com/paper/high-performance-large-scale-image
Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples.
… a significantly improved class of Normalizer-Free ResNets.
… smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and
… largest models attain a new state-of-the-art top-1 accuracy of…
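The batch dependence mentioned above shows up in a minimal sketch of training-mode batch normalization for a single scalar feature. Learned scale and shift are omitted. An example's normalized value is computed from statistics of the whole batch. The same input therefore maps to different outputs in different batches, and a batch of one collapses every input to zero.

```python
import math


def batch_norm(batch: list[float], eps: float = 1e-5) -> list[float]:
    """Training-mode batch normalisation of one scalar feature (no scale/shift)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


if __name__ == "__main__":
    sample = 2.0
    print(batch_norm([sample, 1.0, 3.0])[0])           # 0.0
    print(batch_norm([sample, 10.0, 20.0, 30.0])[0])   # about -1.28
    print(batch_norm([sample])[0])                     # 0.0: batch of one
```

A normalizer-free network has no such coupling: each example's output depends only on that example.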
Using an audio and vision transformer to count crowds
Crowd estimation is a very challenging problem.
… address the critical challenges in crowd counting by effectively utilizing both visual and audio inputs
… introduces the notion of auxiliary and explicit image patch-importance ranking (PIR) and patch-wise crowd estimate (PCE) information to produce a third (run-time) modality.
These modalities (audio, visual, run-time) undergo a transformer-inspired cross-modality co-attention mechanism to finally output the crowd estimate.
… proposed scheme outperforms the state-of-the-art networks under all evaluation settings with up to 33.8% improvement.
We also analyze and compare the vision-only variant of our network and empirically demonstrate its superiority over previous approaches.
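The core of cross-modality co-attention can be sketched as two scaled dot-product cross-attentions. Visual tokens attend to audio tokens, and audio tokens attend to visual tokens. This is a generic illustration, assuming both modalities are already embedded as vectors of the same size. It omits the learned query/key/value projections, the third run-time modality built from PIR and PCE, and the final crowd-count head, so it is not the paper's network.

```python
import math

Matrix = list[list[float]]


def softmax(xs: list[float]) -> list[float]:
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query mixes the other modality's values."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def co_attention(visual: Matrix, audio: Matrix) -> tuple[Matrix, Matrix]:
    """Each modality attends to the other (no learned projections in this sketch)."""
    return cross_attention(visual, audio, audio), cross_attention(audio, visual, visual)


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 image-patch tokens
    audio = [[2.0, 0.0], [0.0, 2.0]]               # 2 audio-frame tokens
    v2a, a2v = co_attention(visual, audio)
    print(len(v2a), len(a2v))  # 3 2
```

Each output token is a weighted average of the other modality's tokens. Audio can therefore inform the visual estimate in scenes where the image alone is ambiguous.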
Survey on improving the efficiency of visual recognition with deep learning
Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions
arXiv paper abstract https://arxiv.org/abs/2108.13055v1
arXiv PDF paper https://arxiv.org/pdf/2108.13055v1.pdf
Visual recognition is currently one of the most important and active research areas in computer vision, pattern recognition, and even the general field of artificial intelligence.
… Deep neural networks (DNNs) have largely boosted their performances on many concrete tasks
… Though recognition accuracy is usually the first concern for new progresses, efficiency is actually rather important and sometimes critical
… present the review of the…