Real-time face distance and iris tracking on a mobile phone without a depth sensor

MediaPipe Iris: Real-time Iris Tracking & Depth Estimation
Google AI Blog
MediaPipe on the Web
arXiv paper abstract
arXiv PDF paper

A wide range of real-world applications … rely on estimating eye position by tracking the iris.

… show that it is possible to determine the metric distance from the camera to the user — without the use of a dedicated depth sensor.

… announce the release of MediaPipe Iris … able to track landmarks involving the iris, pupil and the eye…
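The Google AI Blog post behind this item explains how distance can be recovered without a depth sensor. The horizontal iris diameter is roughly constant across people, at about 11.7 ± 0.5 mm, so a pinhole camera model turns the iris width in pixels into a metric distance through similar triangles. The sketch below assumes the camera's focal length in pixels is known (e.g. from its intrinsics); the function and parameter names are hypothetical and are not MediaPipe's API.

```python
# Illustrative helper for the pinhole-camera relation; hypothetical name, not MediaPipe's API.
IRIS_DIAMETER_MM = 11.7  # average horizontal iris diameter reported in the Google AI Blog post


def distance_to_camera_mm(focal_length_px: float, iris_diameter_px: float,
                          iris_diameter_mm: float = IRIS_DIAMETER_MM) -> float:
    """Estimate the camera-to-eye distance from the iris width in pixels.

    A pinhole camera images an object of real size S at depth Z with size
    s = f * S / Z (f in pixels), so Z = f * S / s.
    """
    if focal_length_px <= 0 or iris_diameter_px <= 0:
        raise ValueError("focal length and iris diameter must be positive")
    return focal_length_px * iris_diameter_mm / iris_diameter_px


# Assumed inputs: 1400 px focal length, iris spanning 40 px horizontally.
print(round(distance_to_camera_mm(1400.0, 40.0), 1))  # 409.5 (millimetres)
```

Because real iris sizes vary by about ±0.5 mm, anatomy alone contributes roughly ±4% uncertainty, before any landmark-detection error.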

Better 3D pose estimates in video by dynamically learning joint relationships

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos
arXiv paper abstract
arXiv PDF paper

Graph Convolution Network (GCN) … for 3D human pose estimation in videos. … built on the fixed human-joint affinity …

may reduce adaptation capacity of GCN to tackle complex spatio-temporal pose variations

… propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity, and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos.

… discover spatial/temporal human-joint affinity for each video exemplar, depending on spatial distance/temporal…
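A per-video affinity can be illustrated with a toy graph convolution whose adjacency is recomputed from each sample's own joint positions rather than fixed by the skeleton. The Gaussian kernel on pairwise joint distances and its `sigma` bandwidth are assumptions for illustration. DG-Net learns its spatial and temporal affinities, and this is not its formulation.

```python
import numpy as np


def dynamic_affinity(joints: np.ndarray, sigma: float = 0.5) -> np.ndarray:
    """Row-normalized affinity between the joints of one sample.

    joints: (J, D) joint coordinates. Closer joints get larger weights via a
    Gaussian kernel on distance (an assumption made for this sketch).
    """
    diff = joints[:, None, :] - joints[None, :, :]          # (J, J, D) pairwise offsets
    dist2 = np.sum(diff ** 2, axis=-1)                        # (J, J) squared distances
    affinity = np.exp(-dist2 / (2.0 * sigma ** 2))            # self-loops get weight 1
    return affinity / affinity.sum(axis=1, keepdims=True)     # each row sums to 1


def graph_conv(features: np.ndarray, affinity: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: aggregate over neighbors, project, apply ReLU."""
    return np.maximum(affinity @ features @ weight, 0.0)


rng = np.random.default_rng(0)
joints_2d = rng.normal(size=(17, 2))        # e.g. 17 detected 2D joints in one frame
A = dynamic_affinity(joints_2d)             # recomputed for every sample, not fixed
W = rng.normal(size=(2, 8))                 # hypothetical learned weights
hidden = graph_conv(joints_2d, A, W)
print(A.shape, hidden.shape)                # (17, 17) (17, 8)
```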

Unsupervised learning of image classes from a dynamic video stream

Online Unsupervised Learning of Visual Representations and Categories
arXiv paper abstract
arXiv PDF paper

Real world learning scenarios involve a nonstationary distribution of classes … demand learning on-the-fly from few or no class labels.

… propose an unsupervised model that simultaneously performs online visual representation learning and few-shot learning of new categories without relying on any class labels.

… model … determines when to form a new class prototype. … formulate … online Gaussian mixture model
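Deciding when to form a new class prototype can be illustrated with a rule much simpler than the paper's online Gaussian mixture model. Each incoming embedding is assigned to its nearest prototype unless that prototype is farther than a threshold, in which case a new prototype is started. The threshold value and the running-mean update are assumptions for illustration.

```python
import numpy as np


class OnlinePrototypes:
    """Nearest-prototype clustering that creates new classes on the fly.

    A simplified stand-in for an online mixture model: no variances or
    mixing weights, only means and a distance threshold (both assumptions).
    """

    def __init__(self, new_class_threshold: float):
        self.threshold = new_class_threshold
        self.means: list[np.ndarray] = []
        self.counts: list[int] = []

    def update(self, z: np.ndarray) -> int:
        """Assign embedding z to a prototype and return its class index."""
        if self.means:
            dists = [np.linalg.norm(z - m) for m in self.means]
            k = int(np.argmin(dists))
            if dists[k] <= self.threshold:
                # Running-mean update of the matched prototype.
                self.counts[k] += 1
                self.means[k] += (z - self.means[k]) / self.counts[k]
                return k
        # No prototype is close enough: open a new class.
        self.means.append(z.astype(float).copy())
        self.counts.append(1)
        return len(self.means) - 1


model = OnlinePrototypes(new_class_threshold=1.0)
stream = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.2, 4.9], [0.05, 0.0]])
print([model.update(z) for z in stream])  # [0, 0, 1, 1, 0]
```

The paper's model makes this decision probabilistically; the sketch only mirrors the match-or-create control flow.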

… includes a contrastive loss that encourages different views of the same image…
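A contrastive term of this kind can be illustrated with a standard InfoNCE-style loss. The embeddings of two views of the same image form a positive pair, and the other images in the batch act as negatives. This is a generic formulation with an assumed temperature of 0.1. It is not necessarily the exact loss used in the paper.

```python
import numpy as np


def info_nce(view_a: np.ndarray, view_b: np.ndarray, temperature: float = 0.1) -> float:
    """Mean InfoNCE loss where row i of view_a should match row i of view_b."""
    # L2-normalize so dot products are cosine similarities.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                      # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; minimize their negative log-probability.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(1)
emb = rng.normal(size=(8, 16))
noisy = emb + 0.01 * rng.normal(size=emb.shape)            # a slightly perturbed second "view"
aligned = info_nce(emb, noisy)                             # row i paired with its own view
mismatched = info_nce(emb, np.roll(noisy, 1, axis=0))      # row i paired with another image
print(aligned < mismatched)  # True
```

A lower temperature sharpens the softmax, so negatives that look similar to the anchor are penalized more strongly.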

Real-time 3D hand reconstruction from a single monocular image

Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction
arXiv paper abstract
arXiv PDF paper

3D hand-mesh reconstruction from RGB images facilitates many applications, including augmented reality (AR).

However, this requires not only real-time speed and accurate hand pose and shape but also plausible mesh-image alignment.

… decoupling the hand-mesh reconstruction task into three stages:

a joint stage to predict hand joints and segmentation;
a mesh stage to predict a rough hand mesh; and
a refine stage to fine-tune it with an offset mesh for mesh-image alignment.
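The three-stage pipeline can be sketched as three chained functions, with the refine stage predicting a per-vertex offset mesh that is added to the rough mesh. Every stage below is a hypothetical placeholder, using random linear maps in place of trained networks. Only the data flow follows the stages listed above. The 21 joints and 778 vertices are assumptions borrowed from the MANO hand model.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_JOINTS, NUM_VERTICES = 21, 778  # assumed sizes, as in the MANO hand model

# Hypothetical stand-ins for trained networks: fixed random linear maps.
W_MESH = rng.normal(scale=0.01, size=(NUM_JOINTS * 3, NUM_VERTICES * 3))
W_OFFSET = rng.normal(scale=0.01, size=(4, 3))


def joint_stage(image: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Stage 1: predict 3D hand joints and a hand segmentation mask (placeholder)."""
    joints = rng.normal(size=(NUM_JOINTS, 3))                 # stand-in for a joint regressor
    mask = (image.mean(axis=-1) > 0.5).astype(float)          # stand-in for a segmentation head
    return joints, mask


def mesh_stage(joints: np.ndarray) -> np.ndarray:
    """Stage 2: regress a rough (V, 3) mesh from the joints (placeholder)."""
    return (joints.reshape(-1) @ W_MESH).reshape(NUM_VERTICES, 3)


def refine_stage(rough_mesh: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stage 3: predict a per-vertex offset mesh and add it to the rough mesh."""
    coverage = np.full((NUM_VERTICES, 1), mask.mean())            # crude image cue for every vertex
    vertex_feats = np.concatenate([rough_mesh, coverage], axis=1)  # (V, 4)
    offset = vertex_feats @ W_OFFSET                               # (V, 3) offset mesh
    return rough_mesh + offset


image = rng.random((64, 64, 3))
joints, mask = joint_stage(image)
rough = mesh_stage(joints)
refined = refine_stage(rough, mask)
print(rough.shape, refined.shape)  # (778, 3) (778, 3)
```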

… can promote…

Get depth, regions, and layout from a panoramic image quickly and accurately with horizontal features

HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features
arXiv paper abstract
arXiv PDF paper
YouTube (5 min)

We present HoHoNet, a versatile and efficient framework for holistic understanding of an indoor 360-degree panorama using a Latent Horizontal Feature (LHFeat).

The compact LHFeat flattens the features along the vertical direction and has shown success in modeling per-column modality for room layout reconstruction.

… allowing per-pixel dense prediction from LHFeat.
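The compress-then-expand idea can be illustrated on a single feature map. First, the height axis is collapsed so that each image column becomes one compact vector. Then each column's vector is expanded back into a full column of per-pixel values. The sketch assumes mean pooling for the flattening step and a fixed cosine basis for the expansion, reading the first K channels as coefficients. HoHoNet's actual compression and decoding layers are learned and are not reproduced here.

```python
import numpy as np


def flatten_vertical(feature_map: np.ndarray) -> np.ndarray:
    """(C, H, W) -> (C, W): one compact feature vector per image column.

    Mean pooling over the height axis is an assumption made for this sketch.
    """
    return feature_map.mean(axis=1)


def cosine_basis(height: int, num_coeffs: int) -> np.ndarray:
    """(num_coeffs, height) matrix of cosine basis functions sampled at row centers."""
    rows = np.arange(height)
    k = np.arange(num_coeffs)[:, None]
    return np.cos(np.pi * (rows + 0.5) * k / height)


def dense_from_columns(column_feats: np.ndarray, height: int, num_coeffs: int) -> np.ndarray:
    """Expand per-column features into a (height, W) per-pixel map.

    The first num_coeffs channels of each column are read as basis
    coefficients (a hypothetical prediction head).
    """
    coeffs = column_feats[:num_coeffs]                  # (K, W)
    return cosine_basis(height, num_coeffs).T @ coeffs  # (H, K) @ (K, W) -> (H, W)


rng = np.random.default_rng(0)
features = rng.normal(size=(32, 16, 128))    # assumed C=32, H=16, W=128 backbone feature map
lhfeat = flatten_vertical(features)          # (32, 128): one vector per column
depth = dense_from_columns(lhfeat, height=64, num_coeffs=16)
print(lhfeat.shape, depth.shape)             # (32, 128) (64, 128)
```

Any layers that mix information across columns then operate on W column vectors instead of H × W pixels.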

HoHoNet is fast: It runs at 52 FPS and 110 FPS with…

Use satellite images to get 3D structure of buildings and roofs

Automated LoD-2 Model Reconstruction from Very-HighResolution Satellite-derived Digital Surface Model and Orthophoto
arXiv paper abstract
arXiv PDF paper

… reconstructs LoD-2 building models following a “decomposition-optimization-fitting” paradigm.

… starts … through a deep learning-based detector and vectorizes individual segments into polygons

… decomposes the complex and irregularly shaped building polygons to tightly combined elementary building rectangles
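Fitting one elementary rectangle to a building segment can be illustrated with a minimum-area oriented bounding rectangle. The optimal rectangle always has one side collinear with an edge of the point set's convex hull, so it is enough to try each hull edge's orientation. This is a generic geometric building block chosen for illustration. The paper's decomposition into combined rectangles is not reproduced, and neither is its OSM/Graph-Cut orientation refinement.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rectangle(points):
    """Return (area, angle_radians, width, height) of the smallest enclosing rectangle."""
    hull = convex_hull(points)
    best = None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(-theta), math.sin(-theta)
        # Rotate the hull so this edge is horizontal, then take the axis-aligned box.
        xs = [c * x - s * y for x, y in hull]
        ys = [s * x + c * y for x, y in hull]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if best is None or w * h < best[0]:
            best = (w * h, theta, w, h)
    return best


# A 4 x 2 footprint rotated by 30 degrees, plus one interior point.
a = math.radians(30)
corners = [(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)]
footprint = [(x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))
             for x, y in corners]
area, angle, w, h = min_area_rectangle(footprint)
print(round(area, 6))  # 8.0
```

The returned angle is the rectangle's orientation, which the paper refines further with OSM and Graph-Cut labeling.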

… introduced OpenStreetMap (OSM) and Graph-Cut (GC) labeling to further refine the orientation of 2D building rectangle.

… takes building-specific parameters such as hip lines … to optimize the flexibility for…

Image classification without batch normalization that trains faster and matches or beats normalized models in accuracy

High-Performance Large-Scale Image Recognition Without Normalization
arXiv paper abstract
arXiv PDF paper
Papers With Code

Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples.
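The dependence on other examples in the batch is easy to see numerically. In training mode, batch normalization standardizes each feature with the batch's own mean and variance, so the same input can produce different outputs depending on its batchmates. The sketch omits the learned scale and shift and uses an assumed epsilon of 1e-5.

```python
import numpy as np


def batch_norm_train(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Training-mode batch normalization over axis 0 (no learned scale/shift)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


sample = np.array([[1.0, 2.0]])
batch_a = np.vstack([sample, [[0.0, 0.0]], [[2.0, 4.0]]])
batch_b = np.vstack([sample, [[10.0, 10.0]], [[12.0, 14.0]]])

out_a = batch_norm_train(batch_a)[0]   # same input row ...
out_b = batch_norm_train(batch_b)[0]   # ... normalized within a different batch
print(np.round(out_a, 3), np.round(out_b, 3))
```

With a batch of one, every output is zero, which is one concrete form of the batch-size dependence described above.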

… a significantly improved class of Normalizer-Free ResNets.

… smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and

… largest models attain a new state-of-the-art top-1 accuracy of…
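One ingredient the NFNet paper uses to train stably without batch normalization is adaptive gradient clipping (AGC). AGC rescales a gradient whenever its norm is large relative to the norm of the weights it updates, and it computes these norms unit-wise, one per output row. The `clip` and `eps` values below are illustrative defaults, not a recommendation for any particular batch size.

```python
import numpy as np


def adaptive_gradient_clip(weight: np.ndarray, grad: np.ndarray,
                           clip: float = 0.01, eps: float = 1e-3) -> np.ndarray:
    """Unit-wise AGC for a 2D weight matrix with one row per output unit.

    Each row's gradient is rescaled so that ||g_i|| / max(||w_i||, eps) <= clip.
    """
    w_norm = np.maximum(np.linalg.norm(weight, axis=1, keepdims=True), eps)
    g_norm = np.linalg.norm(grad, axis=1, keepdims=True)
    max_norm = clip * w_norm
    # Scale only the rows whose gradient is too large; leave the rest untouched.
    scale = np.where(g_norm > max_norm, max_norm / np.maximum(g_norm, 1e-12), 1.0)
    return grad * scale


w = np.array([[3.0, 4.0], [0.0, 0.0]])      # row norms: 5.0 and 0 (floored to eps)
g = np.array([[0.003, 0.004], [1.0, 0.0]])  # row gradient norms: 0.005 and 1.0
print(adaptive_gradient_clip(w, g, clip=0.01))
```

The first row passes through unchanged. The second row's weights are zero, so their norm is floored at `eps` and its gradient is shrunk to norm `clip * eps`.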

Using an audio and vision transformer to count crowds

Audio-Visual Transformer Based Crowd Counting
arXiv paper abstract
arXiv PDF paper

Crowd estimation is a very challenging problem.

… address the critical challenges in crowd counting by effectively utilizing both visual and audio inputs

… introduces the notion of auxiliary and explicit image patch-importance ranking (PIR) and patch-wise crowd estimate (PCE) information to produce a third (run-time) modality.

These modalities (audio, visual, run-time) undergo a transformer-inspired cross-modality co-attention mechanism to finally output the crowd estimate.
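Cross-modality co-attention can be illustrated with scaled dot-product attention in both directions: visual tokens query audio tokens, and audio tokens query visual tokens. The token counts, dimensions, and random projection matrices below are assumptions, and the projections are shared across directions for brevity. The paper's mechanism also includes the run-time (PIR/PCE) modality and learned weights, neither of which is modeled here.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Each query token attends over all context tokens (scaled dot-product)."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (Nq, Nc), rows sum to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
d = 32
visual = rng.normal(size=(49, d))   # e.g. a 7x7 grid of image-patch tokens
audio = rng.normal(size=(10, d))    # e.g. 10 audio-frame tokens
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

vis_ctx, vis_w = cross_attention(visual, audio, w_q, w_k, w_v)   # visual attends to audio
aud_ctx, aud_w = cross_attention(audio, visual, w_q, w_k, w_v)   # audio attends to visual
fused = np.concatenate([vis_ctx.mean(axis=0), aud_ctx.mean(axis=0)])
print(vis_ctx.shape, aud_ctx.shape, fused.shape)  # (49, 32) (10, 32) (64,)
```

A regression head on `fused` (hypothetical, not shown) would then output the crowd estimate.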

… proposed scheme outperforms the state-of-the-art networks under all evaluation settings with up to 33.8% improvement.

We also analyze and compare the vision-only variant of our network and empirically demonstrate its superiority over previous approaches.



Photo by Jake Weirick on Unsplash

Survey on improving the efficiency of computer vision recognition using deep learning

Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions
arXiv paper abstract
arXiv PDF paper

Visual recognition is currently one of the most important and active research areas in computer vision, pattern recognition, and even the general field of artificial intelligence.

… Deep neural networks (DNNs) have largely boosted their performances on many concrete tasks

… Though recognition accuracy is usually the first concern for new progresses, efficiency is actually rather important and sometimes critical

… present the review of the…

AI News Clips by Morris Lee: News to help your R&D

I apply innovative technologies like machine learning, computer vision, and physics to further an organization's goals. I am a recognized innovator with 64 patents.
