Top 5 Papers Presented by Meta at ICCV


The 16th edition of the prestigious International Conference on Computer Vision (ICCV) is scheduled for October 2 to 6 in Paris, France. The event is expected to draw over 2,000 participants from around the world and will showcase cutting-edge computer vision research through oral and poster presentations, spanning topics such as image and video processing, object detection, scene understanding, motion estimation, 3D vision, machine learning, and applications in robotics and healthcare. Meta, one of the pioneers of the field, is participating with five recent research papers, summarised below.

Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation

This paper explores text-guided human motion generation, a field with broad applications in animation and robotics. While previous efforts using diffusion models have improved motion quality, they are constrained by small-scale motion-capture data, resulting in sub-optimal performance in many real-world scenarios.

The authors propose Make-An-Animation (MAA), a text-conditioned human motion generation model. MAA stands out by learning from large-scale image-text datasets, allowing it to capture a wider range of poses and prompts. The model is trained in two stages: it is first pre-trained on a large dataset of (text, static pseudo-pose) pairs extracted from image-text data, and then fine-tuned on motion capture data with additional layers added for temporal modeling. Unlike prior diffusion-based motion models, MAA employs a U-Net architecture akin to recent text-to-video generation models. In human evaluations, the model achieves state-of-the-art motion realism and alignment with the input text.
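For readers who want a more concrete picture, here is a minimal PyTorch sketch of that two-stage recipe. The denoiser, dimensions, and noise schedule are toy placeholders (the paper uses a U-Net trained on real pseudo-pose and motion-capture data), so this illustrates the training structure rather than the actual model.

```python
# Minimal sketch of MAA's two-stage recipe (assumed interfaces, not the paper's code).
# Stage 1: train a pose diffusion model on (text, static pseudo-pose) pairs.
# Stage 2: insert temporal layers and fine-tune on motion-capture sequences.
import torch
import torch.nn as nn

class PoseDenoiser(nn.Module):
    """Toy text-conditioned denoiser over per-frame pose vectors."""
    def __init__(self, pose_dim=72, text_dim=512, hidden=256):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Linear(pose_dim + text_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, pose_dim),
        )
        self.temporal = None  # added only for stage 2

    def add_temporal_layers(self, pose_dim=72, heads=4):
        # New layers for motion: attention across the time axis.
        self.temporal = nn.MultiheadAttention(pose_dim, heads, batch_first=True)

    def forward(self, noisy_pose, t, text_emb):
        # noisy_pose: (B, T, pose_dim); T == 1 for static pseudo-poses.
        B, T, _ = noisy_pose.shape
        t_feat = t.view(B, 1, 1).expand(B, T, 1).float()
        txt = text_emb.unsqueeze(1).expand(B, T, -1)
        out = self.spatial(torch.cat([noisy_pose, txt, t_feat], dim=-1))
        if self.temporal is not None:          # stage 2 only
            out, _ = self.temporal(out, out, out)
        return out

def diffusion_loss(model, pose, text_emb):
    """Standard epsilon-prediction objective with a toy noise schedule."""
    t = torch.randint(0, 1000, (pose.shape[0],))
    noise = torch.randn_like(pose)
    alpha = (1 - t.float() / 1000).view(-1, 1, 1)
    noisy = alpha.sqrt() * pose + (1 - alpha).sqrt() * noise
    return nn.functional.mse_loss(model(noisy, t, text_emb), noise)

model = PoseDenoiser()
# Stage 1: large-scale static pseudo-poses (T=1) extracted from image-text data.
loss_stage1 = diffusion_loss(model, torch.randn(8, 1, 72), torch.randn(8, 512))
# Stage 2: add temporal layers, then fine-tune on motion-capture clips (T=16).
model.add_temporal_layers()
loss_stage2 = diffusion_loss(model, torch.randn(8, 16, 72), torch.randn(8, 512))
```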

Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

This study, a collaboration with Berkeley AI Research and Kitware, introduces Scale-MAE, a pre-training method for large vision models that are typically fine-tuned on heavily augmented imagery. Such models often ignore scale-specific details, a problem that is especially acute in remote sensing. Scale-MAE addresses this by explicitly learning relationships between data at different scales during pre-training: it masks an input image at a known scale and sets the scale of the ViT positional encoding according to the area of the Earth's surface the image covers, rather than its pixel resolution.

The masked images are encoded with a standard ViT backbone and decoded through a bandpass filter that reconstructs low- and high-frequency images at lower and higher scales, respectively. Tasking the network with reconstructing both frequency bands yields robust multiscale representations for remote sensing imagery, outperforming current state-of-the-art models.
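The scale-aware positional encoding is the central trick, and it is easy to illustrate. The sketch below rescales a standard 2D sinusoidal encoding by ground sample distance (metres per pixel) relative to a reference, so two crops of the same scene at different resolutions land in a shared coordinate frame. The function name, reference GSD, and dimensions are illustrative assumptions, not the paper's exact formulation.

```python
# Rough sketch of a GSD-scaled positional encoding in the spirit of Scale-MAE.
# The encoding depends on the ground distance each patch covers, not on grid size.
import torch

def gsd_positional_encoding(grid_size, gsd_meters, gsd_ref=1.0, dim=64):
    """2D sinusoidal encoding whose positions are rescaled by GSD relative to a reference."""
    # Positions in "reference" units: a coarser image (larger GSD) spans more ground,
    # so its patches sit farther apart in the shared coordinate frame.
    pos = torch.arange(grid_size).float() * (gsd_meters / gsd_ref)
    freqs = torch.exp(-torch.arange(0, dim // 2).float() / (dim // 2)
                      * torch.log(torch.tensor(10000.0)))
    angles = pos[:, None] * freqs[None, :]                     # (grid, dim/2)
    enc_1d = torch.cat([angles.sin(), angles.cos()], dim=-1)   # (grid, dim)
    # Combine row and column encodings for the 2D patch grid.
    row = enc_1d[:, None, :].expand(grid_size, grid_size, dim)
    col = enc_1d[None, :, :].expand(grid_size, grid_size, dim)
    return torch.cat([row, col], dim=-1).reshape(grid_size * grid_size, 2 * dim)

# Same scene at two scales: the positions differ even though the patch grid is identical.
pe_fine   = gsd_positional_encoding(grid_size=14, gsd_meters=0.3)   # 0.3 m/pixel crop
pe_coarse = gsd_positional_encoding(grid_size=14, gsd_meters=1.2)   # 1.2 m/pixel crop
```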

NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

NeRF-Det is a new approach to indoor 3D detection using RGB images. Unlike existing methods, it leverages NeRF to explicitly estimate 3D geometry, improving detection performance. To overcome NeRF’s optimisation latency, the researchers incorporated geometry priors for better generalisation. By linking detection and NeRF via a shared MLP, they efficiently adapt NeRF for detection, yielding geometry-aware volumetric representations. 
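As a rough illustration of that shared-MLP idea, the toy sketch below maps a multi-view feature volume to an opacity field that both weights the features passed to a detection head and could serve as density for NeRF-style rendering. All names and shapes here are assumptions made for illustration; the actual NeRF-Det architecture is considerably more involved.

```python
# Hand-wavy sketch of a shared geometry MLP in the spirit of NeRF-Det (assumed shapes).
import torch
import torch.nn as nn

class SharedGeometryMLP(nn.Module):
    """Maps per-voxel image features to an opacity/occupancy estimate."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, voxel_feats):
        # voxel_feats: (B, X, Y, Z, feat_dim) multi-view features unprojected into a grid.
        return torch.sigmoid(self.net(voxel_feats))            # (B, X, Y, Z, 1)

mlp = SharedGeometryMLP()
voxel_feats = torch.randn(1, 40, 40, 16, 64)                   # toy feature volume
opacity = mlp(voxel_feats)
# Geometry-aware volume for the detection head: features weighted by predicted opacity.
det_volume = voxel_feats * opacity
# The same opacity field can act as density in volume rendering for view synthesis,
# so the NeRF branch and the detector share (and jointly train) this geometry estimate.
```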

The method surpasses state-of-the-art results on the ScanNet and ARKitScenes benchmarks. Thanks to joint training, NeRF-Det generalises to unseen scenes for object detection, view synthesis, and depth estimation without any per-scene optimisation.

The Stable Signature: Rooting Watermarks in Latent Diffusion Models 

This paper addresses ethical concerns associated with generative image modeling by proposing an active strategy that integrates image watermarking and Latent Diffusion Models (LDM). The objective is to embed an invisible watermark in all generated images for future detection or identification.

The method rapidly fine-tunes the latent decoder of the image generator so that it embeds a given binary signature. A pre-trained watermark extractor then recovers the hidden signature from any generated image, and a statistical test determines whether the image came from the model. The study evaluates the effectiveness and robustness of the watermarks across various generation tasks, showing that the Stable Signature survives common image modifications. Because watermarking is folded into the LDM's generation process without architectural changes, the approach is compatible with a range of LDM-based generative methods and offers a practical way to deploy and later detect generated images, mitigating risks such as deepfakes and copyright misuse.
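The detection side is essentially a bit-matching test. The small sketch below shows the kind of statistical test described above: count how many extracted bits match the generator's signature and compute the probability of seeing at least that many matches by chance. The 48-bit signature length and the example match count are assumptions chosen for illustration.

```python
# Illustrative detection test: a binomial tail probability under the null hypothesis
# that extracted bits are random coin flips (i.e. the image is not watermarked).
from math import comb

def detection_p_value(matching_bits: int, total_bits: int) -> float:
    """P(at least this many matching bits by chance) under H0."""
    return sum(comb(total_bits, k)
               for k in range(matching_bits, total_bits + 1)) / 2 ** total_bits

# Example: 42 of 48 extracted bits match the generator's signature.
p = detection_p_value(42, 48)
print(f"p-value = {p:.2e}")  # far below any reasonable threshold, so flag as generated
```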

Diffusion Models as Masked Autoencoders 

This paper revisits the long-standing belief that generative modeling is a good way to learn visual representations, this time through the lens of denoising diffusion models. While directly pre-training with diffusion models yields weak representations, a modified formulation, in which the diffusion model is conditioned on masked input and cast as a masked autoencoder (DiffMAE), proves effective.
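To make that reformulation concrete, here is a loose sketch of the masked-denoising objective: the model reconstructs only the masked patches, conditioned on the visible ones, from a noised version of the input. The placeholder linear encoder and decoder, the mask ratio, and the noise schedule are all assumptions for illustration; the real model uses transformer encoders and a full diffusion schedule.

```python
# Loose sketch of a masked-denoising objective in the spirit of DiffMAE (toy modules).
import torch
import torch.nn as nn

patch_dim, n_patches, mask_ratio = 768, 196, 0.75
encoder = nn.Linear(patch_dim, 256)               # encodes visible patches
decoder = nn.Linear(256 + patch_dim, patch_dim)   # predicts clean patches from context + noisy patch

patches = torch.randn(4, n_patches, patch_dim)    # patchified images (batch of 4)
mask = torch.rand(4, n_patches) < mask_ratio      # True = masked, to be reconstructed

t = torch.rand(4, 1, 1)                           # toy diffusion timestep in [0, 1)
noise = torch.randn_like(patches)
noisy = (1 - t).sqrt() * patches + t.sqrt() * noise   # toy noising schedule

# Condition on visible patches only (here, via their mean embedding per image).
visible = patches * (~mask).unsqueeze(-1)
context = encoder(visible).mean(dim=1, keepdim=True).expand(-1, n_patches, -1)

pred = decoder(torch.cat([context, noisy], dim=-1))
loss = ((pred - patches) ** 2)[mask].mean()       # reconstruct the masked patches only
```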

DiffMAE serves as a strong initialisation for downstream recognition tasks, excels at image inpainting, and extends naturally to video, where it achieves top-tier classification accuracy. The paper also compares key design choices, draws a connection between diffusion models and masked autoencoders, and asks whether generative pre-training can compete with other self-supervised methods on recognition tasks.

