LiTo nails view‑dependent 3D

- Apple researchers Jen-Hao Rick Chang, Xiaoming Zhao, Dorian Chan, and Oncel Tuzel released LiTo, which learns object shape and angle-dependent appearance together from RGB-depth views and can generate such 3D objects from a single image.
- The March 11 arXiv paper and ICLR 2026 poster say LiTo reproduces specular highlights and Fresnel reflections, and uses latent flow matching to generate 3D objects whose lighting and materials match a single input image.
- LiTo targets a weak spot in recent 3D generation: shiny, reflective surfaces whose appearance changes with viewpoint, rather than fixed texture. (machinelearning.apple.com)

A 3D model has to do more than recover shape; it also has to recover how a surface changes when you move around it. LiTo is Apple researchers’ new attempt to store both in one representation. (machinelearning.apple.com)

The paper, “LiTo: Surface Light Field Tokenization,” is by Jen-Hao Rick Chang, Xiaoming Zhao, Dorian Chan, and Oncel Tuzel. It was posted to arXiv on March 11, 2026 and listed as an ICLR 2026 poster. (arxiv.org) (openreview.net)

The basic problem is familiar in computer graphics: a red mug is not just “red.” Its handle throws shadows, its glaze catches highlights, and its edge can brighten or darken as the camera angle changes. (arxiv.org)

LiTo treats RGB-depth images as samples of a surface light field, which records what each point on an object’s surface looks like from different viewing directions. The model compresses random subsamples of that light field into latent vectors, then decodes them into geometry and appearance together. (machinelearning.apple.com) (openreview.net/pdf?id=TVP0p4f2Su)

That matters because many recent 3D systems separate shape from texture or assume mostly diffuse color. The LiTo paper says those setups struggle with view-dependent effects such as specular highlights and Fresnel reflections under complex lighting. (arxiv.org) (openreview.net/pdf?id=TVP0p4f2Su)

The rendering side connects to 3D Gaussian splatting, a fast display method that represents scenes as many translucent blobs instead of a heavy neural network. The 2023 Gaussian splatting paper reported real-time novel-view rendering at 1080p and at least 100 frames per second on its benchmark scenes. (repo-sam.inria.fr)

LiTo’s claim is not that it invented Gaussian splatting. Its claim is that a compact latent space can better preserve the shiny, angle-sensitive details that standard 3D generation often washes out. (machinelearning.apple.com) (github.com)

The second piece is generative: the authors train a latent flow matching model conditioned on a single input image. In the paper’s description, that lets LiTo generate 3D objects whose materials and lighting stay consistent with the source image. (machinelearning.apple.com) (arxiv.org)

Apple has also published a GitHub repository for the project, with the README labeling it “[ICLR 2026] LiTo: Surface Light Field Tokenization.” The repository showed 52 stars and 5 forks when it was crawled. (github.com)

The paper and project page describe higher visual quality, better fidelity to the input, and better separation of geometry and appearance than existing methods, but the public summaries do not highlight a single headline benchmark number. Much of the evidence is visual: metallic and glossy objects whose reflections move correctly as the camera moves. (machinelearning.apple.com) (arxiv.org)

For anyone building product viewers, visual effects assets, or image-to-3D tools, the pitch is straightforward: LiTo is trying to make a generated object keep looking like the same object as the viewpoint changes. (machinelearning.apple.com)
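
To make the idea of RGB-depth images as “samples of a surface light field” concrete, here is a minimal sketch, assuming a standard pinhole camera with known 3x3 intrinsics, per-pixel z-depth, and a 4x4 camera-to-world pose. Each valid pixel becomes one sample: a 3D surface point, the direction it was seen from, and the observed color. Both function names are hypothetical illustrations, not LiTo’s code, and the uniform random subsampling only mirrors the article’s phrase “random subsamples”; the paper’s actual sampling and encoder are not modeled here.

```python
import numpy as np


def rgbd_to_light_field_samples(rgb, depth, K, cam_to_world):
    """Turn one RGB-D view into (point, view_dir, color) light field samples.

    Hypothetical helper, assuming a pinhole camera with 3x3 intrinsics K,
    z-depth per pixel (0 means "no reading"), and a 4x4 camera-to-world pose.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # pixel row (v) and column (u) coordinates
    valid = depth > 0  # drop pixels without a depth reading
    z = depth[valid]
    # Back-project pixels into camera coordinates with the pinhole model.
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=1)
    # Move into world coordinates: p_world = R @ p_cam + t.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    pts_world = pts_cam @ R.T + t
    # Viewing direction: unit vector from each surface point toward the camera center t.
    to_cam = t - pts_world
    view_dirs = to_cam / np.linalg.norm(to_cam, axis=1, keepdims=True)
    colors = rgb[valid]  # observed RGB for each sample
    return pts_world, view_dirs, colors


def random_subsample(points, dirs, colors, n, seed=0):
    """Keep n samples chosen uniformly at random without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=n, replace=False)
    return points[idx], dirs[idx], colors[idx]
```

Running this over several views of the same object yields many (point, direction, color) samples for the same surface location, which is exactly the information a view-dependent representation needs and a single fixed texture discards.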
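
Why a fixed texture falls short on glossy objects can be shown with a textbook shading model. The sketch below is not LiTo’s method (LiTo learns appearance from data rather than using an analytic formula); it assumes Schlick’s approximation to the Fresnel term, F = F0 + (1 - F0)(1 - cos θ)^5, blended against a uniform environment color, plus a Blinn-Phong specular lobe for one light. The function names and material values (a red “mug” albedo, F0 = 0.04, shininess 64) are illustrative assumptions.

```python
import numpy as np


def normalize(vec):
    return vec / np.linalg.norm(vec)


def schlick_fresnel(cos_theta, f0):
    """Schlick's approximation: reflectance grows toward 1 at grazing angles."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5


def shade(normal, view_dir, light_dir, albedo, env_color, f0=0.04, shininess=64.0):
    """Toy outgoing color at one surface point (not LiTo's model).

    Directions point away from the surface. The Fresnel term blends the
    diffuse albedo with a uniform reflected environment color, and a
    Blinn-Phong lobe adds a specular highlight for a single light.
    """
    n, v, l = normalize(normal), normalize(view_dir), normalize(light_dir)
    n_dot_l = max(float(np.dot(n, l)), 0.0)
    n_dot_v = max(float(np.dot(n, v)), 0.0)
    fresnel = schlick_fresnel(n_dot_v, f0)  # depends on the viewing angle
    diffuse = albedo * n_dot_l  # identical from every viewpoint
    h = normalize(v + l)  # half vector between view and light directions
    highlight = max(float(np.dot(n, h)), 0.0) ** shininess * n_dot_l
    return (1.0 - fresnel) * diffuse + fresnel * env_color + highlight
```

With the point, light, and albedo held fixed, moving the camera to the mirror direction produces a bright highlight, and moving it to a grazing angle pulls the color toward the environment. A view-independent texture can store only one of those colors.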
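
Latent flow matching, LiTo’s generative half, can be illustrated on toy vectors. This sketch assumes the common linear-interpolation variant: a point x_t = (1 - t) x0 + t x1 moves in a straight line from Gaussian noise x0 to a data latent x1, and a network is trained to predict the velocity x1 - x0. The paper’s exact probability path, network, and image conditioning are not reproduced; the hypothetical `oracle_velocity` stands in for a trained, conditioned network by simply knowing the target latent.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def flow_matching_loss(predicted_velocity, x0, x1):
    """Mean squared error against the regression target x1 - x0."""
    return np.mean((predicted_velocity - (x1 - x0)) ** 2)


def oracle_velocity(x_t, t, target):
    """Hypothetical stand-in for a trained, conditioned velocity network.

    For a single known target latent, the ideal velocity at (x_t, t)
    is (target - x_t) / (1 - t).
    """
    return (target - x_t) / (1.0 - t)


def sample(velocity_fn, x0, steps, **cond):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x, dt = x0.copy(), 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t, **cond)
    return x
```

For example, with `x0 = np.random.default_rng(0).standard_normal(8)` and `target = np.linspace(-1.0, 1.0, 8)`, calling `sample(oracle_velocity, x0, steps=10, target=target)` lands on `target`. A real model only approximates this velocity field, so sample quality depends on how well it was trained and on the number of integration steps.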
