To peripheral vision, a pair of physically different images can look the same. Such pairs are metamers relative to each other, just as physically different spectra of light are perceived as the same color. We propose a real-time method to compute such ventral metamers for foveated rendering where, in particular for near-eye displays, the largest part of the framebuffer maps to the periphery. This improves quality over state-of-the-art foveation methods, which blur the periphery. Work in vision science has established that peripheral stimuli are ventral metamers if their statistics are similar. Existing methods, however, require a costly optimization process to find such metamers. To this end, we propose a novel type of statistics particularly well-suited to practical real-time rendering: smooth moments of steerable filter responses. These can be extracted from images in time constant in the number of pixels, in parallel over all pixels, on a GPU. Further, we show that they can be compressed effectively and transmitted at low bandwidth. Finally, computing realizations of those statistics can again be performed in constant time and in parallel. This enables a new level of quality for foveated applications such as remote rendering, level of detail, and Monte Carlo denoising. In a user study, we finally show that human task performance increases and foveation artifacts are less noticeable when using our method compared to common blurring.
This work focuses on metamers: stimuli which appear the same to a human observer despite being physically different. A classic example is colour metamers. Two different electromagnetic spectra may appear as the same shade to a human, because we typically have only three types of colour-sensitive photoreceptor cells. This fact is exploited to simplify rendering and display technology, and to reduce bandwidth for compression and streaming, by representing images using three channels per pixel (e.g. RGB).
Other types of metamer have yet to be exploited in this way. In this work we focus on image metamers — images which appear the same to a user for a given fixation point.
These metamers exist due to a combination of two factors: lower visual acuity in the periphery, caused by the lower density of photoreceptor cells, and pooling of features in the human visual system (HVS). Whilst acuity has been exploited to some extent by existing foveated rendering approaches, pooling remains largely unexplored, even though its effect is more dramatic.
Our method allows us to analyse images to extract a compact model of the perceivable content of the image, and to synthesise a metamer from this model. In contrast to previous work, this can be performed in real time, and the model is compact enough to allow for efficient compression of foveated images.
We divide our method into two main components: Analysis and Synthesis.
In the Analysis step, we take as input an image and a fixation point, and compute a compact model of the perceivable content in the image.
In the Synthesis step, we take the compact model and use it to generate one particular metamer of the original input image.
Analysis & Synthesis are detailed below.
In the analysis step, we compute the values of our model. The model consists of local first- and second-order moments of an oriented image decomposition, the steerable pyramid.
The steerable pyramid can be viewed as an extension to the Laplacian pyramid. In addition to dividing an image into different frequency scales, it also separates each into different orientation bands, giving a more granular decomposition of the image.
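To make the idea concrete, here is a toy oriented decomposition in numpy. This is not the actual steerable pyramid (which uses polar-separable filters in the frequency domain); it only illustrates the concept of splitting a band-pass signal into steered orientation bands, with function names of our own choosing:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def oriented_bands(img, sigma, n_orient=4):
    """Toy oriented decomposition of a 2D image.

    Computes a Laplacian-like band-pass at one scale (difference of
    Gaussians) and a set of orientation bands obtained by steering
    first-derivative-of-Gaussian filters to n_orient angles.
    """
    bandpass = gaussian_filter(img, sigma) - gaussian_filter(img, 2 * sigma)
    # Horizontal and vertical derivative responses at this scale.
    dy = gaussian_filter(img, sigma, order=(1, 0))
    dx = gaussian_filter(img, sigma, order=(0, 1))
    bands = []
    for k in range(n_orient):
        theta = k * np.pi / n_orient
        # Steering property: a filter at angle theta is a linear
        # combination of the two basis derivatives.
        bands.append(np.cos(theta) * dx + np.sin(theta) * dy)
    return bandpass, bands
```

A real steerable pyramid also recurses over scales and keeps high- and low-pass residuals, but the steering identity shown in the loop is the core property the name refers to.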
Here a video file is used as input, but any 2D image input is possible (rendered 3D scenes, images etc.)
We first convert the image from RGB to a decorrelated colourspace (YCbCr).
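This conversion can be sketched as follows; the BT.601 matrix below is an assumption on our part, as the text does not state which YCbCr variant is used:

```python
import numpy as np

# ITU-R BT.601 full-range RGB -> YCbCr matrix (an assumed choice).
M = np.array([[ 0.299,     0.587,     0.114   ],
              [-0.168736, -0.331264,  0.5     ],
              [ 0.5,      -0.418688, -0.081312]])

def rgb_to_ycbcr(rgb):
    """rgb: H x W x 3 array in [0, 1]; chroma channels centred at 0.5."""
    ycbcr = rgb @ M.T
    ycbcr[..., 1:] += 0.5
    return ycbcr

def ycbcr_to_rgb(ycbcr):
    """Inverse transform, used after synthesis to return to RGB."""
    out = ycbcr.copy()
    out[..., 1:] -= 0.5
    return out @ np.linalg.inv(M).T
```

The point of the decorrelated space is that the statistics of each channel can then be pooled and matched independently.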
We then compute the steerable pyramid responses.
Highpass residual and oriented bands at the highest scale.
Oriented bands at scale 0
The local moments (mean and standard deviation) are then computed. The area over which the local statistics are computed varies with distance from the fixation point: the further from it, the larger the area, mimicking the increased pooling away from the fovea.
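A minimal numpy sketch of this step, with simplifications of our own: pooling is done with Gaussian filters, and the eccentricity-dependent pooling size is approximated by blending between just two fixed scales rather than varying it continuously:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_moments(band, sigma):
    """Local mean and standard deviation of a band, pooled with a
    Gaussian of width sigma."""
    mean = gaussian_filter(band, sigma)
    var = gaussian_filter(band ** 2, sigma) - mean ** 2
    return mean, np.sqrt(np.maximum(var, 0.0))

def foveated_moments(band, fixation, sigma_fovea=1.0, sigma_periphery=8.0):
    """Blend moments pooled at a fine and a coarse scale according to
    eccentricity (distance from the fixation point, normalised by the
    image diagonal -- an assumed normalisation)."""
    h, w = band.shape
    yy, xx = np.mgrid[0:h, 0:w]
    ecc = np.hypot(yy - fixation[0], xx - fixation[1]) / np.hypot(h, w)
    m0, s0 = local_moments(band, sigma_fovea)
    m1, s1 = local_moments(band, sigma_periphery)
    return (1 - ecc) * m0 + ecc * m1, (1 - ecc) * s0 + ecc * s1
```

These per-pixel mean and standard-deviation maps, one pair per pyramid band, form the compact model passed to synthesis.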
The synthesis step consists of generating a random noise image, and modifying it to match the local steerable pyramid statistics of the target image (the output of the analysis step).
We generate the noise image, and compute its steerable pyramid. We then apply a scale and bias to each pixel in each pyramid level to match the means and standard deviations computed in the analysis step.
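The scale-and-bias step for a single band can be sketched as below; this is an illustration under the assumption that the same Gaussian pooling width is used in analysis and synthesis, not the paper's GPU implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def match_moments(noise_band, target_mean, target_std, sigma, eps=1e-6):
    """Scale and bias a noise pyramid band so its local mean and
    standard deviation match the target maps from the analysis step."""
    n_mean = gaussian_filter(noise_band, sigma)
    n_var = gaussian_filter(noise_band ** 2, sigma) - n_mean ** 2
    n_std = np.sqrt(np.maximum(n_var, 0.0))
    gain = target_std / (n_std + eps)              # per-pixel scale
    return target_mean + gain * (noise_band - n_mean)  # per-pixel bias
```

Applying this to every band and then collapsing the pyramid yields one realisation of the stored statistics; a different noise seed yields a different, equally valid metamer.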
The final output is then generated by reconstructing from the noise pyramid in the usual way, and converting back to RGB colourspace.
Here we give a brief overview of the applications we introduce for our approach. Each application is described in detail in the paper itself.
The stats maps produced by the analysis step of the approach are of low bandwidth, owing to the lowpass filtering applied to compute the local statistics (particularly further from the fovea). This allows them to be stored much more efficiently than the original image.
We apply a fairly simple compression approach, quantising each of the stats maps and applying standard JPEG compression. We show that this avoids the artefacts that would result from using standard JPEG to compress the image to the same filesize, and preserves the frequency content of the periphery.
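The quantisation half of this pipeline can be sketched as a uniform quantiser; the subsequent JPEG encode of the quantised maps is omitted here, and the 8-bit depth is an assumption:

```python
import numpy as np

def quantise(stats, n_bits=8):
    """Uniformly quantise a stats map before handing it to a standard
    image codec. Returns the integer map and the (lo, hi) range needed
    to invert the mapping."""
    lo, hi = float(stats.min()), float(stats.max())
    scale = (2 ** n_bits - 1) / max(hi - lo, 1e-12)
    q = np.round((stats - lo) * scale).astype(np.uint16)
    return q, (lo, hi)

def dequantise(q, lo_hi, n_bits=8):
    """Map quantised values back to the original range."""
    lo, hi = lo_hi
    return q.astype(np.float64) * (hi - lo) / (2 ** n_bits - 1) + lo
```

The round-trip error of such a quantiser is bounded by half a quantisation step, which is why the smooth, heavily low-pass-filtered stats maps survive it well.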
Compression results for different methods (columns) on different inputs (rows). All examples were compressed to a filesize of 40KB.
A number of methods have been developed to denoise foveated path-traced images (CITATIONS). We argue that the task of these networks is unnecessarily complicated, as they typically try to recover a full-resolution reference image when a metamer would suffice.
We train a network to instead recover our low-bandwidth stats maps, simplifying its task. This allows for simpler network architectures, or less training to achieve similar perceived quality.
Path tracing denoising application.
We develop a novel MIP mapping approach where a texture has two separate MIP maps, one containing local means and the other local standard deviations. For pixels further from the fixation point, it is sufficient to sample from very low-resolution MIP levels and use the resulting stats to scale a screen-space noise texture. This allows smaller MIP levels to be sampled than in regular MIP mapping, reducing bandwidth without impacting perceived quality.
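The idea can be sketched as follows for a square power-of-two texture; the function names and the recursive variance combination (a simplified application of the law of total variance) are our own illustration, not the paper's implementation:

```python
import numpy as np

def build_moment_mips(texture):
    """Build two MIP chains: one of local means, one of local
    standard deviations, by repeated 2x2 reduction."""
    means, stds = [texture], [np.zeros_like(texture)]
    t = texture
    while t.shape[0] > 1:
        blocks = t.reshape(t.shape[0] // 2, 2, t.shape[1] // 2, 2)
        m = blocks.mean(axis=(1, 3))
        # Variance of the finer level within each 2x2 footprint, plus
        # the average of the variances already accumulated there.
        v = blocks.var(axis=(1, 3))
        prev_v = (stds[-1] ** 2).reshape(
            m.shape[0], 2, m.shape[1], 2).mean(axis=(1, 3))
        means.append(m)
        stds.append(np.sqrt(v + prev_v))
        t = m
    return means, stds

def peripheral_sample(means, stds, level, uv, noise):
    """Sample a coarse MIP level and modulate screen-space noise with
    the stored moments instead of fetching fine texels."""
    return means[level][uv] + stds[level][uv] * noise
```

Sampling a coarse level plus a noise value replaces many fine texel fetches, which is where the bandwidth saving comes from.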
Video example of Metameric MIP Mapping with fixation point at the centre of the image.
David R. Walton, Rafael Kuffner Dos Anjos, Sebastian Friston, David Swapp, Kaan Akşit, Anthony Steed, and Tobias Ritschel. Beyond Blur: Ventral Metamers for Foveated Rendering.
ACM Trans. Graph. (Proc. SIGGRAPH 2021).