## Emerging Visualization Solution

Discussions about how to visualize 4D and higher, whether through cross-eyedness, dreaming, or connecting one's nerves directly to a computer, sci-fi style.

### Re: Emerging Visualization Solution

@quickfur: I'd like to hear more about the UI you were describing.
benb
Dionian

Posts: 48
Joined: Tue Jan 21, 2014 7:11 pm

### Re: Emerging Visualization Solution

Well, I don't really have anything concrete to show right now, but it's based on the following observation: suppose a 4D person is standing on the ground at location P, and facing some direction F perpendicular to the vertical U (the "up" vector). If we allow F to vary while keeping it perpendicular to U, all possible F vectors together form a 2-sphere on the hyperplane of the ground that the 4D person is standing on. For any given facing direction, we may associate a unique point on this 2-sphere. Similarly, given two possible facing directions F1 and F2, we may mark two points on this 2-sphere. If the 4D person were currently facing in direction F1, and something calls his attention to look in the direction F2, the shortest way to turn his head from F1 to F2 corresponds with the geodesic on the 2-sphere between the two points corresponding, respectively, to F1 and F2. In general, this geodesic would not be parallel to any of the coordinate planes of 4-space, but would lie along some oblique direction.
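In code, the geodesic turn from F1 to F2 described above is just spherical linear interpolation (slerp) of the two unit facing vectors. A minimal sketch, assuming the horizontal hyperplane is represented by plain 3-vectors (the function name is illustrative):

```python
import math

def slerp(f1, f2, t):
    """Point at parameter t along the great-circle geodesic between unit
    facing vectors f1 and f2 on the 2-sphere of horizontal directions."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(f1, f2))))
    theta = math.acos(dot)              # angle between the two facings
    if theta < 1e-9:                    # already facing (almost) the same way
        return list(f1)
    s = math.sin(theta)
    w1 = math.sin((1 - t) * theta) / s
    w2 = math.sin(t * theta) / s
    v = [w1 * a + w2 * b for a, b in zip(f1, f2)]
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]           # re-normalize against rounding
```

Stepping t from 0 to 1 turns the head from F1 to F2 along the shortest path, which in general is oblique to the coordinate planes, exactly as described.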

Since the surface of the 2-sphere is a 2D manifold, it seems natural to map the possible directions on this manifold to the 2D degree of freedom of the mouse. Thus, one may imagine the mousepad as a flattened section (or cartographic projection of some kind) of the 2-sphere's surface, where the current position of the mouse corresponds with the point associated with F1, and some other point on the mousepad corresponds with the point associated with F2. To turn from F1 to F2, then, one simply moves the mouse from the current position to the position corresponding with F2 in a linear fashion -- this produces the geodesic between the two facing directions.
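For the cartographic projection to have exactly the property that linear mouse motion traces a geodesic, a gnomonic (central) projection would work, since it maps straight chart lines to great circles. A hedged sketch, assuming the chart is tangent at the current facing (0, 0, 1); the function name is illustrative:

```python
import math

def pad_to_sphere(u, v):
    """Map mousepad coordinates (u, v) to a unit facing direction via a
    gnomonic chart tangent at the current facing (0, 0, 1).  Straight
    lines on the pad map to great-circle geodesics on the 2-sphere."""
    n = math.sqrt(1.0 + u * u + v * v)
    return (u / n, v / n, 1.0 / n)
```

Any straight mouse path lies in a plane through the sphere's center, so its image is an arc of a great circle. One such chart covers less than a hemisphere, so it would have to be re-centered on the current facing as the user turns.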

So under this scheme, the mouse would be used for pointing in the horizontal hyperplane, and some other controls would be reserved for forward/backward movement and pointing up/down along the vertical plane, presumably keyboard controls of some kind. In order for this to work well, the 4D representation needs to be symmetric in the horizontal hyperplane, since otherwise one would have to be restricted to only vertical/horizontal movements of the mouse in order not to get confusing visual changes in the display.
quickfur
Pentonian

Posts: 2482
Joined: Thu Sep 02, 2004 11:20 pm
Location: The Great White North

### Re: Emerging Visualization Solution

@quickfur: I like it. Instead of just a mouse, one could also use a touchscreen that displays the cartographic projection of the 2-sphere's display content for additionally intuitive navigation.

What kinds of asymmetries in the horizontal hyperplane would be inimical to the proper functioning of this kind of setup?
benb
Dionian

### Re: Emerging Visualization Solution

Have you given any thought to a combination of mouse and keyboard? Like good ole first-person-shooter WASD plus mouse? Or, I remember some space-fighter games that used Q and E for rotation. While we're at it, how about utilizing that whole array of nine keys, QWE/ASD/ZXC, plus the directional mouse? It may be intuitive, given the extensive use of those keys for navigation. I also had a neat idea for an alternate approach to the double bowtie: how about a double circle, smaller inside larger? The larger circle is the peripheral view, smoothly connecting ana/kata and left/right into a 360-degree circle. The smaller central circle would then be akin to the two-center-squares view, smoothly transitioning into what we see in the peripheral.
in search of combinatorial objects of finite extent
ICN5D
Pentonian

Posts: 1047
Joined: Mon Jul 28, 2008 4:25 am
Location: Orlando, FL

### Re: Emerging Visualization Solution

As a general rule, I suggest labeled visual examples (however crude they may be) to accompany verbal descriptions of visualization approaches. This can add to clarity as well as eliminate misunderstanding. (Hint, hint!)
benb
Dionian

### Re: Emerging Visualization Solution

benb wrote:@quickfur: I like it. Instead of just a mouse, one could also use a touchscreen that displays the cartographic projection of the 2-sphere's display content for additionally intuitive navigation.

Not sure what you meant by display content, but obviously a 2D representation is not going to be sufficient to represent navigational information. Perhaps one approach is to show a spherical slice of the floor surface (or projection of whatever lies below the avatar), sorta like the 4D analogue of an overhead map view in 3D, with an arrow pointing in the current direction. Then as you move the mouse, the spherical slice would rotate, so you can see exactly how the floor plan rotates in response, thereby acquiring an intuitive understanding of what each mouse movement does.

Displaying merely the surface of the 2-sphere, unfortunately, isn't going to provide this kind of information; the best you can do is to label various cardinal points on the projected sphere surface as a kind of reference frame. Showing actual floor textures inside a spherical volume would give much more useful information, I think.

benb wrote:What kinds of asymmetries in the horizontal hyperplane would be inimical to the proper functioning of this kind of setup?

I have in mind a particular kind of reorienting rotation in 4D, in which the user rotates in-place without changing his vertical orientation and without altering his facing direction. Under the classical 4D->3D projection scheme, this rotation has the effect of rotating the contents of the 3D retina about the vertical axis, without any other visual distortions. What would be the effect with the double rainbow display? If I understand it correctly, it would show a distorting transformation of the side panels, correct? If so, then it would be rather counterintuitive for the user to turn in any direction other than the cardinal directions, since depending on the angle of the turning geodesic with the cardinal directions, the way the display changes over time would be different, thus introducing a perceived asymmetry between turning in non-cardinal directions vs. turning in cardinal directions, where such an asymmetry does not exist in actual 4D space. Thus, the user would (probably unconsciously) develop a preference for turning along cardinal directions, whereas such a preference has no real basis to a native 4D being, to whom all lateral directions are perceived equally, with no distinction between cardinal and non-cardinal directions.
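This reorienting rotation can be written down concretely. A minimal sketch; the axis convention (x and z lateral, y up, w facing) is an assumption for illustration:

```python
import math

def reorient(theta):
    """In-place reorienting rotation: mixes the two lateral axes x and z
    while fixing y (up) and w (facing).  Axis naming is an assumption."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c,   0.0, -s,  0.0],
            [0.0, 1.0, 0.0, 0.0],
            [s,   0.0, c,   0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(m, v):
    """Multiply a 4x4 matrix by a 4-vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
```

Applying reorient(theta) leaves up = (0, 1, 0, 0) and facing = (0, 0, 0, 1) untouched while spinning the lateral plane, which is why, under the classical projection, the 3D retina's contents simply rotate about the vertical axis with no other distortion.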
quickfur
Pentonian

### Re: Emerging Visualization Solution

benb wrote:As a general rule, I suggest labeled visual examples (however crude they may be) to accompany verbal descriptions of visualization approaches. This can add to clarity as well as eliminate misunderstanding. (Hint, hint!)

I totally understand. That's why I make toratope renders. Okay, so, what you have so far is this:

And what I see in my mind is this:

" Centered View " should have been labeled " Intersection View of XY and ZW "

This is an attempt to smoothly integrate both the XY and ZW ortho planes into a single, perhaps more intuitive approach. As stated by quickfur, a 4D being will perceive all laterals as evenly distributed in a 360-degree circle (or circles). So, by adapting what you already have, perhaps we can integrate and simplify those six squares into five distinct viewing directions. No matter what we do, trying to simulate 4D perception on a monitor won't be easy, or clean, for that matter.

The mechanics of applying this in the circular method should be similar to the Double Bow-Tie, except that here we have condensed the middle two squares into one view: the intersection between the XY and ZW planes. I hope I understand enough of this Hyperland build to have usable input.

The circular setup may work well with the fish-eye lensing and adjustable FOV. As for how to smooth out the transition between ana/kata and left/right, I'm not sure. Hope this helps
ICN5D
Pentonian

### Re: Emerging Visualization Solution

@ICN5D: The only challenge that arises for me as a knee-jerk reaction about what might not work regarding the display setup you offered pertains to the content of the centered view. In the Double Rainbow setup, there are two "forward" panels which each display slightly differing content; if I recall correctly, this is because there are two axes orthogonal to the line of sight. When interacting with (or attempting to navigate toward) objects, aligning the content of both forward panels is mandatory.

One way around having two "forward" panels would be to merely superimpose the images from both onto one such that the centered view consisted of a composite image. The composite image would probably look like it was going in and out of phase to the extent that alignment between the content of the views were altered (e.g., as one rotated kataward toward an object but not leftward toward that same object). The user would probably spend a bunch of time trying to keep that center view in phase, as it were, but perhaps that is the cost of the additional degree of freedom.

@quickfur: You wrote about how "the user rotates in-place without changing his vertical orientation and without altering his facing direction." Using John's rotation inputs of UIOJKL, what input combination would yield the kind of rotation you are describing? I'm having trouble extrapolating the directions and degrees of rotation from what you described.
benb
Dionian

### Re: Emerging Visualization Solution

benb wrote:[...]
@quickfur: You wrote about how "the user rotates in-place without changing his vertical orientation and without altering his facing direction." Using John's rotation inputs of UIOJKL, what input combination would yield the kind of rotation you are describing? I'm having trouble extrapolating the directions and degrees of rotation from what you described.

The rotation I have in mind is keyed to J and L.
quickfur
Pentonian

### Re: Emerging Visualization Solution

@quickfur: Fascinating. When I use those keys in Double Rainbow mode, my understanding is that my facing direction changes as per a turn to my left or right, respectively. (I use the "compass rose" geom on GitHub to help me track what/where I'm facing.)

Here is a link to it:
https://github.com/bblohowiak/Hyperland ... ity-test4b
benb
Dionian

### Re: Emerging Visualization Solution

benb wrote:@quickfur: Fascinating. When I use those keys in Double Rainbow mode, my understanding is that my facing direction changes (by default 90 degrees) as per a turn to my left or right, respectively.

It changes your 3D facing direction, but that's merely an artifact of the second projection from the 3D retina to the 2D screen. Note that none of the rotation keys you listed changes the 4D facing direction at all; they just change your orientation in 4D. Of these, two pairs of rotation keys also change your vertical orientation, but J and L preserve both the vertical and the facing direction. The actual change in facing direction comes from the (lowercase) j, l, u, o, i, k keys.

Or is that where the confusion comes from? I'm referring to shift-J and shift-L, not (lowercase?) j and l.
quickfur
Pentonian

### Re: Emerging Visualization Solution

@quickfur: My apologies; you are correct, I interpreted your J and L as j and l.

"I have in mind a particular kind of reorienting rotation in 4D, in which the user rotates in-place without changing his vertical orientation and without altering his facing direction. Under the classical 4D->3D projection scheme, this rotation has the effect of rotating the contents of the 3D retina about the vertical axis, without any other visual distortions. What would be the effect with the double rainbow display?"

With Double Rainbow, indicators of facing direction and vertical orientation remain unchanged. Depending on direction of rotation, indicators of left/right swap places with indicators of ana/kata.

There is no distorting transformation of the side panels if the Flare mode is deactivated; the cubes appear as square as the center indicator tesseract does. Granted, the cubes grow and recede a bit through the rotation, but they do not deform per se. Even with the distorting transformation of the side panels in Flare mode, the visualization of the undistorted tesseract appears as such when seen on the center panel.

As for the display predisposing individuals to turning in cardinal directions, I very much enjoy, and have logged plenty of time, turning in increments both smaller and greater than ninety degrees. That said, I do believe the biggest challenge you're alluding to is in getting the "two centers" of the forward views aligned on target objects, destinations, etc. With attentive manual control, one can pilot through a 4D obstacle course with a minimum of collisions. At the same time, I believe there are improvements to be made beyond the Double Rainbow Flare design as it presently exists.
benb
Dionian

### Re: Emerging Visualization Solution

benb wrote:The only challenge that arises for me as a knee-jerk reaction about what might not work regarding the display setup you offered pertains to the content of the centered view.

Well, I was thinking about that, and you're right. They're uniquely different, for good reason. I'm not sure yet how to combine the two distinct directions into one. I'll have to play with the actual program and get a better feel for it. I don't think superimposing two views would work ... too confusing. Hmm. But, what do you think about the placement of ana/kata? Rather than this 4D up/down set up in a right/left direction, it would be simply up/down, from the intersecting view. And, couldn't we rotate what's left/right to be ana/kata? If so, this transition could happen through nothing more than sliding everything around the peripheral circle. Something to consider ......
ICN5D
Pentonian

### Re: Emerging Visualization Solution

@ICN5D: I wouldn't dismiss the possibility of superimposing the views out of hand as too confusing. I think some testing might be in order, especially given the experiences I've had with manually aligning the two views when they are separate -- it can be tough, but it can be done. One option that occurs to me, in addition to manual incremental rotation/orientation control, is some kind of system analogous to compass navigation: rather than visually seeking the same indicator of the same forward direction in both forward views, perhaps a user could aim both forward views toward a common point in the x,y,z,w manifold via coordinate entry. To the extent that the images were misaligned or "out of phase"-looking, rotations and manual adjustments could compensate.

And, as for rotating left/right and ana/kata, via the circular display you charted I'm open to that. My basic predisposition is to test it out to get a feel for its affordances and limitations. Can you write the code to make it happen, or should we look for ways to make it so?
benb
Dionian

### Re: Emerging Visualization Solution

Well, I did program the heck out of my TI-86, but that was a looong time ago. Seeking a way is the only way, I guess. Maybe quickfur knows a way?

I'm still thinking about how to represent the forward view. But, first off, what are the flattened blue lines on the "ground" in each view panel?
ICN5D
Pentonian

### Re: Emerging Visualization Solution

I use the "classical" 4D -> 3D projection method for all 4D visualization, because I find it the easiest to understand, to reason about, and to develop geometric intuition with. It has served me surprisingly well, considering that it does have well-known limitations (I rediscovered many 4D shapes by direct visual construction using projections, and my discovery of the castellated rhombicosidodecahedral prism (D4.3.1), the first known non-trivial polytope with bilunabirotunda cells, was also done primarily via construction with projections).

But Ben here prefers alternative approaches, and since this is his thread, I'll refrain from pushing my angle.
quickfur
Pentonian

### Re: Emerging Visualization Solution

@ICN5D: The blue grid is a "mat" that the objects sit on. It can help with reference/orientation (i.e., which way is down). What are your thoughts on it?

@quickfur: I'm here to receive input as much as I am to share output; the "classical" projection method (or anything else) should be on the table as long as it's not violating community norms by being mentioned. Hyperland is an open source project and the Double Rainbow versions are among its emerging fruits. If the "classical" projection method can help make progress toward accomplishing project aims (e.g., manipulating photorealistic representations of objects in a simulated physical space that consists of four dimensions, users experiencing moving themselves and/or non-self objects toward intended target destinations in that space, etc.), then I am for inclusion of that method, as it were.

Because of the open nature of the Hyperland project, if you participate in this thread and would like official credit for your contributions (rather than just a special thanks), please let me know in a private message.
benb
Dionian

### Re: Emerging Visualization Solution

Photorealism is something that I've long been dreaming about when it comes to 4D visualization. My current polytope viewer's approach, which is rendering ridges (i.e., filled polygons) with transparency, is only a first approximation of what I envision. It is primarily limited to objects with flat surfaces, i.e., polytopes, and perhaps a small class of simple curved objects. But even something as simple as a 4D sphere is difficult to adequately convey -- because the true curvature of the 4D sphere can only be represented by a 3D gradation of shading across the projected spherical volume! For this, it seems that 3D volumetric rendering is a necessity. Unfortunately, how to convey such a thing adequately in the limited medium of 2D screens is a major challenge. The next logical step would be 3D volumetric displays -- but I have neither the means nor the expertise to construct such a device!
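As a toy model of that "3D gradation of shading": if the 4D sphere is treated as a uniformly translucent solid ball projected orthogonally into the 3D retina, each retina point's brightness is just the chord length of the ball along the projection axis. A hedged sketch (a simple density model, not true 4D lighting):

```python
import math

def shade(p, r=1.0):
    """Brightness at 3D-retina point p when a uniformly translucent solid
    4D ball of radius r is projected orthogonally along w: the chord
    length of the ball over p.  (Toy density model, not real lighting.)"""
    d2 = r * r - sum(c * c for c in p)
    return 2.0 * math.sqrt(d2) if d2 > 0.0 else 0.0
```

The result is brightest at the center of the projected ball and falls smoothly to zero at its spherical boundary -- a gradation varying across a 3D volume, which is precisely what a flat screen cannot directly show.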
quickfur
Pentonian

### Re: Emerging Visualization Solution

What about binocular output? That seems rather straightforward.
benb
Dionian

### Re: Emerging Visualization Solution

My website already uses stereoscopic viewing, for example:

(This is a cross-eyed stereo pair: to see the 3D image, cross your eyes so that the two images blur into 4, and adjust the amount of crossing until the middle two merge. Then focus on the merged image and ignore the other two, until the middle image comes into focus.)

This particular example is one of the better ones I've done, in that most of the important elements are clearly perceived in the 3D result. However, it still suffers from an inherent limitation: there is only so much you can cram into a 2D representation of a 3D projection image before things become too cluttered. Even in this example, it's rather difficult to see the faces on the far side of the yellow great rhombicuboctahedron -- yet, from a 4D point of view, this yellow great rhombicuboctahedron ought to be in the center of the field of vision, so its elements should be the clearest of all.

Now, from a 3D-centric point of view, the above image only has a depth of about 4-5 polygons at the most (i.e., only about 4-5 polygons at the most lie along the 3D line of sight), and already, the cluttering problem is significant. Imagine now if, instead of merely 4-5 polygons along the line of sight, you need to represent a dense array of pixels representing the projection of a curved surface, say the output of a 4D raytracer with smooth light and shade. It quickly becomes an amorphous blob for anything but the simplest of shapes.

Even if we stick with polygonal rendering (no curved shapes like 4D spheres), 4-5 polygons is already pushing the limit of visibility. In a complex 4D scene with multiple faceted objects, 4-5 polygons per line of sight is far too few. Some other way of getting around the practical limitations of the 2D screen would have to be adopted. One thought I have along these lines: if you imagine the classical "cube-within-a-cube" projection of the tesseract, the volume in the "front" frustum (i.e., closest to the 3D point of view) can be elided, or rendered with higher than normal transparency, so that the volume in the inner cube (i.e., where the closest point of a convex object to the 4D point of view would be projected) is less obstructed to the 3D viewer. There would then be auxiliary controls to let the user easily spin the 3D viewpoint -- equivalent to the J/L 4D rotation that we talked about earlier, but not necessarily bound to an actual 4D motion; I like to think of manipulations of the 3D viewpoint as separate from actual 4D movement, mere crutches to help us poor 3D creatures grapple with 4D -- thus rotating the "front" frustum away from the elided/high-transparency region when the user wishes to see more clearly the part of the projection image that lies in it.
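The frustum-eliding idea above can be sketched in a few lines: classify each point of the 3D retina into the inner cube or one of the 6 frustums, then fade the frustum nearest the 3D viewpoint. The inner-cube scale and opacity values are illustrative assumptions:

```python
def classify(p, inner=0.5):
    """Which cell of the cube-within-a-cube tesseract projection a
    3D-retina point falls in: the inner cube, or the frustum of the
    face whose coordinate dominates.  Retina is the cube [-1, 1]^3."""
    if all(abs(c) <= inner for c in p):
        return "inner"
    axis = max(range(3), key=lambda i: abs(p[i]))
    return ("+" if p[axis] > 0 else "-") + "xyz"[axis]

def opacity(p, front="+z", faded=0.15):
    """Elide the 'front' frustum (nearest the 3D viewpoint) by rendering
    it nearly transparent, so the inner cube stays visible."""
    return faded if classify(p) == front else 1.0
```

Spinning the 3D viewpoint would then amount to changing which frustum is currently tagged as "front".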

Alternatively, there could be a "sphere of focus" associated with the 3D view, where a dedicated set of controls moves a virtual sphere within the 3D retina, and the portion of the projected 4D scene that lies within this virtual sphere is rendered more solidly than the rest of the contents of the 3D retina, thus giving the user a way to explore the 3D retina without compromising the visibility of the various elements projected within it. For real-time interaction, full 3D freedom of movement for this virtual sphere would be too taxing to control, so one idea is to default it to the center of the 3D retina, where most of the elements of interest would fall most of the time, and to have 6 keys that temporarily translate the sphere to one of the 6 regions corresponding to the 6 frustums of the tesseract projection (i.e., up/down, left/right, front/back). Sort of like "casting a glance" in one of the 6 directions, where upon releasing the key, the sphere of focus returns to the center of the retina.
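The sphere-of-focus behavior is simple enough to sketch directly; the key names, reach, and radius here are illustrative assumptions:

```python
# Offsets toward the 6 frustums of the tesseract projection.
GLANCES = {
    "up": (0, 1, 0), "down": (0, -1, 0),
    "left": (-1, 0, 0), "right": (1, 0, 0),
    "front": (0, 0, 1), "back": (0, 0, -1),
}

def focus_center(held_key=None, reach=0.6):
    """Default to the retina's center; 'casting a glance' while a key is
    held shifts the sphere toward one frustum; it snaps back on release."""
    if held_key is None:
        return (0.0, 0.0, 0.0)
    dx, dy, dz = GLANCES[held_key]
    return (dx * reach, dy * reach, dz * reach)

def in_focus(p, center, radius=0.4):
    """Is retina point p rendered solidly (inside the sphere of focus)?"""
    return sum((a - b) ** 2 for a, b in zip(p, center)) <= radius ** 2
```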

Yet another idea is simulated 4D camera defocusing, where the object that lies directly in front of the 4D viewer will be rendered with sharp edges, but the farther out from the line of sight an element is, the more blurry it is rendered (and the higher the transparency), so that objects on the periphery of the 4D view will just be a defocused amorphous blob, and the important part of the projection, the central volume, will be most clearly visible.
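The defocusing idea reduces to making blur and transparency functions of the angle between an element's direction and the 4D line of sight. A minimal sketch; the linear falloff and maximum blur radius are illustrative assumptions:

```python
import math

def blur_and_alpha(direction, facing, max_blur=8.0):
    """Simulated 4D camera defocus: elements far from the line of sight
    get blurred and faded.  `direction` and `facing` are unit 4-vectors;
    returns (blur radius, opacity)."""
    cos_a = sum(a * b for a, b in zip(direction, facing))
    angle = math.acos(max(-1.0, min(1.0, cos_a)))  # 0 = dead ahead
    t = angle / math.pi                            # normalize to [0, 1]
    return max_blur * t, 1.0 - t
```

An object dead ahead renders sharp and opaque; something on the periphery of the 4D view approaches a fully blurred, fully transparent blob.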

All of these ideas would be good to explore, but unfortunately they all require significant implementational effort and graphics programming expertise, and I don't really have the time/energy to be able to do all of this.
quickfur
Pentonian

### Re: Emerging Visualization Solution

Also, another line of thought w.r.t. photorealistic rendering of 4D objects: what we 3D beings perceive as solid 2D surfaces are no more than margins or ridges (i.e., the equivalent of edges) to a 4D viewer. So any "photorealism", like my use of POV-Ray to render these polygonal projection images, is nothing more than a wire diagram with fancy edge colors from the perspective of a 4D viewer -- a far cry from 4D photorealism!

Indeed, all of the ideas I described in the above post are ultimately an effort to grapple with the fact that 4D objects have 3D boundaries, and how to adequately convey the full 3D-ness of these hypersurfaces in a way that's comprehensible to us poor, 4D-challenged 3D folk. While it is not too difficult to describe a 3D manifold in the way of delineating its boundary, 4D photorealism requires realistic light and shade along every point along the 3D volume of the manifold, something which our 3D-centric biological sight has no capacity to handle. After all, our eyes only ever see 2D; our perception of 3D depth is entirely a construct of our mind. How do you convey a complete contents of a 3D manifold (a 3D array of pixels) to the brain when it has no such channels that can carry such information?

Perhaps asking for 4D photorealism is a bit too ambitious -- that's like a 2D being trying to understand a photograph of complex 3D scenery when their poor 2D eyesight can only see an infinitely thin slice of the photo at a time. The only true way to surmount this inherent barrier is if we can somehow transmit a 3D array of visual signals to our brain directly, and if our brain can somehow adapt to this augmented stimulus and understand it as visual input. But even if this is possible at all, it is definitely far from the current reach of technology. So in the meantime, the best we can do is to see how far we can get with our limited 2D retinas.
quickfur
Pentonian

### Re: Emerging Visualization Solution

"How do you convey a complete contents of a 3D manifold (a 3D array of pixels) to the brain when it has no such channels that can carry such information?"

Your answer was in your text: "...our perception of 3D depth is entirely a construct of our mind."

Thus, I would suggest that our perception of 4D may as well be a construct of our mind, as well. While binocular depth cuing seems compatible with our commonplace 3D depth perception, I would anticipate that utilizing both input streams (with differing content) would be optimum for 4D representation.

I would prefer a setup that delivered content directly to each of my eyes rather than crossing them. As for how that content is organized, your suggestions make much sense.

I may take exception to this one, though:
"While it is not too difficult to describe a 3D manifold in the way of delineating its boundary, 4D photorealism requires realistic light and shade along every point along the 3D volume of the manifold, something which our 3D-centric biological sight has no capacity to handle."

Perhaps this speaks to assumptions we've been making about interiors and exteriors, at least as far as principles of sight or light reflection are concerned. In the spirit of questioning assumptions, what prevents us from visualizing every point of an object?
benb
Dionian

### Re: Emerging Visualization Solution

"How do you convey a complete contents of a 3D manifold (a 3D array of pixels) to the brain when it has no such channels that can carry such information?"

Your answer was in your text: "...our perception of 3D depth is entirely a construct of our mind."

The problem is that the 3D construct in our mind is based on 2D surfaces. It can only represent a 2D subset (possibly curved in a complex way, but nevertheless merely 2D) of the 3D array of pixels.

benb wrote:Thus, I would suggest that our perception of 4D may as well be a construct of our mind, as well. While binocular depth cuing seems compatible with our commonplace 3D depth perception, I would anticipate that utilizing both input streams (with differing content) would be optimum for 4D representation.

Believe me, I have tried doing 4D binocular rendering for each eye. It doesn't work. Our brain simply isn't wired to interpret the images in a 4D way, at least as far as input from the eyes is concerned. Our visual apparatus seems to be hard-coded for 3D perception (unsurprisingly!).

benb wrote:I would prefer a setup that delivered content directly to each of my eyes rather than crossing them. As for how that content is organized, your suggestions make much sense.

I'm not sure what you mean by "rather than crossing them". The whole point of a cross-eyed stereogram is to allow arbitrary width for each image in the pair. It's trivial to make a wall-eyed stereogram (just exchange the images in the pair), but since our eyes generally can't diverge, this limits the maximum width of the images. The cross-eyed approach allows, in theory, arbitrary width, because our eye muscles can cross the eyes but can't push them apart. In either case, the overall result is the delivery of two disparate images, one to each eye, in order to achieve stereopsis.

benb wrote:I may take exception to this one, though:
"While it is not too difficult to describe a 3D manifold in the way of delineating its boundary, 4D photorealism requires realistic light and shade along every point along the 3D volume of the manifold, something which our 3D-centric biological sight has no capacity to handle."

Perhaps this speaks to assumptions we've been making about interiors and exteriors, at least as far as principles of sight or light reflection are concerned. In the spirit of questioning assumptions, what prevents us from visualizing every point of an object?

Nothing prevents our brain from visualizing such a thing, obviously. The problem lies in how to deliver such information to the brain in the first place! I didn't find any specific numbers for the resolution of light-sensitive cells in the human retina, but it's pretty high, say about 10,000 pixels across. So in theory, our eyes can see a 2D 10,000*10,000 array of pixels individually (I know this is not 100% accurate, because cell density varies across the retina, but let's adopt this simplifying assumption for now). Now imagine how many different curves can be represented by a 2D array of 10,000*10,000 pixels. Conceivably, one could discern complex curves up to a density of 5000 distinct lines across any single line drawn across the array.

Now suppose we have an analogous 3D array of pixels, which is 10,000*10,000*10,000. Imagine the maximum complexity of a 2D manifold that can be represented in such an array. Conceivably, it can curve in a complicated way up to a density of about 5000 distinct surfaces across any single line drawn across the 3D array. OK.

The challenge now is how to convey this complex 2-manifold to our brain using only 2D images. The most obvious way is to display 10,000 slices of the 3D array, one at a time. This transmits the total information; however, it loses the connectivity of the 2-manifold across the 3D volume of the array. We see one slice at a time, and we can see the complicated 2D cross-sections of the manifold, but only those that are parallel to the direction of the slices. You may not necessarily be able to put the slices together mentally in order to form a continuous 3D model of how the complex 2-manifold curves itself in the 3D volume of the array. In fact, even a relatively simple shape that forms simple cross-sections presents difficulty for our brain to synthesize into a continuous 3D model, for example:

Each yellow shape represents a cross-section of a particular 3D shape. Here there are only 5 cross-sections, but the shape is simple enough that it's easy to interpolate between them. Can you tell what the 3D shape is? If you've seen this before, probably yes. If not, it's unlikely, or at least it requires considerable mental effort before a coherent 3D model emerges.

Now imagine a 3D retina whose contents are far more complex than such a simple 3D shape, and now we're not talking about a handful of cross-sections, each of which is a simple shape easily interpolated between each other, but a complex curve with potentially completely different cross-sections at almost every point. It stretches credulity to believe that our brain will easily be able to reconstruct the full 3D contents of the 3D array of pixels! In any case, real-time visualization via this method is out of the question.
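The slice-by-slice presentation is easy to sketch concretely. Below is a toy Python/NumPy illustration of my own (the 64³ "retina" and the spherical shell are hypothetical stand-ins, not anything from this thread): even for the simplest possible 2-manifold, every individual slice is just a ring, and nothing in any single slice says "sphere".

```python
import numpy as np

# Toy 3D "retina": a 64^3 array containing a thin spherical shell,
# i.e. a simple 2-manifold embedded in the 3D pixel volume.
n = 64
ax = np.linspace(-1.0, 1.0, n)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
r = np.sqrt(X**2 + Y**2 + Z**2)
shell = (np.abs(r - 0.7) < 0.05).astype(np.uint8)

# Present it slice by slice: every z-slice is a 2D image (here, an annulus).
# The connectivity across slices -- the fact that all these separate rings
# belong to ONE sphere -- is exactly what the viewer must rebuild mentally.
for k in range(0, n, 8):
    ring = shell[:, :, k]
    print(f"slice {k:2d}: {int(ring.sum())} lit pixels")
```

With a genuinely convoluted manifold, consecutive slices would not even resemble each other, which is the crux of the reconstruction problem described above.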

OK. What other methods do we have besides slices? Perhaps we can use stereopsis to convey the shape in its native 3D space by presenting a stereo pair? Assuming a complex-enough 2-manifold drawn in the 3D array of pixels, it may have thousands of mutually connected layers that fold back upon themselves, branching and merging in many different 3D orientations. We have already established that after about 4-5 surfaces lying along a single line of sight, the 2D projection images become so cluttered that it's difficult to see every part of the 3D shape thus presented. But here we're talking about thousands of layers of a single 2-manifold that curve and warp in a very convoluted way in the 3D volume of the pixel array. Again, it's infeasible to represent the entire complexity of such a thing in a single image, even as a stereo pair. We would have to present it piecemeal: given a curve complexity of up to 5,000 parallel surfaces along a single line of sight, and at most 5 distinct surfaces discernible per view, that works out to about 1,000 separate stereo presentations. Can our brain easily form a complete mental model of the entire thing from those? It should be much easier than the cross-section approach, for sure, but you'd still have to look at 1,000 stereo pairs before you have enough information to reconstruct the whole thing. Real-time visualization? Not a chance.

Now, all of the foregoing concerns only a monochromatic 2D manifold drawn into the 3D array of pixels. 4D photorealism isn't merely a single monochromatic 2D manifold; we're talking about an arbitrarily complex 3D pattern of colors across the pixel array! There may or may not be a pattern of surfaces into which you can decompose the image in order to employ stereograms. If the pixels represent the complex, dense texture of a coarse, grainy 4D surface, for example, the only way to fully convey the entire texture would be the cross-section method, since potentially every slice of the array will have almost no continuity with the previous slice! Sure, if our brain could somehow receive all of this information in one go, then in theory we could perceive the entirety of the contents of this 3D array at once. But given that currently the best channel of transmission is via our 3D-challenged eyes, surely you must agree that this stretches credulity past its breaking point! Visualization in real time, therefore, is completely out of the question.

So that brings us back to the inherent limitation that we can only feasibly visualize extremely simple 4D scenes, that consist only of relatively simple objects in relatively simple arrangements. While it may be theoretically possible to go somewhat beyond that, it seems to be a stretch to suggest that we can grasp an arbitrarily complex scene, much less do so in real-time! Consider, for example, a typical photorealistic 3D scene. It requires pretty much every pixel in the 2D projection image of the scene. Now imagine how a hypothetical 2D creature might perceive such an image. From its disadvantaged POV confined to the 2D plane, it cannot see the image in its entirety, but can only perceive 1D slices of it at a time. Equivalently, this is like trying to play a 3D first-person shooter with a 1-pixel wide screen. How well can you visualize the total scenery under such a deficient vision? Probably not very well, if at all!

The same challenge applies in going to 4D scenery with our deficient 2D vision.
quickfur
Pentonian

Posts: 2482
Joined: Thu Sep 02, 2004 11:20 pm
Location: The Great White North

### Re: Emerging Visualization Solution

I've been thinking about this 4D vision... ultimately it is a 3D retinal scan. This can be represented by a whole 3D world to explore, but with a built-in extra depth to everything. One could roam around the scan to see its entirety, and explore the extra 4D parts at will. We would have a way to perceive 4D depth, in the form of transparent refractive extensions branching off of all 3D structures and landforms in the retinal scan. These extra 4D parts look like ultra-clear glass -- so clear, in fact, that you wouldn't actually see the glass, only the bending of the light caused by it. Everything in the retinal scan would have these non-existent, yet still perceivable, extra parts. The invisible parts would be non-interactive, so you could walk through them (advantageously). Some objects would seemingly appear out of nowhere once one rotates or translates toward them. Maybe a few sub-windows representing two orthogonal top-down views would be useful. These views would be 3D environments that show, in third person, where the full structures are filled in.

The main focus would be the centralized view of the 3D retinal scan, with the addition of the orthogonal top-down views. This would act like a compound eye, seeing in multiple directions at once. Moving along the new linear 4D extension would fill in parts of the transparent extensions, while causing filled-in parts to disappear. Say, for instance, you reach an impassable wall on a journey. Then you notice there is something else there, hidden in plain sight. So you rotate or translate in 4D and transform the barrier into a tunnel. Essentially you would perceive a 3D environment that can be manipulated and altered at will by motions in 4D. You would use this ability to navigate obstacles, being forced to think in 4D. Judging by how the solid and transparent parts are assembled, you would have to navigate four-dimensionally and move yourself, including your 3D retinal scan, around to find the right path.

I also watched that 4D maze youtube video, it was really cool! The rotations are very interesting, and I see why you use six squares for vision. I checked out the websites, but couldn't figure out how to run the program. What exactly do I need to do?
in search of combinatorial objects of finite extent
ICN5D
Pentonian

Posts: 1047
Joined: Mon Jul 28, 2008 4:25 am
Location: Orlando, FL

### Re: Emerging Visualization Solution

@ICN5D: To run the program, you need to use Terminal. Navigate in Terminal to the directory named "out." Then, type "java Maze" and hit "enter."
That should get you started.
benb
Dionian

Posts: 48
Joined: Tue Jan 21, 2014 7:11 pm

### Re: Emerging Visualization Solution

@quickfur: "Nothing prevents our brain from visualizing such a thing, obviously. The problem lies in how to deliver such information to the brain in the first place!"

I agree! Perhaps the meaning of the word "photorealism" in this context must be reconsidered and clarified. How, exactly, do we define the successful photorealistic representation of an object in fourspace?

My knee-jerk reaction is to say that success is when an object looks like (and could be mistaken for) something real, with a great degree of nuance and texture and light and shadow as part of its appearance in its environment. I am aware, however, that there are more sophisticated considerations at stake. I am curious as to what the community thinks about photorealism in this sense; perhaps it should be its own thread. Thoughts?
benb
Dionian

Posts: 48
Joined: Tue Jan 21, 2014 7:11 pm

### Re: Emerging Visualization Solution

My working definition of "photorealistic" would be the direct analogue of 3D raytracing: in 3D raytracing, you have a digital model of some 3D scene containing various objects, and the raytracer produces a 2D array of colored pixels based on tracing lines of sight from some specified camera position + orientation. It has already been proven that raytraced images can exhibit a high degree of photorealism.

So I'd say 4D photorealism, in the most direct sense of the word, would be the result of 4D raytracing, where the raytraced image is represented by a 3D array of pixels created by tracing lines of sight in 4D from some specified camera position + orientation.
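To make the idea concrete, here is a minimal toy sketch of such a 4D raytracer in Python/NumPy. Everything in it is my own hypothetical setup, not anyone's actual implementation: a single hypersphere, a point light, simple Lambertian shading, and rays cast from a camera at the 4D origin through an n³ grid of directions, yielding a 3D array of "voxel" intensities.

```python
import numpy as np

def raytrace_4d(n=16, sphere_c=(0.0, 0.0, 0.0, 4.0), sphere_r=1.5):
    """Trace rays from a 4D camera at the origin through an n^3 grid of
    directions; return a 3D array of shaded intensities (the "3D image")."""
    cam = np.zeros(4)
    light = np.array([2.0, 3.0, 1.0, 0.0])   # hypothetical point light
    c = np.array(sphere_c)
    img = np.zeros((n, n, n))
    # The "image plane" is a 3D slab at w = 1; each voxel fixes a ray direction.
    coords = np.linspace(-1.0, 1.0, n)
    for i, x in enumerate(coords):
        for j, y in enumerate(coords):
            for k, z in enumerate(coords):
                d = np.array([x, y, z, 1.0])
                d /= np.linalg.norm(d)
                # Ray-hypersphere intersection: |cam + t*d - c|^2 = r^2
                oc = cam - c
                b = 2.0 * d.dot(oc)
                disc = b * b - 4.0 * (oc.dot(oc) - sphere_r**2)
                if disc >= 0.0:
                    t = (-b - np.sqrt(disc)) / 2.0   # nearer hit
                    if t > 0.0:
                        p = cam + t * d
                        nrm = (p - c) / sphere_r      # outward surface normal
                        ldir = light - p
                        ldir /= np.linalg.norm(ldir)
                        img[i, j, k] = max(nrm.dot(ldir), 0.0)  # Lambert term
    return img

img = raytrace_4d(n=16)
print(img.shape, float(img.max()))
```

Note that the output is exactly the problematic object discussed earlier: a 3D array of pixels, whose presentation to a human viewer is the unsolved part.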

How such an image can be presented to the user in its full glory, though, remains an unsolved problem. Direct representation seems impractical beyond the most trivial cases, since raytraced 4D objects would presumably have 3D textures applied to their surfaces, complete with light and shade, and these textures can be arbitrarily complex. Thus the result cannot be adequately represented by any 2D method (except perhaps by slicing, which precludes real-time interaction).

In any case, 4D raytracing isn't a new idea by any means. One Steve Hollasch (search on google) has created a working prototype using the slicing method to render the final result, and this was many years ago. However, slicing comes with all the disadvantages associated with it: it is difficult to get an adequate feel for the 3D-ness of the raytraced image, and equally difficult to synthesize that into something that one can then extrapolate back into 4D.

One thought that occurs to me, though, would be to perform visual recognition on the 3D image in order to extract features of interest from it, using image recognition techniques suitably generalized to 3D pixel arrays. These features of interest -- the location and shape of sharp boundaries, highlights, etc. -- could then be translated into simplified representations such as, perhaps, 2D manifolds (contour surfaces?) embedded in a 3D model space, which would be significantly simpler than the original full 3D textural information. These could then be rendered directly on the screen using 3D rendering techniques, with their 3D location conveyed using stereographic techniques (red/blue glasses, cross-eyed/parallel stereograms, 3D TV, etc.), and with suitable application of (possibly selective) transparency to alleviate the occlusion problem. But then this reduces essentially back to our current method of rendering only 2D surfaces, so it fails the definition of photorealistic. (And, sadly, this exemplifies the inherent difficulty of conveying full 3D information via a 2D-only channel.)
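As a crude stand-in for that 3D feature extraction, ordinary 2D edge detection generalizes directly to voxels. The sketch below (my own toy example; a real contour-surface extractor would use something like marching cubes) marks voxels where the intensity gradient of a 3D array is strong, recovering the boundary surface of a solid ball:

```python
import numpy as np

def boundary_voxels(vol, thresh=0.5):
    """Mark voxels with a strong intensity gradient -- a naive 3D
    generalization of 2D edge detection, standing in for feature
    extraction from a 3D pixel array."""
    gx, gy, gz = np.gradient(vol.astype(float))
    mag = np.sqrt(gx**2 + gy**2 + gz**2)
    return mag > thresh * mag.max()

# Toy volume: a solid ball; the detected voxels approximate its surface,
# a 2-manifold far simpler than the full voxel contents.
n = 24
ax = np.linspace(-1.0, 1.0, n)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
ball = (X**2 + Y**2 + Z**2 < 0.5).astype(float)
edges = boundary_voxels(ball)
print(f"{int(edges.sum())} boundary voxels out of {edges.size}")
```

The extracted surface could then be rendered with standard 3D techniques, which is precisely the reduction back to 2D-surface rendering lamented above.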

Nevertheless, image recognition techniques would be able to pick up things like prominent light/shade boundaries that are not present in the geometric representations of the target objects, so it could still give a faint glimpse of "true" 4D photorealism.
quickfur
Pentonian

Posts: 2482
Joined: Thu Sep 02, 2004 11:20 pm
Location: The Great White North

### Re: Emerging Visualization Solution

Interesting. Let's keep going.

If we assume a fourspace with roughly equivalent physics to that of threespace (light behaves with wave/particle duality and in other Newtonian and quantum ways), then before a 3D retina can register electromagnetic waves, those waves may first need to be focused by a 4D lens. How much consideration has been paid to 4D lenses in this context -- how do they behave, and what affordances may they provide?
benb
Dionian

Posts: 48
Joined: Tue Jan 21, 2014 7:11 pm

### Re: Emerging Visualization Solution

Hmm. I haven't really thought much about 4D lenses... though I presume they will be approximately in the shape of a flattened 3-sphere, so they will exhibit analogous properties to 3D lenses.

However, if we are to take into account physics, then the situation is drastically different from the usual idealized dimensional analogy from 3D. There are a number of currently-known drastic differences:

- Quite early on, a member of this forum (Patrick Stein aka pat) solved the equations of orbital motion in 4D (under the assumption that gravity diminishes according to an inverse cube law, in accordance with the flux theory of force propagation), and discovered, to the chagrin of all of us, that 4D orbits are inherently unstable. Elliptical orbits simply do not exist. The only stable orbit is a perfect circle -- and the slightest deviation from the mathematically perfect circular curve means devolution into an unstable spiralling path that degrades linearly over time -- clearly an untenable proposition for stable orbital systems! Large deviations from the perfect circle result in rapid collision with the central star or speedy escape into outer space.

- Later on, we found a paper proving that the Schrödinger equation for the hydrogen atom (i.e., 1 proton + 1 electron) in 4D has no stable ground state. Meaning, atoms as we know them do not exist in 4D -- a proton/electron pair will simply collapse. No discrete electron energy levels / orbitals are possible, hence chemistry is impossible, and physical materials as we know them cannot exist. If physical 4D objects are to exist at all, they must be of a fundamentally different nature than what we know from our own universe.

- Then recently, another forum member found a reference indicating the unusual behaviour of waves in spaces of even dimension. Upon investigating this claim, I found a paper showing that signals transmitted via waveforms can be received in pristine form only in 3D. In all even dimensions (including 2D), waves propagating from a point source exhibit an "echo effect", where the advancing wavefront repeatedly sends back "echoes" that interfere with the trailing wavetrain and distort the signal being transmitted. Thus, if a single pulse were sent from the point source, a receiver located some distance away would receive not a single pulse, but multiple (distorted) copies of the pulse, echoing in complex ways and gradually fading as the initial energy dissipates. The 2D case of this effect can be seen when you drop a stone into a still pond: the circular wavefront moves outward, but also inward, interfering with itself, reconverging at the source and re-emerging as secondary, tertiary, quaternary, etc., wavefronts, each of which sends back more echoes, which in turn produce more re-emerging fronts, and so on. In odd dimensions above 3 the echo effect is absent; however, in 5D the signal received is not the undistorted original signal, but its derivative. In 7D, the received signal is the second derivative, and so on. 3D is the only dimension in which propagating wavefronts can transmit a signal without any inherent distortion.
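For reference, the echo effect lines up with the standard retarded Green's functions of the wave equation (textbook results, summarized here by me with unit wave speed; this is not taken from the paper mentioned above):

```latex
% Response at distance r from a unit impulse emitted at t = 0:
\begin{align*}
d = 3:&\quad G(r,t) = \frac{\delta(t - r)}{4\pi r}
  && \text{sharp front: one clean, undistorted arrival} \\
d = 2:&\quad G(r,t) = \frac{H(t - r)}{2\pi\sqrt{\,t^2 - r^2\,}}
  && \text{a tail persists after the front passes: the ``echo''} \\
d = 5, 7, \dots:&\quad G(r,t) \;\propto\; \text{derivatives of } \delta(t - r)
  && \text{the received signal is a derivative of the source}
\end{align*}
```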

Taken together, all of the above indicate that if we are to generalize 3D physics to 4D, it would result in a universe so foreign that we would barely even be able to begin to comprehend it, much less recognize anything that might even remotely resemble life as we know it. In particular, the lack of stable orbits means that galaxies, planetary systems, moons, etc., cannot exist in 4D (at least, not long enough to sustain whatever analog of life we may postulate!). The lack of stable atoms means that either we postulate an entirely different kind of atom than what we understand in our own universe, or physical objects in 4D must be of a radically different nature. Lastly, the echo effect of waves means that light will behave in a fundamentally different way than we expect.
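The orbital instability mentioned above can be illustrated numerically. The sketch below is my own toy construction under the stated inverse-cube assumption (the constants, the leapfrog integrator, and the 1% speed perturbation are all arbitrary choices of mine): an exactly circular orbit persists, while a slightly-too-fast start drifts away without bound.

```python
import numpy as np

def integrate_orbit(v_factor, k=1.0, dt=1e-3, steps=50_000):
    """Leapfrog-integrate a planar orbit under an inverse-cube force
    |F| = k / r^3, starting from a (possibly perturbed) circular orbit.
    Returns the final distance from the central body."""
    pos = np.array([1.0, 0.0])
    # Circular speed at r = 1 satisfies v^2 / r = k / r^3  =>  v = sqrt(k).
    vel = np.array([0.0, np.sqrt(k) * v_factor])

    def accel(p):
        r2 = p.dot(p)
        return -k * p / r2**2        # magnitude k / r^3, directed inward

    a = accel(pos)
    for _ in range(steps):           # kick-drift-kick leapfrog steps
        vel += 0.5 * dt * a
        pos += dt * vel
        a = accel(pos)
        vel += 0.5 * dt * a
    return float(np.linalg.norm(pos))

print(integrate_orbit(1.00))   # exact circle: stays near r = 1
print(integrate_orbit(1.01))   # 1% excess speed: escapes steadily
```

With an inverse-square force the same perturbation would merely yield a slightly elliptical orbit; under the inverse-cube law there is no restoring effect at all, which is the instability at issue.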

Assuming that light arises as the force carrier of whatever analogue of electromagnetism we postulate (which is itself highly problematic, because the analogue of the cross product in 4D yields a bivector, not a vector, so it is unclear how Maxwell's equations would be generalized at all), its wavelike nature would cause it to interfere with itself in an echo-ey way, so a single flash of light would appear to an observer as a light blinking multiple times in succession. The inability of 4D waves to carry a pristine signal means that color (assuming it arises from the wavelength of light) would warp and shift over time due to the self-interference of light under the echo effect, probably resulting in kaleidoscopic visual distortions that call the very nature of vision itself into question.

TL;DR: a truly native 4D world would be extremely alien and quite incomprehensible, and would probably defeat the purpose of 4D visualization in the first place, because these secondary effects of the extra dimension would take on too much prominence, obscuring the geometric aspects which (I presume!) are the whole point of the exercise. Therefore, I'm perfectly happy to live with "naïve" dimensional analogies from 3D that don't necessarily make sense w.r.t. a "native" 4D physics -- because that's what gives us the tools to develop an intuitive feel for 4D space. While it's an equally (if not more) interesting challenge to study a truly native 4D universe, it also requires far more effort than is worthwhile for what amounts to a fantastical world that, on top of being completely fictitious, would be incomprehensible to any potential audience.
quickfur
Pentonian

Posts: 2482
Joined: Thu Sep 02, 2004 11:20 pm
Location: The Great White North

### Re: Emerging Visualization Solution

@quickfur: Thank you for the thorough reply, and for citing specific examples on this board. I think that it is important to address some of these issues when staking out "territory" as far as photorealism is concerned. Both the reality and the photographic processes under consideration must be explicated.

"The inability of 4D waves to carry a pristine signal means that color (assuming it arises from wavelengths of light) will warp and shift over time due to the self-interference of light under the echo effect, probably resulting kaleidoscopic visual distortions that bring the very nature of vision itself under question."

I find this to be noteworthy and perhaps something worth pursuing.

The impossibility of orbital motions and atoms doesn't bother me much--that's what other dimensional realities are for.

As may be evident, I have a longstanding interest in how fluids may behave (and be modeled) in fourspace. I would include models of electromagnetism as a fluid within that broader umbrella.

"a truly native 4D world would be extremely alien and quite incomprehensible, and probably defeats the purpose of 4D visualization in the first place, because these secondary effects of the extra dimension would take too much prominence, thus obscuring the geometric aspects of it which (I presume!) is the whole point of the exercise."

I'm not sure to what extent it is wise to separate the polychora from the other properties of the manifold in which they may arise. For all we know, we may be stipulating the existence of figures that abide by certain rational principles while giving short shrift to other pertinent principles; of course, any physicist might make that claim to any geometer in an effort to bring the two efforts closer together. The difference in this case is that we do not have empirical evidence from fourspace to inform our hypotheses regarding how objects possess mass and exert force upon each other--but we can tweak the parameters that would account for "secondary effects" in ways that may be more or less conducive to the "natural" arising of polychora--reverse engineering a cosmos in which polychora could assemble from constituent elements/forces...
benb
Dionian

Posts: 48
Joined: Tue Jan 21, 2014 7:11 pm
