SHARP, an approach to photorealistic view synthesis from a single image
Posted by dvrp 1 day ago
Comments
Comment by superfish 1 day ago
Comment by uwela 19 hours ago
It will evolve into people hooked into entertainment suits most of the day, where no one has actual relationships or does anything of consequence, like some sad mashup of Wall-E and Ready Player One.
If we’re lucky, some will want to meatspace with augmented reality.
Maybe we’ll get really nice holovisions, where we can chat with virtual celebrities.
Who needs that?
We’re already having to shoot up weight-loss drugs because we binge watch streaming all the time. We’ve all given up, assuming AI will do everything. What good will come from having better technology when technology is already doing harm?
Comment by camgunz 18 hours ago
Comment by jodrellblank 17 hours ago
A great filter needs to apply to every civilisation imaginable, no exceptions, nerfing billions of species before they get to a higher Kardashev scale, not just something that "could happen" or the latest “Dunning-Kruger” mic-drop in every thread. In the 1960s "the great filter is nuclear war", in 1890 "the great filter is heroin", in 1918 "the great filter is world war, we are destined to destroy ourselves", in 2015 "the great filter is climate change, our emissions will end us like bacteria in a petri dish", in antiquity "the great filter is the punishment for crossing the will of the Gods".
It's got to be something you cannot get around even if you try really really hard and get very very lucky, because there are ~200,000,000,000 stars in the Milky Way and with those numbers there will be some species which lucks its way past almost any candidate, spreads out and in a mere 100k years is all over this galaxy leaving rocket trails and explosion signatures and radio signals and terraforming signs and megastructures.
Maybe when NASA, ESA, SpaceX, Roscosmos, CNSA, ISRO all collapse because of this effect… look how many countries have a space agency! https://en.wikipedia.org/wiki/List_of_government_space_agenc...
Comment by Traubenfuchs 1 day ago
Comment by drcongo 22 hours ago
Comment by kennyadam 18 hours ago
Comment by tecleandor 20 hours ago
Comment by Traubenfuchs 22 hours ago
https://m.youtube.com/watch?v=DgPaCWJL7XI&t=1s&pp=2AEBkAIB0g...
Comment by StilesCrisis 19 hours ago
Comment by what-the-grump 19 hours ago
Comment by schneehertz 23 hours ago
Comment by ghurtado 1 day ago
Comment by rcarmo 19 hours ago
https://github.com/rcarmo/ml-sharp (has a little demo GIF)
I am looking at ways to approximate Gaussian splats without having to reinvent the wheel, but I'm a bit out of my depth since I haven't been paying a lot of attention to those in general.
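For reference, one crude way to approximate splat rendering without the real CUDA rasterizer is to project each splat, sort back-to-front, and alpha-blend isotropic 2D Gaussian sprites. A minimal NumPy sketch (not the actual ml-sharp code; names are made up, and it ignores anisotropic covariance and view-dependent color):

    import numpy as np

    def render_splats(centers, colors, opacities, radii, K, H, W):
        # centers: (N,3) points in camera space (z > 0), colors: (N,3) RGB in [0,1],
        # opacities: (N,), radii: (N,) world-space sizes, K: 3x3 intrinsics.
        img = np.zeros((H, W, 3), dtype=np.float32)
        z = centers[:, 2]
        uv = (centers[:, :2] / z[:, None]) @ K[:2, :2].T + K[:2, 2]
        for i in np.argsort(-z):                        # farthest first (painter's algorithm)
            u, v = uv[i]
            r = max(1.0, K[0, 0] * radii[i] / z[i])     # rough screen-space radius
            x0, x1 = int(max(0, u - 3 * r)), int(min(W, u + 3 * r))
            y0, y1 = int(max(0, v - 3 * r)), int(min(H, v + 3 * r))
            if x0 >= x1 or y0 >= y1:
                continue
            ys, xs = np.mgrid[y0:y1, x0:x1]
            g = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * r * r))
            a = np.clip(opacities[i] * g, 0.0, 1.0)[..., None]
            img[y0:y1, x0:x1] = a * colors[i] + (1 - a) * img[y0:y1, x0:x1]  # "over" blend
        return np.clip(img, 0.0, 1.0)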
Comment by esperent 18 hours ago
Comment by 7moritz7 19 hours ago
Comment by rcarmo 17 hours ago
Keep in mind that this is not Gaussian splat rendering but just a hacked approximation--on my NVIDIA machine that looks way smoother.
Comment by Leptonmaniac 1 day ago
Comment by emsign 1 day ago
Gaussian splatting is pretty awesome.
Comment by crazygringo 13 hours ago
Already you sometimes see where they manually cut out a foreground person from the background and enlarge them a little bit to create a multi-layer 3D effect, but it's super-primitive and I find it gimmicky.
Bringing actual 3D to old photographs as the camera slowly pans or rotates slightly feels like it could be done really tastefully and well.
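The cheapest version of that effect, assuming you already have a monocular depth map for the photo, is just to shift each pixel horizontally in proportion to inverse depth as the virtual camera pans. A rough sketch (everything here is made up for illustration; disocclusion holes are left unfilled):

    import numpy as np

    def fake_pan_frame(image, depth, shift_px):
        # image: (H,W,3), depth: (H,W) relative depth, shift_px: pixel shift
        # for the nearest content at this point in the pan.
        H, W, _ = image.shape
        inv = 1.0 / np.maximum(depth, 1e-6)
        inv = (inv - inv.min()) / (inv.max() - inv.min() + 1e-6)   # normalize to [0,1]
        shift = np.round(inv * shift_px).astype(np.int32)          # near pixels move more
        out = np.zeros_like(image)
        xs = np.arange(W)
        for y in range(H):
            x_new = np.clip(xs + shift[y], 0, W - 1)
            out[y, x_new] = image[y, xs]    # crude forward warp; overlaps just overwrite
        return out

    # frames = [fake_pan_frame(img, depth, s) for s in np.linspace(-8, 8, 60)]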
Comment by kurtis_reed 1 day ago
Comment by ferriswil 1 day ago
Comment by Retr0id 1 day ago
Imagine history documentaries where they take an old photo, free objects from the background, and then move them round to give the illusion of parallax.
Comment by necovek 23 hours ago
Comment by thenthenthen 20 hours ago
Comment by necovek 13 hours ago
Comment by tzot 23 hours ago
Even using commas, if you leave the ambiguous "free", I suggest you prefix "objects" with "the" or "any".
Comment by ares623 1 day ago
I guess this is what they use for the portrait mode effects.
Comment by derleyici 1 day ago
Comment by eloisius 1 day ago
Comment by avaer 1 day ago
(I am oversimplifying).
Comment by uh_uh 1 day ago
Comment by eloisius 1 day ago
Comment by avaer 1 day ago
I just want to emphasize that this is not a NeRF where the model magically produces an image from an angle and then you ask "ok but how did you get this?" and it throws up its hands and says "I dunno, I ran some math and I got this image" :D.
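To make that concrete: with splats you get an explicit list of primitives you can dump and inspect, whereas a NeRF only hands you a network to query per sample. A toy illustration (field names are made up, not the paper's actual output format):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Splat:                 # explicit scene: you can print, filter, or edit these
        position: np.ndarray     # (3,) center
        scale: np.ndarray        # (3,) extent
        color: np.ndarray        # (3,) RGB
        opacity: float

    scene = [Splat(np.array([0.1, 0.0, 2.0]), np.full(3, 0.01),
                   np.array([0.8, 0.2, 0.2]), 0.9)]
    print(scene[0].position)     # the geometry is right there

    def nerf_query(xyz, view_dir, mlp):
        # implicit scene: all you can do is ask the network for (rgb, density)
        # at a point; there is no structure to read off directly.
        return mlp(np.concatenate([xyz, view_dir]))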
Comment by zipy124 20 hours ago
Comment by skygazer 16 hours ago
Comment by avaer 1 day ago
Comment by carabiner 1 day ago
Or if you prefer Blade Runner: https://youtu.be/qHepKd38pr0?t=107
Comment by diimdeep 19 hours ago
Comment by rasz 3 hours ago
Comment by p-e-w 1 day ago
Comment by supermatt 20 hours ago
My experience with all these solutions to date (including whatever Apple is currently using) is that, when viewed stereoscopically, the people end up looking like 2D cutouts against the background.
I haven't seen this particular model in use stereoscopically so I can't comment on its effectiveness, but the lack of a human face in the example set is likely a bit of a tell.
Granted, they do call it "Monocular View Synthesis", but I'm unclear as to what its accuracy or real-world use would be if you can't combine two views to form a convincing stereo pair.
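For what it's worth, a stereo test is easy to rig: render the scene from two poses offset by roughly an interpupillary distance along the camera's x-axis and pack them side by side. A sketch below, where render_view is a stand-in for whatever renderer you have, not a real SHARP API:

    import numpy as np

    IPD_M = 0.063  # ~63 mm average interpupillary distance

    def stereo_pair(render_view, center_pose, ipd=IPD_M):
        # center_pose: 4x4 camera-to-world matrix; render_view(pose) is a
        # hypothetical callable returning an HxWx3 image for that pose.
        right_axis = center_pose[:3, 0]                  # camera x-axis in world space
        left_pose, right_pose = center_pose.copy(), center_pose.copy()
        left_pose[:3, 3] -= right_axis * ipd / 2
        right_pose[:3, 3] += right_axis * ipd / 2
        left, right = render_view(left_pose), render_view(right_pose)
        return np.concatenate([left, right], axis=1)     # side-by-side stereo frame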
Comment by sorenjan 20 hours ago
Comment by supermatt 20 hours ago
True stereoscopic captures are convincing statically, but don't provide the parallax.
Comment by sorenjan 17 hours ago
Comment by Someone 11 hours ago
Comment by moondev 1 day ago
Comment by delis-thumbs-7e 1 day ago
Comment by rcarmo 17 hours ago
Comment by gs17 11 hours ago
Comment by matthewmacleod 1 day ago
Comment by diimdeep 23 hours ago
CUDA is needed to render the side-scrolling video, but there are many ways to do other things with the result.
Comment by derleyici 1 day ago
Comment by Traubenfuchs 1 day ago
Photoshop's content-aware fill could do this equally well or better many years ago.
Comment by yodon 1 day ago
Comment by avaer 1 day ago
Without that, it's hard to tell how cherry-picked the NVS video samples are.
EDIT: I did it myself, if anyone wants to check out the result (caveat, n=1): https://github.com/avaer/ml-sharp-example
Comment by tartoran 1 day ago
Comment by a3w 18 hours ago
Comment by alexgotoi 15 hours ago
What's weird is we're getting better at faking 3D from 2D than we are at just... capturing actual 3D data. Like we have LiDAR in phones already, but it's easier to neural-net your way around it than deal with the sensor data properly.
Five years from now we'll probably look back at this as the moment spatial computing stopped being about hardware and became mostly inference. Not sure if that's good or bad tbh.
Will include this one in my https://hackernewsai.com/ newsletter.
Comment by momojo 12 hours ago
Comment by nashashmi 20 hours ago
Comment by arjie 1 day ago
Is there a similar flow to transform a video/photo/NeRF of a scene into a tighter, minimal-polygon approximation of it? The reason I ask is that it would make some things really cool. To make my baby monitor mount I had to break out the calipers and measure the pins and this and that, but if I could take a couple of photos and iterate in software that would be sick.
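Not quite a one-step flow, but once you have a point cloud (photogrammetry, a splat/NeRF export, or a phone LiDAR scan), the usual route to a tight mesh is Poisson reconstruction plus decimation. A rough Open3D sketch, assuming a scan.ply you've already captured (filename and parameters are placeholders):

    import open3d as o3d

    pcd = o3d.io.read_point_cloud("scan.ply")           # your captured point cloud
    pcd.estimate_normals()
    pcd.orient_normals_consistent_tangent_plane(30)     # Poisson wants consistently oriented normals
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
    mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=5000)
    mesh.compute_triangle_normals()
    o3d.io.write_triangle_mesh("part_lowpoly.stl", mesh)   # feed into CAD / slicer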
Comment by necovek 21 hours ago
Comment by arjie 6 hours ago
Comment by Dumbledumb 22 hours ago
This is really interesting to me because the model would have to encode the reflection as both the depth of the reflecting surface (for texture, scattering etc) as well as the "real depth" of the reflected object. The examples in Figure 11 and 12 already look amazing.
Long tail problems indeed.
Comment by pluralmonad 17 hours ago
Comment by reactordev 15 hours ago
Comment by mhalle 18 hours ago
Not only do many VR and AR systems acquire stereo, we have historical collections of stereo views in many libraries and museums.
Comment by Geee 1 day ago
Comment by SequoiaHope 1 day ago
Comment by orthoxerox 18 hours ago
Comment by brcmthrowaway 1 day ago
Comment by duskwuff 1 day ago
Comment by pmontra 20 hours ago
Comment by stronglikedan 12 hours ago
Comment by somethingsome 9 hours ago
Comment by remh 1 day ago
Comment by mvandermeulen 1 day ago
Comment by harhargange 1 day ago
Comment by wfme 1 day ago
1. Sky looks jank
2. Blurry/warped behind the horse
3. The head seems to move a lot more than the body. You could argue that this one is desirable
4. Bit of warping and ghosting around the edges of the flowers. Particularly noticeable towards the top of the image
5. Very minor, but the flowers move as if they aren't attached to the wall
Comment by codebyprakash 21 hours ago
Comment by yodon 1 day ago
Comment by boguscoder 1 day ago
Comment by dag11 1 day ago
Comment by jrflowers 1 day ago
It’s a website that collects people’s email addresses
Comment by avaer 1 day ago
Comment by andsoitis 1 day ago
Why no landscape or underwater scenes or something in space, etc.?
Comment by jaccola 1 day ago
I believe this company is doing image (or text) -> off the shelf image model to generate more views -> some variant of gaussian splatting.
So they aren't really "generating" the world as one might imagine.
Comment by BoredPositron 21 hours ago
Comment by yieldcrv 21 hours ago
Comment by diimdeep 1 day ago
Comment by benatkin 1 day ago
Comment by ballpug 1 day ago
Comment by IlikeKitties 1 day ago
Comment by calvinmorrison 1 day ago
Comment by accurrent 1 day ago
Comment by jijijijij 21 hours ago
I doubt this will be useful for robotics or industrial automation, where you need an actual spatial or functional understanding of the object/environment.
Comment by accurrent 20 hours ago
I have worked on simulation and do a lot of it in my day job. While physics is often hard and expensive, you only need to write the code once.
Assets? You need to commission 3D artists and then spend hours wrangling file formats. It's extremely tedious. If we could take a photo and extract meshes, I'm sure we'd have a much easier time.
Comment by netsharc 19 hours ago
Comment by rv3392 1 day ago
Comment by re-thc 1 day ago