I precomputed and cached each one so it was nearly instant. The effect - although only a crude wrapper around what Sharp already does - was quite transformative and mesmerising: just the ease of pointing it at any folder of photos and viewing them fully spatially.
It was a bit of a mess code-wise and kinda specific to my local setup - but I should really clean it up and deploy it somewhere for other people to try. Although I keep assuming someone else will do it before me and make a better job of it.
just edit app.py or set the GALLERY_FOLDER env var, install the reqs, run app.py, and add "/gallery" to the URL displayed in the console.
I think all-client-side in-browser AI imagery is becoming very doable and has lots of privacy benefits. However ONNX web leaves a lot to be desired (I had to proto patch many pytorch conversions because things like Conv3D ops had webgpu issues IIRC). I have yet to try Apache TVM webgpu approaches or any others, but I feel if the webgpu space were more invested in, running these models would be even more feasible.
This is impressive as hell
Very cool demo. It works in about 9 seconds on my machine.
A few asks if you're going to devote more time to the project: can you make a full orbital camera? It doesn't seem able to orbit a full 360°. Also, could double-click-drag move the camera in non-orbiting mode for view refinement? (Super minor nitpicks - this demo is really cool.)
> Caveats: SHARP's released weights are research-use only (Apple's model license, not the code's).
Nobody should GAF about this. We have all the major players distilling each other in the open. This gives Apple the ability to slap you with lawyers, but in practice you'll often get more done if you just break the rules.
Do you know of any other image-to-splat models? WorldLabs has a few versions of their Marble model, and the Tencent Hunyuan team just released HyWorld as open weights:
https://github.com/Tencent-Hunyuan/HY-World-2.0
HyWorld looks to be SOTA and better than all the other players.
Apple's Sharp is awesome in that it is fast, but it only generates a small depth sample from the image. There are no back faces or splats, so if you move the camera even slightly from the original perspective, you'll see lots of holes.
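To illustrate why those holes appear (a generic sketch, not SHARP's actual pipeline): back-projecting a single-view depth map through a pinhole camera yields exactly one point per pixel, all on the front-most surface, so anything occluded in the original view simply has no geometry behind it.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map into a 3-D point cloud with a pinhole camera model.

    One point per pixel, all on the visible surface -- nothing exists
    behind them, which is why a new viewpoint exposes holes.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy 2x2 depth map: four pixels -> exactly four points, no back faces.
pts = backproject(np.ones((2, 2), dtype=np.float32),
                  fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts.shape)  # (4, 3)
```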
Have to admit, I don't get it. I tried it with 3 landscape photos I have and the results were nowhere close to the results in the demo - but that just speaks to the model.
Regardless, it's very cool as a browser tech showcase.
2. There are many models similar to Sharp that do accept multiple photos - but Sharp is trying to solve a specific problem. If you have multiple photos - don't use Sharp.
Ubiquity and coverage of devices is what will take longest, largely dependent on how well we can shrink models with similar performance and how much we can accelerate mobile devices. This feels like it's a bit further out (<3 years?).
I personally tested it on a 32 GB Apple M2, and it's able to run much heavier stuff.
I might create a compressed version of the model that would work on low-RAM machines.
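For illustration only (not the commenter's actual plan): post-training int8 quantization is the usual way to shrink a model's weights roughly 4x versus float32 so it fits on low-RAM machines.

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights onto int8 with a single per-tensor scale."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 tensor."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes // q.nbytes)  # 4 -> four times smaller in memory
```

The trade-off is a small reconstruction error, bounded by half the scale per weight; in practice accuracy usually survives this for inference.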
(16 GB M1 MacBook, Chrome)
Model results: https://apple.github.io/ml-sharp/
Regarding the iOS lock screen - I believe they are different models. I think Apple uses this one to generate those Vision Pro 3D photos, though I'm not too sure.
I ran into quite a few out-of-memory iOS Safari issues when I was building continuous voice recognition for my blind chess game, so people could play on the go.
I originally tried to get away with just Whisper Tiny in the chess game [2], but it performs worse on the kinds of short phrases (knight E4, c takes d5, etc.) used to dictate chess notation. Even with hotword-based phrasing and corrections, I found its accuracy on brief inputs noticeably poorer. So I switched over to Sherpa [3] trained on GigaSpeech. It's significantly more accurate, but it also comes with a correspondingly larger memory footprint.
Ideally, I would have used just one engine, but I needed a fallback for iOS devices (especially older ones) which can easily OOM.
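The fallback strategy described above can be sketched roughly like this (a hypothetical pattern, not the real Sherpa or Whisper APIs - the engine loaders are stand-ins):

```python
def load_recognizer(candidates):
    """Return (name, engine) for the first candidate that loads.

    Candidates are tried in preference order: heavier, more accurate
    engines first, lighter ones as the fallback for devices that OOM.
    """
    for name, loader in candidates:
        try:
            return name, loader()
        except MemoryError:
            continue  # engine too big for this device; try the next one
    raise RuntimeError("no speech engine fits on this device")

def sherpa_like():          # pretend this OOMs, as on an older iPhone
    raise MemoryError

def whisper_tiny_like():    # small enough to always load
    return "tiny-model"

name, engine = load_recognizer([("sherpa", sherpa_like),
                                ("whisper-tiny", whisper_tiny_like)])
print(name)  # whisper-tiny
```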
[1] - https://github.com/snakers4/silero-vad
[2] - https://shahkur.specr.net
[3] - https://github.com/k2-fsa/sherpa-onnx
https://github.com/xiph/rnnoise
[1] https://github.com/onnx/onnx/blob/main/onnx/onnx.proto#L605