Show HN: I run a vision model on every screenshot, locally, on a 4GB GPU

Posted by skye0110 3 days ago

Comments

Comment by torunar 3 days ago

> Microsoft showed the world wants screen-aware AI with Recall.

Considering the massive backlash it caused, it showed the exact opposite.

Comment by RobotToaster 3 days ago

I think the backlash was more against everything being sent to a cloud server where Microsoft can see everything you do.

Comment by skye0110 2 days ago

Agreed, the backlash was about the cloud and always-on parts, not the idea itself. That's basically why I only bothered building the local version.

Comment by shmoogy 3 days ago

I used the Rewind app on Mac, and it was very nice to be able to search almost anything you did or saw on the computer. If it's local, opt-in, and secure, then it's potentially well worth exploring.

Comment by skye0110 2 days ago

That's the point: local, opt-in, and nothing leaves the machine. Rewind was great, but the cloud part scared people off. This runs the whole analysis on-device, so search and chat work without anything being uploaded, and it runs more like a background service.
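
For readers wondering what "search without uploading" can look like in practice, here is a minimal sketch, assuming each screenshot's analysis is stored as text rows in a local SQLite file. The table layout and the function names (`open_index`, `add_record`, `search`) are hypothetical; the post doesn't describe how its results are stored.

```python
import sqlite3


def open_index(path=":memory:"):
    # Hypothetical local index: a plain SQLite file, nothing touches the network.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS screenshots ("
        " taken_at TEXT, window_title TEXT, summary TEXT, ocr_text TEXT)"
    )
    return conn


def add_record(conn, taken_at, window_title, summary, ocr_text):
    # One row per analysed screenshot (field names are assumptions).
    conn.execute(
        "INSERT INTO screenshots VALUES (?, ?, ?, ?)",
        (taken_at, window_title, summary, ocr_text),
    )
    conn.commit()


def _like_pattern(term):
    # Escape LIKE wildcards so a search for "100%" matches the literal text.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def search(conn, term):
    # Substring match over the stored text fields; SQLite's LIKE is
    # case-insensitive for ASCII characters by default.
    pattern = _like_pattern(term)
    rows = conn.execute(
        "SELECT taken_at, window_title, summary FROM screenshots"
        " WHERE summary LIKE ? ESCAPE '\\'"
        " OR ocr_text LIKE ? ESCAPE '\\'"
        " OR window_title LIKE ? ESCAPE '\\'"
        " ORDER BY taken_at",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()
```

A real tool would more likely use SQLite's full-text search or embeddings for the chat side; a plain `LIKE` query just keeps the sketch dependency-free.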

Comment by aynite 3 days ago

I tried using gemma-e4b for text generation and was thinking about using it for images too,

but I found gemma-e4b is still too "dumb" and barely capable of giving a good response.

Could you share how you use e2b to get good results?

Comment by skye0110 3 days ago

Yes, you're not wrong: e2b and e4b are pretty limited. I just gave them a really narrow job and a lot of context.

For screenshots it's not open-ended generation: the image, its OCR text, and the window title are fed in, and only structured JSON is requested in the response. That works fine.

So I just designed around it: small model + tight prompt + real context, instead of hoping the model is clever. What were you trying to generate? I can share the exact prompt setup if it helps.
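
The "narrow job + real context + JSON only" pattern described above can be sketched roughly as follows. This is not the author's actual prompt: the JSON keys (`app`, `activity`, `keywords`), the OCR truncation limit, and the function names are assumptions, and the call that sends the image plus this text to the local model is left out because it depends on the runtime you use.

```python
import json

# Hypothetical output schema; the thread only says "structured JSON".
REQUIRED_KEYS = {"app": str, "activity": str, "keywords": list}


def build_prompt(window_title, ocr_text, max_ocr_chars=2000):
    # Narrow task plus concrete context (window title, OCR text) instead of
    # an open-ended question; OCR is truncated to keep the prompt small.
    ocr_text = ocr_text[:max_ocr_chars]
    return (
        "You describe a single desktop screenshot.\n"
        f"Window title: {window_title}\n"
        f"OCR text:\n{ocr_text}\n\n"
        "Respond with ONLY a JSON object with exactly these keys:\n"
        '  "app" (string), "activity" (one short sentence), '
        '"keywords" (list of up to 5 strings).\n'
        "No prose, no markdown."
    )


def parse_response(raw):
    # Small models sometimes wrap the JSON in chatter; keep the outermost {...}.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Reject anything that doesn't match the expected keys and types.
    for key, typ in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), typ):
            return None
    if not all(isinstance(k, str) for k in data["keywords"]):
        return None
    return {key: data[key] for key in REQUIRED_KEYS}
```

Returning `None` on any mismatch lets the caller retry or skip that frame instead of storing a malformed record, which matters more with e2b/e4b-sized models than with larger ones.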