OpenCV 5 Is Here: The Biggest Leap in Years for Computer Vision
Posted by ternaus 11 days ago
Comments
Comment by plasticeagle 8 days ago
Comment by Joel_Mckay 7 days ago
However, it has a few issues:
1. Patented algorithms that are effectively impossible to license in a commercial setting.
2. Permuted API that change how identically named functions behave over versions.
3. Hardware CUDA version coupling deprecating support every major release.
4. Inconsistent and contradictory documentation in the constant subtle permutations. Downstream projects tend to version lock the lib for really practical reasons.
5. A shift away from core C libraries like ImageMagick & V4l, and into C++ abstractions with legacy Swig wrapper libraries in Java or Python.
6. Perpetual-Beta culture means the library will unlikely ever really fully stabilize.
It is a fun library, until people actually try to deploy something serious. As users will often simply suggest using an old version release if there is a bug.
Everything from Build flags to the API documentation has never fully stabilized. ymmv =3
Comment by markusMB 7 days ago
- OpenCV is Apache license. Yes, it used to be more complicated.
- The only patented algorithm I am aware of, SIFT, used to be part of opencv_contrib. And the README in opencv_contrib would greet you with a warning, that the code may not be fit commercial use for various reasons. Only when the patent expired, it was moved into OpenCV core.
- Same observation for Aruco marker detection, which was in contrib for a long time because the options to choose from were either not-well-maintained or GPL-licensed code. It is now in core OpenCV (and Apache).
- Despite its age, I think that OpenCV is still more than relevant today. And being part of modern languages like C++, Swig, Java and Python (and for years already) is part of that. Still I was surprised how long they maintained OpenCV 2 and 3.
- Over the past releases and few years, my impression was actually that core API was very much stable(izing). Cant say what happened in contrib – or what it feels like when you treat core and contribute as one and a feature progressed from contributing to core.
- I do agree, that I usually I would check that a MINOR releases wasnt actually a MAJOR release, breaking some API or behavior I was relying on. I am hoping that Version 5 is pulling the ambitions for making things differently away from Version 4. So v4 can be used stably ;-)
Comment by Joel_Mckay 7 days ago
Indeed, if your library dependency constellation works, some will static link to stabilize/freeze their project for more than a few months.
It wasn't that v3 was particularly good, but rather v4 was a mess. I predict v5 inherited that mess, and improved it... lol =3
Comment by Sesse__ 7 days ago
Comment by Joel_Mckay 7 days ago
To enable Intel TBB, CUDA, and CPU specific compiler optimizations... one will almost certainly need to re-build the library, and customize your application build.
Some tasks degrade in performance on a GPU, and others are 740 times faster... ymmv. =3
Comment by jrk 7 days ago
Individual image processing operations are often very low arithmetic intensity. If you don't combine them into much larger subroutines—which are necessarily less generic and orthogonal—you spend all your time waiting on memory between every little op.
Comment by Joel_Mckay 7 days ago
Our problem domains must obviously differ. Good luck =3
Comment by harrall 7 days ago
But I can’t really complain because it’s open source and added to by contributors.
Comment by Joel_Mckay 7 days ago
Comment by ranit 7 days ago
> 1. Patented algorithms that are effectively impossible to license in a commercial setting.
then does anyone know how "OpenCV has been the foundation of countless production systems" is possible, as the OP article claims?
Comment by nextaccountic 7 days ago
Comment by Joel_Mckay 7 days ago
However, until each code area turns 17/21 no one knows for sure. It just looks normal at first, and $12k cheaper than MatLab server host licenses. =3
Comment by zdkl 7 days ago
Comment by Joel_Mckay 7 days ago
I discovered that while porting it to a Pi ARM platform years ago (yes it was slow... lol.) Forgot when the IP becomes public domain, but you might want to check that out. If I recall it was unrelated to the COLMAP project design. =3
Comment by zdkl 7 days ago
Comment by Joel_Mckay 7 days ago
https://github.com/openMVG/openMVG/blob/develop/COPYRIGHT.md
https://github.com/cdcseacave/openMVS/blob/master/COPYRIGHT....
Personally, I recommend COLMAP + CloudCompare + MeshLab, but the Mozilla Public License 2.0 should address IP license issues if the author is also the rights holder. Keep in mind all work done by University Students and Staff is often property of the institution unless otherwise stated. It is a delicate subject.
Best of luck =3
Comment by zdkl 6 days ago
https://github.com/alicevision/AliceVision https://github.com/alicevision/Meshroom
No affiliation, just an excellent tool
Comment by akssri 7 days ago
Comment by Joel_Mckay 7 days ago
Comment by fzysingularity 7 days ago
In all honesty, opencv has stood the test of time and I’m certain newer LLMs will likely not attempt to rewrite it from scratch.
P.S. I’ve been a user since the IplImage days, circa 2007, and I’d still consider using it over most CV libraries today.
Comment by dheera 7 days ago
But not for saving video. That fourcc pile of crap doesn't open up in QuickTime player, the default Ubuntu video player, or anything anybody actually uses. I've always had to add a os.system("ffmpeg [ask llm to generate the command for you]") afterwards to fix anything that OpenCV generates.
Comment by doctorpangloss 7 days ago
Comment by deadbabe 7 days ago
Comment by jaffa2 7 days ago
Comment by SEJeff 7 days ago
Comment by greenavocado 7 days ago
Comment by Geee 7 days ago
Comment by ftchd 8 days ago
So there's room for even better performance!
Comment by wongarsu 8 days ago
Sure, running models on the CPU is very much a thing in computer vision (the benchmarked YOLOv8n has 37M params). But this whole announcement feels more like OpenCV catching up to the modern world, not "The Biggest Leap in Years for Computer Vision"
Still great, needing fewer libraries is a good thing, but maybe a bit oversold
Comment by VadimPR 8 days ago
Comment by claytongulick 8 days ago
If a human can't be bothered to write a piece, I can't be bothered to read it.
Comment by danjc 8 days ago
Comment by VulgarExigency 8 days ago
Comment by thin_carapace 8 days ago
Comment by kphorn 7 days ago
Comment by dismantlethesun 7 days ago
Comment by vdfs 8 days ago
Comment by kphorn 7 days ago
Comment by trklausss 8 days ago
Where is the human creativity in writing release notes gone?
Comment by nnevatie 8 days ago
Comment by amorroxic 8 days ago
Comment by nnevatie 7 days ago
Comment by snovv_crash 8 days ago
Comment by antonvs 8 days ago
https://docs.rs/onnxruntime/latest/onnxruntime/
It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.
Comment by pzo 8 days ago
Comment by OvervCW 8 days ago
Comment by gunalx 8 days ago
Comment by dTal 7 days ago
Comment by monster_truck 8 days ago
Comment by cik 7 days ago
Comment by pzo 8 days ago
I'm happy they added option for ONNXRuntime. I wish their cv.dnn was mostly that unified wrapper around many different backends (ONNXRuntime, Executorch, LiteRT, CoreAI) and maybe just some tooling around it (performance metrics tools, model downloads etc). Transformers(.js) approach looks better for me.
Wish they also invested more time into better production ready Camera I/O (for mobiles, device/format discovery, manual settings, depthmap support, etc) and better Highgui that could use different backends (skia, webgpu) and on mobiles.
Comment by GreenSalem 8 days ago
Comment by oceansky 8 days ago
Comment by Npovview 7 days ago
Comment by _qua 8 days ago
Comment by M4v3R 7 days ago
I personally don't mind AI generated content when it's properly reviewed, but unfortunately more often than not the author just glances at the result and decides it's good enough.
Example: https://opencv.org/wp-content/uploads/2026/06/image-1.jpeg
I'm not knowledgable enough to determine whether this diagram is 100% accurate, but some things look off - the arrows in the bottom left seem superficial, some arrows are connected in weird ways, the mini diagram in AttentionLayer block doesn't look right (it has two Softmax icons and one MatMul icon, while the "before" diagram is the opposite).
Comment by bl0b 7 days ago
Comment by saberience 7 days ago
Comment by KolmogorovComp 7 days ago
Comment by saberience 7 days ago
Comment by xdennis 7 days ago
> This is not just another incremental release. OpenCV 5 is a major step forward.
Comment by killingtime74 7 days ago
Comment by jampekka 7 days ago
Comment by xpct 7 days ago
If someone slapped together an article from an LLM and a few internal documents, that tells me exactly how much they cared about it.
Comment by Aachen 7 days ago
As for being well-written, does that refer to correct use of grammar and no typos, or do you mean that you find that bots write better than humans in any other way?
Comment by marknutter 7 days ago
Comment by smt88 7 days ago
Comment by arcanine 8 days ago
Opencv 4.11 : ~255ms Opencv 5.0.0 : ~185ms
with the same code.
Comment by bobmcnamara 7 days ago
Comment by boredemployee 7 days ago
I'm not interested in understanding papers or the math behind it, but rather in how to put a system into production, whether it's object detection, running 20 cameras in parallel on a single computer, like sizing hardware for a specific task, and so on.
Any tips?
Comment by bonoboTP 7 days ago
Then do a slightly more ambitious project. Start with something very simple.
It also heavily depends on what you already know regarding programming, image processing etc.
Comment by kelvinjps10 7 days ago
Comment by yayitswei 7 days ago
Speaking from experience: never used OpenCV before, recently vibe coded a tool that makes supercuts of pool videos, trimming each clip from the cue ball's first strike to when the motion stops.
Comment by eastof 7 days ago
Comment by wolfgangK 6 days ago
[0] https://pyodide.org/en/latest/usage/packages-in-pyodide.html
Comment by globalnode 8 days ago
Comment by wongarsu 8 days ago
If you need something less restricted to existing labels (say wanting all the red apples, or all cardboard signs) SAM3 is great, as the sibling comment says
Comment by IanCal 8 days ago
A quick note to say that this is also a task you can hand to things like gemini.
Comment by dekhn 7 days ago
Comment by globalnode 7 days ago
Comment by IX-103 7 days ago
Comment by fnands 8 days ago
Large general models have taken over in NLP, and (outside of embedded/low latency applications) it seems like they are coming for CV next.
So you should soon be able to have large generic model that can detect whatever for you.
It's already pretty much possible with open-vocabulary detectors like SAM3, where you could just prompt it with "Apple": https://ai.meta.com/research/sam3/
Comment by Npovview 7 days ago
Comment by shenberg 8 days ago
Comment by shelled 8 days ago
Comment by ternaus 6 days ago
But there is still a huge room for improvement in terms of performance, as for some low level operations StringZilla or Numkong are faster, for some, especially for float32 images, numpy is the best.
The most annoying component is that OpenCV is limited to input shapes like (H, W, C), which limits its application to videos and volumes with shapes (X, H, W, C)
Comment by trollbridge 7 days ago
OpenCV was so easy and smooth to set up for doing tasks like generating thumbnails from uploads from arbitrary photo uploads regardless of format (including funky new formats like webp, avif, or heic).
Comment by riazrizvi 7 days ago
Comment by johnAthan_ 6 days ago
Comment by hbcondo714 10 days ago
Why these specific models / versions?
Comment by mkl 7 days ago
Comment by ge96 7 days ago
Comment by mattcox12 5 days ago
Comment by dadachi 6 days ago
Comment by maelito 8 days ago
Comment by MaxikCZ 8 days ago
Comment by monster_truck 8 days ago
Comment by brk 7 days ago
Comment by sixothree 7 days ago
Comment by wiradikusuma 7 days ago
Comment by owenpalmer 7 days ago
Am I the only one that finds this sentence very cheesey?
Comment by ternaus 6 days ago
TL;DR
OpenCV is fast, but torchvision is faster.
Comment by hdgvhicv 7 days ago
Comment by maxdo 7 days ago
Comment by Magnets 8 days ago
Comment by thunky 8 days ago
Comment by charankilari 8 days ago
Comment by xavierforge 7 days ago
Comment by Lukevigoss 6 days ago
Comment by noobcoder 7 days ago
Comment by cdogukank 7 days ago
Comment by imJack 8 days ago
Comment by pimlottc 8 days ago
Comment by leoncos 11 days ago
Comment by nicolailolansen 8 days ago
Comment by wongarsu 8 days ago
I might be on board about LLMs being the future of OCR (though many would disagree), but for general CV they are very inefficient for very limited benefit
Comment by IanCal 8 days ago
Also if they are better then you can also have a flow that’s cheap model -> marginal cases go to more complex thing (and a chain of these).
The yolo models are really shockingly good for their cost and how well they can work with not much training data as well.
Comment by charcircuit 8 days ago
Due to how simple they are to work with they will become popular. Compare NLP before and after GPT-3. GPT-3 majorly brought down the complexity and skill needed for doing NLP tasks even if traditional NLP is much much faster. Ultimately ease of development will win out and the industry will work towards optimizing running such LLMs to make it cheap enough to run.
Comment by regularfry 8 days ago
We're not going to fit Nano Banana or anything like it on a device with 512MB RAM and a GPU old enough to be irrelevant, and again, API calls just aren't on the menu.
Comment by Hendrikto 8 days ago
Even if they were an option, your 300ms latency requirement would exclude them anyway.
Comment by mirsadm 8 days ago
Comment by sebmellen 8 days ago
Comment by _the_inflator 8 days ago
Comment by charcircuit 8 days ago
Comment by ceejayoz 7 days ago
Comment by charcircuit 7 days ago
Comment by ceejayoz 7 days ago
Comment by sebmellen 7 days ago
Comment by Chu4eeno 7 days ago
Comment by charcircuit 7 days ago
Comment by ceejayoz 7 days ago
Comment by charcircuit 7 days ago
Comment by ceejayoz 7 days ago
And most IoT devices aren't doing video transcoding at all. You're making some very odd assertions in this thread.
Comment by charcircuit 7 days ago
The data gets streamed to the cloud where servers with GPUs transcode it. I'm pointing out that IoT devices historically have reached out to servers with GPUs even before GenAI.
Comment by ceejayoz 7 days ago
Comment by serf 10 days ago
some SBC w/ an industrial camera that is doing pick-place or go/no-go operations on a conveyor belt against a singular object type doesn't need a huge image-gen/llm model governing it.
I mean have you even considered the kind of performance an opencv function can get w/ just mask-matching? I mean even with a fancy YOLO model these answers get thrown out in 1.5-50ms ; this is just a wholly different time scaling.
Comment by Qhemlomo 8 days ago
Its a lot better, faster, cheaper to use LLMs for initial labeling together with hand finetuning and then training YOLO with this.
Training YOLO takes a few hours and is then very fast.
Comment by kryptiskt 8 days ago
Like, the AI model tools already exist, all that would be accomplished if OpenCV pivoted would be to take it away for people who want to do low-level vision programming. It wouldn't add anything useful to the world, just destroy an excellent library.
Comment by _the_inflator 8 days ago
Dude, in business we think in terms of large numbers, internationally easily in billion times processing images. This wouldn't cut it.
Also, do you buy the mega expensive super individually designed shoes from the best shoemaker there is to march along though some dirt or simply stick to gumboots?
OpenCV is used behind the scenes for many of the fancy stuff those major AI provider pretend to do. Claude is a huge system and not a LLM anymore.
Comment by TZubiri 8 days ago
Comment by taneq 8 days ago
Comment by TZubiri 8 days ago
Is the image(text) function reversible? Or are they brute force searching a nearest neighbor like word2vec/hash brute forcing.
Comment by sorenjan 8 days ago
Comment by TZubiri 7 days ago
Comment by oliveiracwb 8 days ago