What job interviews taught me about Kubernetes
Posted by chmaynard 1 day ago
Comments
Comment by mikeocool 1 day ago
The uniformity is nice. We were moving from apps running directly on EC2 instances provisioned with Ansible. Each time we spun up a new service, it was a process to get the EC2 instances provisioned just so.
But k8s is such a pain in the ass. One thing that I think people new to it don’t realize is that it’s not at all batteries included: to get a basic managed cluster set up, you’re still going to be installing a bunch of additional controllers (ingress, cert-manager, external-dns to start). And then you’re on the hook for making sure all those processes stay up (hope the admission webhook controller for a critical resource doesn’t go down!). Then, every ~3 months, you’ve got to do a major upgrade of not only your cluster but all of those controllers. And no one is shy about introducing breaking changes.
Also you’re introducing a huge amount of complexity with the k8s networking and DNS layer that most startups have zero need for (if you’re on EKS, make sure to read about scaling and monitoring CoreDNS).
I think there is a real hole in the market for a simple solution that lets you deploy some containers to some instances in a declarative fashion, without all of that complexity, and that offers decent LTS versions. I imagine there’s something out there that does this, but k8s has really sucked up all the oxygen.
Comment by BobbyTables2 1 day ago
Everyone initially wants thing A. But then they want to customize it to do all permutations and combinations n of A, B, C. They want it to be extensible. They want redundancy. They want orchestration. They want integration.
It’s why practically every config file format eventually becomes its own scripting language. Even HTML started off simple and is now ridiculously complex, which is all the more ironic since practically nobody writes it by hand. CSS was supposed to simplify it; instead it became more complex.
There is another thing that is extremely customizable and extensible. It’s called a programming language. People write programs to solve specific problems.
There seems to be a perverse trend of cobbling together a Byzantine mesh of libraries, plugins, and services with complex configuration files to make it do practically everything possible. We just used to write software for such purposes…
And for anyone who thinks HTML is simple… the A (anchor) tag has a “ping” attribute that results in POST requests to a list of URLs when a link is clicked! The list of attributes and resulting variations in behavior is quite mind-boggling. It was supposed to be a damn link! https://html.spec.whatwg.org/multipage/links.html
Comment by ajayvk 1 day ago
https://github.com/openrundev/openrun is a project I am building. It supports declarative deployments, either on a single node with Docker or onto Kubernetes. The target use case is limited to standalone web apps, like internal tools. There is no support for stateful services; you manage stateful services yourself. With that simplification, OpenRun provides a much easier developer experience.
Comment by SOLAR_FIELDS 1 day ago
Comment by ajayvk 1 day ago
Comment by bbkane 1 day ago
Comment by notarobot123 1 day ago
Advertisers have really shaped the Web right down to its core specifications.
Comment by aranelsurion 1 day ago
Fargate and Cloud Run first come to mind.
Comment by chaos_emergent 1 day ago
Comment by DrScientist 1 day ago
One of the main problems here is that programming languages typically have lots of tools to help validate correctness, whereas configuration tools are typically either much less mature or woefully underused.
There is nothing more frustrating than something failing due to a misconfiguration when you've no idea what the correct value should be.
Comment by sethammons 1 day ago
> You have an error in your config on line 1. Good luck.
Comment by jmaw 22 hours ago
Comment by bandrami 1 day ago
Comment by panny 1 day ago
>Everyone initially wants thing A. But then they want to customize it to do all permutations and combinations n of A, B, C.
Oh, I wouldn't be so sure of that. Think about Eclipse vs IntelliJ. 10 years ago Eclipse had all the features and IntelliJ didn't, so it was fast. Most developers don't use all the features, so most developers were very happy to move to IntelliJ, even though it was not free. Then IntelliJ spent the next decade building lots of features it didn't have. Now everyone wants off IntelliJ, because it's no longer fast. Now it's got a lot of "useless" features like Eclipse too.
Comment by threethirtytwo 1 day ago
Comment by chvid 1 day ago
Comment by threethirtytwo 23 hours ago
Comment by linksnapzz 20 hours ago
Comment by singron 1 day ago
1. No overlay networks. 1 IP per machine. Pods use dynamically allocated ports, and the kubelet enforces that pods listen only on their assigned ports using seccomp.
2. No kube-proxy or equivalent Layer-4 "load-balancer". It's not good, but it's often used. You should use some kind of Layer-7 load balancing instead. Also you need to look up the port number from (1). This also greatly lessens the need for DNS.
3. A better config language. YAML and Helm templates are terrible. kustomize is built into kubectl, but it's frustratingly limiting and also still very complicated. Something like Nix would have been great. This can make it easier to upgrade third-party configs since you can have more logic to validate and merge your settings with upstream defaults or templates.
4. Maybe an eBPF-like for the api server? If the built-in k8s objects don't have a setting for something, then you need to write an operator or control loop yourself and then run that too, which is a big lift. Over time, k8s just keeps adding more and more built-in things and then revising them, which creates a ton of churn. If you could easily script simple operations, then they wouldn't have to build in every permutation ahead of time. E.g. the HorizontalPodAutoscaler has 24 config object types with several fields each, but all it does is set replicas based on data read from the api-server, so it could be replaced by some kind of flexible script that runs in the control plane.
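To make the "all it does is set replicas" point concrete: the Kubernetes docs describe the HPA's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change made while the ratio stays within a tolerance (0.1 by default). A minimal Python sketch of just that rule, assuming a single metric; the function name and the min/max defaults are hypothetical, and stabilization windows, scaling policies and unready-pod handling are deliberately left out:

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule for one metric.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped
    to [min_replicas, max_replicas]. No stabilization or policies.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 pods averaging 90% CPU against a 60% target gives ceil(4 × 1.5) = 6 replicas, which is roughly the kind of logic the comment suggests could live in a small control-plane script.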
Comment by ddreier 1 day ago
Comment by insanitybit 18 hours ago
Comment by stock_toaster 1 day ago
Comment by vbezhenar 1 day ago
2. Does not work for all protocols. Again, your solution restricts the supported protocols to HTTP. That might work for many uses, but the restriction still doesn't sound very good. A universal load balancer is much simpler conceptually.
3. YAML is not terrible. YAML is awesome. Kubernetes manifests are terrible, that I agree with. Docker Compose is nice, for example. Kubernetes manifests felt like they were designed to be generated from something, but everyone ended up writing them directly or with templates. Though I think XML is generally a superior format, so I'd vote for XML in the end.
Overall your suggestions look like you want to shift complexity from cluster operator to software developer. I'm not sure industry supports that, recently it seems to move in the opposite direction, but that's interesting perspective. I guess with some wrappers for some containers it could be made usable.
But honestly you just want to throw away years of progress in containers and network namespaces. I understand that kubernetes mechanisms are somewhat complicated, but the core idea is to make pods look like virtual machines and I think this is very worthy idea.
Comment by mikeocool 14 hours ago
I would absolutely trade flexibility for less complexity. Particularly for edge cases like hard-coded ports.
Comment by corvad 22 hours ago
Comment by himata4113 1 day ago
CNPG is an absolute monster (in a good way). cert-manager is easier than the Docker alternative. Calico has never failed me, except in BGP mode, which has some footguns: it can't come back from a dead state because of a chicken-and-egg problem unless you point it to the external load balancer (which I would have known if I had read the documentation). Traefik is all you need. Talos largely mitigates the bare-metal problems and comes pre-hardened and pre-optimized.
I solo most of my development projects and have used k3s for all of them. The only complaint is that cert-manager by default will fail silently and your certificates will expire. I largely mitigated this by having proper visibility set up via Grafana and automated alerts (warning if certificates are about to expire), which I should have done anyway.
Two years ago I'd have agreed; today, with LLMs, everything I have runs Talos with fully automated updates and I haven't had to be on call for almost a year.
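Whatever the alerting stack, the check underneath can be as simple as reading the certificate a server actually presents and warning when it is close to expiry, which also catches a renewal that failed silently. A minimal Python sketch using only the standard library; the function names and the 14-day threshold are illustrative assumptions, not something cert-manager or Grafana provides:

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the served certificate's notAfter field (needs network).

    The default context verifies the chain, so an already-expired
    certificate makes the handshake fail rather than return a date.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days from `now` (default: current time) until `not_after`."""
    # Parses the getpeercert() format, e.g. "Jan  5 09:34:43 2018 GMT".
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400.0


def should_warn(not_after: str, warn_days: float = 14.0,
                now: Optional[float] = None) -> bool:
    """True if the certificate expires within `warn_days` days."""
    return days_until_expiry(not_after, now) < warn_days
```

Run from a cron job, something like `should_warn(fetch_not_after("example.com"))` would flag a certificate that still has a valid chain but fewer than two weeks left.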
Comment by makeitdouble 1 day ago
K8s is easier at smaller scales (I understand k3s as a packaged version?), but you still need one or two people on your team who properly understand all of the concepts and inner workings of k8s and can dive neck-deep into it if/when shit hits the fan.
For a small team that's a lot of commitment to something that is usually not their bread and butter, and that they wish they could build once and only slightly tweak every year or so.
Comment by englishspot 20 hours ago
and more to that last point, we haven't talked about maintaining the actual nodes themselves yet.
Comment by himata4113 15 hours ago
Comment by mikeocool 13 hours ago
Then there was an upgrade process that required a fair amount of coordination between when you changed your manifests, when you upgraded your cluster and when you upgraded your ingress controller.
PodSecurityPolicies also gained a lot of traction and didn’t really have an alternative before they were deprecated.
Also, custom operators don’t all subscribe to the “don’t break non-beta resources” rule in the same way core does.
Comment by KronisLV 1 day ago
MDomain blog.kronis.dev
I'm not saying that cert-manager isn't nice, but with regular Docker/Compose/Swarm setups you can just run a web server/load balancer on whatever ports you want. With mod_md, the above is all I really need in a regular .conf file to provision Let's Encrypt certs for my blog (it's very similar with something like Caddy too). And it's the same in Docker as it is when running the web server directly. I think that's why starting with Docker is really nice: it has fewer custom abstractions, and sometimes regular software does things elegantly already.
Comment by mikeocool 14 hours ago
IIRC cert-manager has about three layers of custom resources to comb through when figuring out why a cert isn’t renewing.
Comment by vbezhenar 1 day ago
Just to provide a similar example: a Linux system is insanely complicated. The kernel alone has thousands of options. Distros have tens of thousands of packages. Wherever you look, everything is hard and complicated: firewall, containers, init system, filesystem hierarchy, storage layers. One would think that some people desire a simpler operating system. But everybody uses Linux despite all these complexities. Try to find OpenBSD in production, for example. It's not easy.
Comment by mikeocool 14 hours ago
Comment by foobarian 51 minutes ago
Comment by wlonkly 13 hours ago
Comment by zzyzxd 1 day ago
And if you can do this again, what's your solution to reverse proxy, certificate management, DNS...etc? I guess you can docker-compose some custom stack on a single machine, maybe add one more machine then you can say it's HA enough for small scale. But you can also spend the same amount of time to install those kubernetes controllers with zero customization. In my experience, if you go with the default configuration, most of the well-maintained k8s components are boring as hell these days.
> (if you’re on EKS, make sure to read about scaling and monitoring CoreDNS)
If load to your service increases, you need to scale up/out your service. This is universally true. Do you have a proprietary solution that's easier and more reliable than bumping up the replicas count in kubernetes?
There are lots of design decisions in Kubernetes that I hate. But if you want me to choose between Kubernetes and any proprietary stack, in 2026, I would definitely choose Kubernetes.
Comment by packetlost 1 day ago
I have a strong preference for renting bare metal and it has served me extremely well.
Comment by zzyzxd 1 day ago
Personally, I think the complexity is on the same level.
Comment by packetlost 1 day ago
Comment by XYen0n 1 day ago
Comment by mikeocool 1 day ago
As for EKS, having to monitor and manually scale the built in DNS service or else my queries are just going to stop resolving is not the type of thing I expect to have to manage on a managed service. I see they have finally released autoscaling for CoreDNS, though it took them 6 years.
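For context on what "scaling CoreDNS" usually means in practice: a common approach is the kubernetes-sigs cluster-proportional-autoscaler, whose "linear" mode sizes replicas from cluster size as max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica)), clamped to min/max, with an option to keep at least two replicas on multi-node clusters. A minimal Python sketch of that formula; the default parameter values are illustrative, and whether EKS's newer built-in CoreDNS autoscaling uses this exact rule is an assumption, not something established here:

```python
import math
from typing import Optional


def linear_replicas(cores: int, nodes: int,
                    cores_per_replica: float = 256.0,
                    nodes_per_replica: float = 16.0,
                    min_replicas: int = 1,
                    max_replicas: Optional[int] = None,
                    prevent_single_point_failure: bool = True) -> int:
    """Replica count from cluster size ('linear' mode style):
    max(ceil(cores / cores_per_replica), ceil(nodes / nodes_per_replica)),
    then clamped to the configured bounds.
    """
    replicas = max(math.ceil(cores / cores_per_replica),
                   math.ceil(nodes / nodes_per_replica))
    # With more than one node, keep at least two replicas.
    if prevent_single_point_failure and nodes > 1:
        replicas = max(replicas, 2)
    replicas = max(replicas, min_replicas)
    if max_replicas is not None:
        replicas = min(replicas, max_replicas)
    return replicas
```

With these illustrative settings, a 100-node cluster with 1024 cores gets max(4, 7) = 7 DNS replicas; without something like this, the replica count stays wherever it was set at install time.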
Comment by jaggederest 1 day ago
Comment by foo4u 1 day ago
Comment by embedding-shape 1 day ago
HashiCorp's Nomad is basically just that, and it supports various ways of running stuff too, which is neat. Shame about the license change, which basically killed all my interest in it, so it seems the hole is indeed still unfilled.
Comment by nyrikki 1 day ago
You can still add pods if needed and the systemd integration works.
Plus you can actually improve isolation by co-hosting services under separate UIDs.
Like any container, it is just co-hosting, and elasticity is a bit slower with autoscaling instances, but it removes most of the complexity of K8s, which very few orgs benefit from or have the culture to support.
Comment by mikeocool 1 day ago
Though as I recall, it makes heavy use of Consul, which I have used in anger, and which makes me a little wary (though that experience is likely very out of date).
Comment by embedding-shape 1 day ago
Comment by xav0989 1 day ago
Comment by embedding-shape 22 hours ago
Comment by mocamoca 1 day ago
Comment by ddreier 1 day ago
Comment by alexjurkiewicz 1 day ago
Comment by embedding-shape 1 day ago
Comment by mocamoca 1 day ago
Some self-reloading HAProxy in nomad to automatically assign URLs to services when needed. Could have used Consul but meh.
Tailscale for private networking.
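The comment doesn't say how the self-reloading HAProxy works, so as an illustrative assumption only, the rendering half could look like this: given a map of service name to instance (address, port) pairs (e.g. read from a service catalog, lookup not shown), generate a frontend that routes by Host header to one backend per service. The `example.internal` domain and the `be_<name>` naming scheme are hypothetical; the reload trigger is also not shown.

```python
from typing import Dict, List, Tuple


def render_haproxy(services: Dict[str, List[Tuple[str, int]]],
                   domain: str = "example.internal") -> str:
    """Render a Host-header-routing HAProxy config from a service map.

    `services` maps a service name to the (address, port) pairs of its
    running instances. Service discovery itself is out of scope here.
    """
    lines = ["frontend http_in", "    bind *:80"]
    for name in sorted(services):
        # Route <name>.<domain> to the matching backend.
        lines.append(f"    use_backend be_{name} if {{ hdr(host) -i {name}.{domain} }}")
    for name in sorted(services):
        lines.append("")
        lines.append(f"backend be_{name}")
        lines.append("    balance roundrobin")
        for i, (addr, port) in enumerate(services[name]):
            # One server line per instance, with health checks enabled.
            lines.append(f"    server {name}_{i} {addr}:{port} check")
    return "\n".join(lines) + "\n"
```

Re-rendering this whenever the instance list changes and reloading HAProxy gives automatic URLs per service, which is roughly what an ingress controller does inside k8s.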
Comment by josevalerio 1 day ago
https://www.macchaffee.com/blog/2024/you-have-built-a-kubern...
Comment by mikeocool 1 day ago
Perhaps those days are behind us.
Comment by greenavocado 1 day ago
Comment by bellowsgulch 1 day ago
Comment by Thaxll 1 day ago
Just use ECS / Fargate with an ALB in front if you need a simpler use case.
Comment by mickael-kerjean 1 day ago
Comment by psviderski 1 day ago
That's how I see it as well but it's really tough to go against the grain. I have a small enthusiastic community of users around Uncloud (https://github.com/psviderski/uncloud) who went full circle - fed up with k8s and came back to simple, boring declarative Compose deployments across a handful of interconnected hosts.
Uncloud is essentially a cluster version of Docker Compose without a control plane and cluster management overhead.
Comment by Eridrus 1 day ago
We're moving our non-critical components onto EKS (pipelines, tooling, etc). We had one outage from runaway IP allocation in a subnet, but otherwise it's been pretty stable.
I do hear vague horror stories so I'm really not excited about moving our prod stack to it, but it's actually been really good for installing 3rd party software so far.
Comment by a_c 1 day ago
Comment by petterroea 1 day ago
That is, k8s is probably best considered when you are beginning to consider having an infrastructure department, or if one of your early hires knows Kubernetes and is opinionated in a way that is less "throw cool and complex stuff at the wall"* and more "the 5 things I want in a k8s cluster that I don't want to spend much time on and should just work"
My understanding of the 2000s and 2010s was that there was a big focus on inventing self-service deployment systems for developers, and k8s is that solution(!), for organizations at the scale that would otherwise begin considering reinventing the wheel internally anyway.
Comment by esprehn 1 day ago
On the application developer side k8s is awesome, but then you look inside the box and it melts your face off.
I'm not sure a middle ground exists unfortunately. It's either full service like Lambda or bag of knives like k8s.
Comment by aurisl 1 day ago
Comment by sharts 19 hours ago
This isn’t so different from say Linux vs BSD. You can roll your own things and call it a system. Or you can just use something that targets a spec to provide a (mostly) cohesive and consistent layer to build upon.
Comment by alexjurkiewicz 1 day ago
Comment by zug_zug 1 day ago
Unless of course, all of the busywork that comes with kubernetes IS the value (to the engineer). Perhaps a bunch of engineers know at some level that locking the company into an overcomplicated cloud-within-a-cloud setup that has all sorts of weekly issues and requires constant work gives them a lot of job safety that they wouldn't get if they just used an AWS autoscaling group and you're done for the next 5 years.
Because simpler solutions DO exist (like a loadbalancer in front of an autoscale group, and not making a giant SOA for an app that orders you taxis, or books you a bnb or whatever nonsense).
Comment by tayo42 1 day ago
Comment by jpb0104 1 day ago
Comment by czhu12 1 day ago
Comment by esafak 1 day ago
Comment by httgp 1 day ago
There's Nomad for this; I wish more teams would run Nomad.
Comment by erpellan 1 day ago
It was glorious.
Comment by stevenaenns 1 day ago
Comment by peterldowns 1 day ago
Comment by TZubiri 22 hours ago
Containers? In this climate? What's the kernel LPE rate at after copyfail and copyfail2? No containers, VM or harden. No half measures.
If there's going to be something new, it needs to be topical, and containers are out.
Comment by emodendroket 1 day ago
I mean, it's CDK and whatever equivalents other providers have, isn't it? If you fully embrace all the stuff they give you then it's straightforward to declare everything and it all works together. The downside is the vendor lock-in but unless you actively deploy to multiple environments, which most people don't, you're probably locked in in various ways without knowing about it.
Comment by zsoltkacsandi 22 hours ago
Comment by bitfilped 23 hours ago
Comment by dsincl12 1 day ago
Comment by antonvs 1 day ago
Because anything else involves making opinionated decisions that will be wrong for many users.
People who don’t understand why k8s is so widespread don’t understand all the problems it’s solving.
Comment by te_chris 1 day ago
They’ve announced persistent “instances” recently which solves a big problem for us - sometimes you want continual long running workloads.
Comment by busterarm 1 day ago
The problem is that when you run this long enough you want K8s features anyway.
Comment by kilobaud 1 day ago
As someone who has productionized and maintained truly hundreds of those clusters across several jobs, it is hard at this point for me to recommend Consul, Nomad, or Vault to anyone serious about building reliable applications. Too many broken upgrades and manual click-ops tasks just to keep them online. (…and I’ve said nothing of the actual product!)
Comment by secondcoming 1 day ago
Comment by busterarm 16 hours ago
I don't entirely agree with your statement about zero-downtime instance replacement, though. We built our Terraform around doing one-at-a-time instance replacement, and removing/adding nodes in HashiCorp Raft clusters is pretty much the easiest thing I've ever done with infrastructure.
That's really always been the biggest selling point of HashiCorp's stuff for me. They made bootstrap and maintenance operations easy enough that a caveman could do it. Even recovering from problems isn't terribly hard unless you're already doing something stupid (Roblox outage).
I also have deployed and managed _hundreds_ of these over the last 8 years or so and I'm not really having the same problems that you do. But we don't upgrade to the latest and greatest because it _does_ take them a few versions to get their feature launches correct. This is mainly a Nomad problem now though -- consul and vault are pretty brainless to operate.
Still though, we _also_ use Kubernetes and I prefer it. Most of our software engineers don't though because they don't actually want to take the time to understand it, they just want to run binaries and forget about it.
Comment by xlii 1 day ago
Ask your favorite GPT to generate manifests, get the primary app into the cluster with Telepresence or execute it straight from a container, and switch contexts and clusters like it's the '90s again.
One reason I dislike Docker Compose and Docker is lack of isolation. Yes sure if you put your arm deep enough you can get it, but on local k8s I can spin cluster per workspace and not worry about conflicting ports between PostgreSQL instances.
Before LLMs, writing consistent YAMLs was a PITA, but today at low/development scale it's pretty much free lunch.
Comment by hadlock 1 day ago
Comment by d675 1 day ago
Now I'm laid off, and it's hard to find a job...
Comment by xlii 1 day ago
Unfortunately it's an industry-wide problem, and it touches many areas and levels of expertise. Some believed that AI could cut costs, and they compressed job spaces.
It's starting to bounce back, but it's not back to what I would call a normal baseline.
Comment by d675 21 hours ago
True startups need only senior+ and big ones don't wanna interview often.
Comment by wiseowise 1 day ago
And it did! For companies, not for you.
Comment by kevmo314 1 day ago
Comment by bigstrat2003 1 day ago
Comment by lijok 1 day ago
Comment by TZubiri 22 hours ago
Do I want potential to increase expenditure by infinite percent? Or do I want to sign a contract for 2 500$/mo dedicated servers?
Let's be real the latter can handle 20k concurrent users without breaking a sweat, and that's like 99.9% of companies and projects.
Comment by bakies 11 hours ago
Comment by embedding-shape 1 day ago
Using Kubernetes because you're unable to grok docker's networking enough so you can't run multiple containers using their own ports and not conflicting with other stuff sounds like a recipe for disaster, even (especially?) if you use agents for this. Particularly if you let them manage a production environment, you're bound to lose important data eventually.
> pretty much free lunch.
Aah, famous last words of the young :)
Comment by aleksiy123 1 day ago
I think diy homelab/hosting is more accessible than ever.
Cut costs on cloud spend and invest into AI spend.
For a solo dev on a budget, I think it just makes sense.
Comment by globular-toast 1 day ago
Comment by athrowaway3z 1 day ago
At any stage of https://www.macchaffee.com/blog/2024/you-have-built-a-kubern... a SOTA model can repackage it into Kubernetes.
If you're feeling extra spicy you don't even need the deploy scripts. Just a `llm` user account with the right permissions & ssh keys on all your servers.
Comment by perrygeo 1 day ago
Writing manifests seems like a trivial thing to focus on. Who operates the k8s cluster in production? Who runs upgrades? Who's on call to monitor the system? Of course if someone else is doing all the work for you, it feels like free lunch!
Comment by hdjrudni 1 day ago
With managed k8s, your host upgrades the control plane. And then you can upgrade your PHP, Python, Node, what have you, by flipping a number in your Dockerfile.
Not like other forms of server infra don't need monitoring and upgrades anyway.
Comment by darkwater 1 day ago
Meanwhile, the update stress of core k8s - even managed - is much higher than a good managed old fashioned (immutable) infrastructure.
Comment by iamcreasy 1 day ago
Comment by johnsmith1840 1 day ago
K8s is incredibly deep and complex but with AI it's finally easy to just hello world it.
Comment by bigstrat2003 1 day ago
Comment by xlii 1 day ago
I mostly agree it's an area that's risky to wander into mindlessly, but it is much easier to validate knowledge than to practice it.
E.g. I can't write Chinese but can validate whether a piece of Chinese is valid (by feeding it to N translators or other LLMs, or asking a friend who knows Chinese).
Under the assumption that "LLM output is false until proven otherwise", it's not a bad approach and has worked for me in various scenarios. (E.g. I asked for an implementation of an algorithm in Rust and then validated it against the base definition.)
Comment by mettamage 1 day ago
We all have different learning styles. I learn through play when it comes to LLMs.
Comment by johnsmith1840 1 day ago
Until you physically see it running learning is slow.
I learned k8s through many months of study and pain pre AI. Once I actually got it up learning was FAR easier.
This is like using a Jupyter notebook to learn Python, and it's always the first thing I point to for someone just starting to learn. Only after that should you learn venv, pip install, classes, etc.
100% use AI to get started on something you don't understand. I will literally never start to learn about a technical system again without first doing a hello world with AI.
Comment by eu-tech-tak 1 day ago
It is not perfect, but it's a good place to start getting the hang of setting up your own K8s cluster if you are new to Kubernetes.
Comment by SJC_Hacker 1 day ago
Comment by suralind 1 day ago
But I found it funny that the OP concluded you should use Kubernetes once the CTO is no longer the only dev.
Comment by vbezhenar 1 day ago
You can actually treat kubernetes as a glorified docker compose engine. Deploy pods, deploy nginx instead of ingress controller, deploy certbot cronjob instead of cert-manager, and believe it or not, it'll work! On a single server!
People often compare Kubernetes with thousands of additional services to a simple VPS, but that's not apples to apples comparison.
Comment by solatic 1 day ago
I just want to point out that you can totally still do this with Kubernetes. Of course it's not correct, but you can save that unencrypted secret in a .env file right into your container while you're building it; no need to use Kubernetes's support for supplying environment variables from the manifest. And of course, you don't even need a Dockerfile to build that container: you can just exec into a running container, paste it in, and then docker commit.
Kubernetes doesn't save you from making stupid decisions, it just makes it easier to make better ones.
Comment by suralind 1 day ago
Comment by bionsystem 19 hours ago
Uniformity? Try deploying OpenBao inside kube: if kube decides to restart your pods, you're in for unsealing them at 3am, waking up everybody who owns a Shamir key. So Bao stays out of the cluster, or pinned to certain nodes, defeating the purpose entirely. Also, with the ultra-wide variety of tools at every layer of the stack, uniformity is a joke; no two kube cluster deployments are really the same.
Standardized knowledge? The operating system is standardized knowledge. Any competent SRE should be able to log into a Linux box and figure out what's running there. And if you let your previous ops shadow it all, you're just a pretty bad CTO.
Tracing who does what? First of all, anybody with admin access can run one-time jobs, just like anybody with sudo can run one-time commands. That's like chapter 01 of the kube docs. Also, again, at the kube layer itself, below the Helm chart, the ops who set it up or update it can and will change stuff that breaks stuff.
Kube isn't necessarily bad and has its purpose, but it's not a product. It's like Linux, a complex piece of tech that requires a lot more knowledge than "just push this helm chart" to work.
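The 3am scenario exists because a Shamir-style unseal needs a threshold of key holders: the key is split into n shares, any k of which reconstruct it, and fewer reveal nothing useful. A toy Python illustration of that k-of-n math (random polynomial with the secret as its constant term, recovered by Lagrange interpolation at zero over a prime field). It is a sketch of the technique only, not OpenBao's implementation, and the prime and function names are assumptions chosen for the example:

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # Mersenne prime; secrets must be smaller than this


def split(secret: int, n: int, k: int) -> List[Tuple[int, int]]:
    """Split `secret` into n shares, any k of which reconstruct it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's method
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: List[Tuple[int, int]]) -> int:
    """Recover f(0) from k or more shares via Lagrange interpolation."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Modular inverse of den (Python 3.8+ three-argument pow).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `split(secret, 5, 3)`, any three shares recover the secret, which is why a restarted sealed pod needs three of the five key holders awake rather than just an operator with cluster access.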
Comment by bakies 11 hours ago
Comment by bionsystem 2 hours ago
As for standardized knowledge, I can't see how somebody, even someone proficient with kube, could jump into any app and troubleshoot bad behavior. Apps each have their quirks and subtleties, specific components that behave a certain way. The layers still exist: the kube cluster itself (which again has many component options at every layer of the stack; hard to know them all), and the app (which will require at least some specialty knowledge).
If it's just about pushing helm charts we wouldn't need SRE anymore, just a CI.
Comment by liampulles 1 day ago
Beyond that, there are massive holes of despair to fall down if a novice team starts to engage with extensive operators (starving the control plane), DB operators (distributed persistence) and build operators (spikey, expensive loads). At least, I know that I've had to dig out of those holes.
I just hope people don't use k8s in the same way many use microservices: as a way to introduce complexity for complexity's sake.
Comment by zbentley 1 day ago
Comment by portly 1 day ago
Comment by clickety_clack 1 day ago
Comment by Esophagus4 1 day ago
That the tech benefits may not be there, but they’re using it for the non-tech benefits
Comment by clickety_clack 1 day ago
> My personal threshold would be the moment the CTO isn't the only engineer anymore. As soon as a second person shows up, the problems K8s solves become real.
Comment by darkwater 1 day ago
Comment by Esophagus4 1 day ago
And if they don’t?
Comment by darkwater 1 day ago
Comment by mijoharas 1 day ago
My read of the article is that this is correct, but that the benefits they're using it for are the operational, and organisational.
I think the comment you're replying to is arguing that those benefits don't really matter or outweigh the additional complexity costs when N=2 (engineers). I think I'd probably agree.
Comment by Esophagus4 1 day ago
If K8s is new to you, sure. Definitely not the time to learn it.
But I can see a world where it’s fine to use early on.
Especially if your team is cloud native. K8s isn’t really a new controversial toy in my eyes, it’s pretty well supported and good enough for most things out of the box.
I just don’t think it’s as big a deal as the “CTO IS WASTING EVERYONES TIME” argument.
Comment by codemog 1 day ago
I would not advise asking the majority of CTOs these questions either. Many got to that position by saying what people want to hear, which is the "average" safe answer. They will parrot whatever is "hot" at that time because it's the least risky response. They are not your friend nor a reliable source.
Comment by quibono 23 hours ago
> I would not advise asking the majority of CTOs these questions either. Many got to that position by saying what people want to hear, which is the "average" safe answer.
Agree; this is the same as asking people why they're not having kids: they either a) don't know or b) don't want to / are not willing to say the truth.
Comment by avhception 1 day ago
Comment by shevis 1 day ago
Unrelated to the content of the article, this sentence structure is a dead giveaway of LLM writing.
Comment by jcattle 1 day ago
I'd expect to see a huge increase in "solving real problems" over the last months.
Comment by eamon0989 23 hours ago
Comment by jcattle 2 hours ago
I'm thinking that some of the LLM-isms are a bit more complex than just repeated phrases. It's often more that short, punchy writing style with quick setups and punchlines. But would be interesting nonetheless. I really think that some things (like "solving real problems" or "it's not"/"this isn't") would show up.
Comment by SamuelAdams 1 day ago
It is nice to be able to have a consistent deployment pattern, with traceability, rollback support, and production approval checks. It’s nice to not have some archaic something stuck in someone’s head. It’s also nice to be able to see how something works by reading the code, which is usually up to date and deployable.
Comment by sshine 1 day ago
I’d like to gently push back on that. ;-D
Terraform, when committed to git, provides organisational memory. But less so uniformity, since all providers are different (and you should expect different things when applying). No tracing besides git. And tfstate is hard to share between developers, unlike kube state.
Kubernetes is more uniform across providers. And it manages drift after something is applied, which is not a direct argument of OP's, but a strong reason to prefer it over other IaC.
And yes, I also enjoy how well deploying works. And how things generally fit together. Liking the networking complexity less so.
Comment by corpoposter 11 hours ago
> But less so uniformity, since all providers are different
People sometimes misinterpret tools like Terraform supporting different vendors/hyperscalers as it providing a unified abstraction layer above them. As you note, it does not.
I simply fail to understand why automatic drift correction is considered important in this space. Cloud resources do not magically change themselves. Folks often cite rogue engineers making changes, but I prefer to deal with this scenario by whacking people with a stick and/or limiting access. Automatic drift correction can actually complicate making legitimate emergency changes to managed infrastructure.
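For readers unfamiliar with the term, "drift" is the difference between the declared (desired) state and what is actually observed; detection is a diff, and correction means re-applying the desired values. The Python sketch below is a simplification under stated assumptions: a resource is modeled as a flat dict of settings, and `plan` is a hypothetical helper, not how Kubernetes controllers or `terraform plan` are actually implemented.

```python
from typing import Any


def plan(desired: dict[str, Any], observed: dict[str, Any]) -> list[tuple[str, str, Any]]:
    """List the actions needed to make `observed` match `desired`.

    Each action is (verb, key, value). Keys are sorted so the plan is deterministic.
    """
    actions = []
    for key in sorted(desired.keys() | observed.keys()):
        if key not in observed:
            actions.append(("add", key, desired[key]))
        elif key not in desired:
            # Present in reality but not declared: a reconciler that owns the
            # whole object would remove it.
            actions.append(("remove", key, observed[key]))
        elif desired[key] != observed[key]:
            actions.append(("update", key, desired[key]))
    return actions


desired = {"replicas": 3, "image": "app:1.4"}
# Someone scaled up by hand during an incident and flipped on a debug flag.
observed = {"replicas": 5, "image": "app:1.4", "debug": True}
print(plan(desired, observed))
# [('remove', 'debug', True), ('update', 'replicas', 3)]
```

This also shows the trade-off raised in the comment: an automatic reconciler would revert the hand-made emergency change (`replicas: 5`) on its next pass unless the declared state is updated first.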
Comment by simoncion 1 day ago
Really? For years and years we put our tfstate files into private S3 buckets at $DAYJOB and it seemed to work just fine. We didn't even take pains to ensure that everyone was on the very same version of the Terraform CLI. What problems did you guys run into?
Comment by InvertedRhodium 1 day ago
Comment by nat8265639392 14 hours ago
Comment by mikgp 1 day ago
Comment by JohnMakin 1 day ago
Pretty much, almost. I've spent a bunch of time in my career working on the "VM + systemd" setups, stuff running on a rack, or on EC2 in the cloud - managed Kubernetes is a lot better for me than those cobbled-together messes. There are "easier" setups, but they usually end up costing me a lot more in time and $.
To answer simply, it became good + convenient. I could complain about plenty, and people here like to, but honestly you couldn't pay me to go back to the old way. The one legitimate gripe is that the upgrade schedule is exhausting; on AWS it's about every 6 months before you go into extended support. I also hate being at the mercy of arbitrary decisions like "ok we know a huge chunk of the web going back a decade has architected off our Ingress API, but recently we decided we don't really like that way anymore and we want you to use Gateway API instead, so, um, like ya we know it just killed off one of the most used open source ingress configs (ingress-nginx) but yea trust us bro this is going to be so much better" kind of thing.
Comment by hadlock 1 day ago
I'll admit I'm dreading switching over to the gateway api, but by the time I get forced off ingresses it should be a stable/mature ecosystem. That's still a ways out though.
I don't know anyone still dealing with VMs anymore, except our IT guy who manages a couple of pet servers for random executives from the before times. In the last year k8s has started absorbing executive pet processes and the number of VMs our IT guy manages has dropped by about half.
While I'm here spouting stuff: yeah, hiring for k8s is real easy. If our SRE gets hit by a bus, he can be replaced in a week, and we can probably struggle through using Opus until that happens. K8s being the lingua franca of GitOps IaC makes it real easy for the new guy to parachute in and start working. Every VM thing is going to be totally bespoke and have the personality of the guy who designed it, which is rarely a good thing.
Comment by JohnMakin 1 day ago
To this date I have not seen a viable drop-in replacement for how I've seen big orgs use the ingress controller stack with the Gateway API, and what I understand currently is that InGate is basically DOA.
Comment by mschuster91 1 day ago
Even on AWS EKS, you will run into bullshit with their network overlay. Egress policies are a mess (at least half a year ago, you were not able to say something like "allow pod A to egress traffic to service (!) B", despite a service resolving down to an IP address in the end).
And that's before going into the unholy mess that is getting connectivity to and from the external world to your cluster. Cloudfront, ACM certificates, ALB, ALB-EKS integration, Route53, Route53-EKS integration, EFS, EFS-EKS integration, EBS, EBS-EKS integration, RDS, RDS-EKS integration, IAM-EKS integration, SSM, SSM-EKS integration, autoscaling... and if you want more pain and don't already wince, try setting that up across regions or, as I had to do once, across account boundaries.
Kubernetes is powerful. But do not make the mistake of assuming it's easy to get started with, at least on the admin side. Even if you have prior AWS experience, getting it all integrated into EKS so you don't have to deal with Terraform and helm/k8s for a full deployment of a piece of software will take you an awful lot of time.
For users though? It's a breeze, I will admit as much. Everything down to the firewall rules can be encoded in k8s spec files.
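On the egress complaint above: standard NetworkPolicy peers are expressed as pod selectors, namespace selectors, or IP blocks, not Services, so one common workaround is to copy the target Service's own label selector into the egress rule. The Python sketch below builds such a manifest as a plain dict; `egress_policy_for_service` and the `a`/`b`, `frontend`/`backend` names are hypothetical, and it assumes the cluster's network plugin enforces standard `networking.k8s.io/v1` NetworkPolicy and that namespaces carry the automatic `kubernetes.io/metadata.name` label.

```python
from typing import Any


def egress_policy_for_service(
    name: str,
    namespace: str,
    source_labels: dict[str, str],
    service: dict[str, Any],
) -> dict[str, Any]:
    """Build a NetworkPolicy (as a dict) letting pods matching `source_labels`
    reach the pods selected by `service`, by reusing the Service's selector."""
    svc_namespace = service["metadata"]["namespace"]
    ports = [
        # NetworkPolicy matches the port on the backing pods, i.e. the Service's
        # targetPort, which defaults to `port` when omitted.
        {"protocol": p.get("protocol", "TCP"), "port": p.get("targetPort", p["port"])}
        for p in service["spec"]["ports"]
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": source_labels},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [
                        {
                            # Both selectors in one peer: pods with these labels
                            # inside the Service's namespace.
                            "namespaceSelector": {
                                "matchLabels": {"kubernetes.io/metadata.name": svc_namespace}
                            },
                            "podSelector": {"matchLabels": service["spec"]["selector"]},
                        }
                    ],
                    "ports": ports,
                }
            ],
        },
    }


service_b = {
    "metadata": {"name": "b", "namespace": "backend"},
    "spec": {"selector": {"app": "b"}, "ports": [{"port": 80, "targetPort": 8080}]},
}
policy = egress_policy_for_service("a-to-b", "frontend", {"app": "a"}, service_b)
```

Two caveats: once a pod is selected by an Egress policy, all other egress is denied, including DNS lookups, so a real policy usually needs an extra rule allowing port 53 to the cluster DNS; and the rule has to be regenerated whenever the Service's selector changes.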
Comment by JohnMakin 1 day ago
Comment by paulryanrogers 1 day ago
Comment by mschuster91 1 day ago
Oh, it's not necessary per se, but if you want to host a web service with any sort of state and not have to do stuff in parallel either by hand or by terraform, I'd consider the integrations pretty vital.
It's easy enough (well, it's still addons whose versions you have to keep updated each on their own) once it is set up, but getting to the point where you have something reproducibly running for the first time is annoying as hell.
Comment by zbentley 1 day ago
> do stuff in parallel either by hand or by terraform
…specifically by terraform. Making k8s own the provisioning and management of external infrastructure on principle (as opposed to when that makes sense, e.g. load balancers/gateway/CSI providers) is not a good approach. Sure, it feels unified, but the cost of unification is incredibly not worth it.
Comment by mschuster91 23 hours ago
That's the cost I was talking about. It is indeed annoying and time-consuming to get it set up once, but once it works... it is amazing for developers to have the ability to spin up an environment completely identical to prod for a hotfix branch to test stuff out, with no involvement from ops or anyone else.
And also, it's much easier IMHO to get a mental image of how a system is constructed when it's one architecture - no matter if it's k8s/helm or Terraform. But as soon as you have both in the mix, you get friction issues, you have to pass stuff from Terraform to Helm or vice versa... and may God have mercy upon you if you also have Ansible in the mess, I had to do that once for a piece of proprietary dependency that would not have been supported by the vendor in any place other than a SLES bare metal server.
Comment by JohnMakin 21 hours ago
I had dismal hopes of it working for very long but it's remained mostly untouched going on 3 years now which really surprised me, and it's been easy to work with. I think if you run EKS resources like node groups, autoscalers, LB type of resources in the same state file as helm deployments you're going to have a very bad time though.
Comment by mschuster91 16 hours ago
There's no alternative to that anyway... otherwise even a terraform apply -refresh=false will quickly take well over 10 minutes.
Comment by JohnMakin 11 hours ago
Comment by kensey 1 day ago
* DC/OS was always its own thing and as time went on, eventually Mesosphere was basically the sole maintainer of the underlying Mesos. Very little external contribution.
* OpenShift was different from Mesos and basically maintained only by Red Hat from the Makara acquisition (sometime in 2010 I think) to about mid-2015 (i.e. the point where they ripped out most of the OpenShift-native process isolation and orchestration and replaced it with Docker and Kubernetes). Pre-Kubernetes OpenShift frankly struggled to catch on and again, basically everybody who cared about developing it worked for one company.
* CoreOS was developing fleet in the open but dropped it outright when Kubernetes was released. The phrase I heard there was "We started to say something and Google finished our sentence." They pivoted to Kubernetes for orchestration so hard it was kind of awkward talking to customers who used fleet after that. In theory somebody could have picked it up like Kinvolk picked up rkt for a while (and later CoreOS Linux as Flatcar), but as far as I know nobody ever made a serious effort to do so.
* Docker released Docker Swarm shortly after Kubernetes was released -- yet another one-company product. (I still don't really understand why they released Swarm -- for simple workloads, Docker Engine and Docker Compose were enough, and for more complex ones Docker Engine was, at that time, still the sole underlying runtime in Kubernetes. There were already two distinct orchestrators on the market, one from a much larger company with a lot more operational experience running containerized workloads than Docker had. What was their thought process?)
* HashiCorp released Nomad well after Kubernetes but not only was it another sole-corporate-maintainer orchestrator, it deliberately omitted a lot of the basics Kubernetes included like service discovery in an effort to stay simple -- so in very few cases was Nomad alone actually enough to orchestrate workloads (nor was it intended to be, as the Nomad engineers in the ~1.0 days would have been first to tell you). Past a point this made Nomad more work to get running and keep running than Kubernetes was.
The flip side is, I don't think a purely community-developed orchestrator would have won, even with a foundation backing it. It's not the corporate backing that's the issue, it's the lack of diversity in that corporate backing.
Comment by Glyptodon 1 day ago
Comment by ghaff 1 day ago
Comment by soco 1 day ago
Comment by reillyse 1 day ago
Comment by phailhaus 1 day ago
Comment by kakwa_ 1 day ago
Given the number of moving parts, I would be terrified to have to look under the hood of what Talos deployed for me.
Comment by reillyse 1 day ago
Comment by __turbobrew__ 1 day ago
Comment by lawn 1 day ago
Comment by chaos_emergent 1 day ago
Comment by aaronbrethorst 1 day ago
Exactly why I hate CloudFormation, K8S, GitHub Actions, etc. YAML is a terrible format for the knowledge encoded in these artifacts.
Comment by rienbdj 1 day ago
Comment by aweiland 22 hours ago
Comment by WorldMaker 16 hours ago
Comment by nitwit005 1 day ago
My current company makes this claim, but it's not true. They also have serverless apps, and also have some services running directly on EC2.
They just think of the Kubernetes deployments as the "standard" way.
> Second was shared, hireable knowledge. K8s is basically a lingua franca now.
People were demanding experience with Kubernetes, long before it was reasonable to expect it. Everyone added it to their resume, because they had to.
Comment by ralferoo 1 day ago
That said, the app I'm developing in my startup is designed with scalability from the outset. I have a single setup script for each type of node, and can take a fresh ubuntu install and just "wget -O- $URL | sh" as root and it sets up the node from scratch and/or reconfigures it to the latest configuration. That does all the ancillary stuff, setting up sane firewall defaults, blocking SSH from non-whitelisted IPs, setting up NTP, borg backup and zabbix (both require manual work on the respective backing servers currently), setting sane system configs (e.g. systemd logs limited to 100MB), wireguard (for the backends that distribute sqlite databases using litefs), etc. and installing the relevant packages with my software.
The actual backend application is built into a debian package automatically, so it's just a case of adding my private repository to apt sources and installing it. Updating a machine is just "ssh root@$MACHINE 'apt-get update ; apt-get install $APP'". I probably could automate that with ansible, but I prefer to upgrade them piecemeal while I'm testing out an upgrade, so I have a couple of bash scripts that do the ssh in a for loop instead with different targets in each.
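The piecemeal upgrade described here (ssh into each machine in turn, refresh apt, install the package) can be sketched in Python. This is a minimal sketch, not the commenter's actual bash scripts: the host names and the package name `myapp` are hypothetical, it swaps the `;` for `&&` so a failed `apt-get update` skips the install, adds `-y` so apt never waits for a prompt, and by default it only prints the commands (dry run).

```python
import shlex
import subprocess


def upgrade_commands(hosts: list[str], package: str) -> list[list[str]]:
    """Build one ssh invocation per host that refreshes apt and installs the package."""
    remote = f"apt-get update && apt-get install -y {shlex.quote(package)}"
    return [["ssh", f"root@{host}", remote] for host in hosts]


def rollout(hosts: list[str], package: str, dry_run: bool = True) -> None:
    """Upgrade hosts one at a time, stopping at the first failure."""
    for cmd in upgrade_commands(hosts, package):
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit,
            # which halts the loop before touching the remaining hosts.
            subprocess.run(cmd, check=True)


# Hypothetical target groups: a canary first, then the rest once it looks healthy.
rollout(["web-canary-1"], "myapp")
rollout(["web-2", "web-3"], "myapp")
```

Keeping separate target lists per call mirrors the "different targets in each" script approach: the operator decides when to move from one group to the next.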
This has the advantage for me of being able to buy any old VPS from a cheap provider and add it to my pool in minutes.
I'm sure I could end up with something that's just as easy to update with kubernetes, but it seems like another big learning curve with dependencies that probably change every few months and require me to keep learning new things just to keep it running. I understand my bash scripts, and know they won't just stop working going forwards (modulo exceptional events like having to migrate to systemd scripts, but that kind of change is usually only required on a very few major OS distribution upgrades).
I already have enough pain from some of my tech depending on other people's projects (I have a frontend app written in Flutter, and forced SDK upgrades about every 6 months and then resulting issues with toolchains I haven't even chosen to use, like gradle and kotlin, that seem to break everything every release), that I have no great desire to rebuild everything on someone else's deployment framework. When I get to the point of hiring others to help, I'd hope they'd be clued in enough to understand a simple bash script that sets up everything, and logically follow it through.
Comment by mianos 1 day ago
To use it is a whole different question, and not in any way related to job interviews. I have worked in places that are crazy for not using it and others where using it was even crazier.
Comment by jessinra98 23 hours ago
Comment by andrewcamel 1 day ago
I personally will be using more resource efficient approaches in everything I do. Question is just what provides the closest set of benefits without the full k8s weight.
Comment by lanycrost 23 hours ago
Comment by xylon 22 hours ago
Comment by esbeeb 22 hours ago
Comment by hanneshdc 1 day ago
It gives most of the benefits the author mentions (traceability of changes, clearly written down infrastructure), without the complexity of k8s.
Comment by h4kunamata 1 day ago
I once worked at a bank that ran fully on Kubernetes; the number of problems was out of this world.
Complexities are being added for no reason at all.
Comment by SJC_Hacker 1 day ago
Comment by kh_hk 16 hours ago
Comment by johnsmith1840 1 day ago
That makes it a no brainer for me for basically any sized project.
Small project? -> minikube single node deploy it.
Tiny project? -> minimum a docker container
I cringe watching anyone build and run code on a raw machine, even locally, without at least a container. The endless hours of headaches you avoid are obvious; k8s is just the natural extension of this.
Comment by siliconc0w 1 day ago
I ended up in a different non-SRE role but if you're interested in working on it, please let me know and I'd love to walk you through it.
Comment by metaltyphoon 1 day ago
Comment by mattmatters 1 day ago
K8s is a complicated beast. For a CTO hiring at a 10-person company, "it's used everywhere" is a bad reason to adopt a major piece of technology. You can always graduate to it later if need be.
Comment by jbnorth 1 day ago
It removes the overhead of a lot of what sysadmins and devs of yesteryear did by hand or had to have a career's worth of experience to do quickly.
That's not to say that people don't need to know what they're getting into when they adopt kubernetes but especially when you're using a managed offering and not on the bleeding edge of what it supports it's pretty easy in terms of overhead and maintenance.
Comment by WorldMaker 16 hours ago
Comment by imglorp 22 hours ago
So it's the same motivation as enterprise java.
Comment by TZubiri 22 hours ago
I feel the motivation goes a bit in the opposite direction. Yes, a commoditized worker who speaks a common k8s language, but not too common a language; there has to be some selection here. A talent pool of 100K engineers that know Linux is too big, but a pool of 5K engineers that know k8s apparently is just right.
Gotta filter those resumes somehow!
Comment by ritcgab 1 day ago
Comment by stego-tech 1 day ago
Right now, I’m one dinosaur managing a startup’s tech portfolio. Everything lives in my head first, then in my break-glass vault for addressing the bus problem. Our public cloud footprint is a single KMS for backups. We have no VMs, everything is a cloud service.
The literal fucking second we have real infrastructure requirements for compute, it’s right to GCE. No ifs, ands, or buts. Here’s our Git Repo, here’s the managed K8s control plane, make it work.
If (or when) we need on-prem compute, we add them to the K8s control plane as worker nodes and taint accordingly.
It’s just so much more interchangeable, even if the learning curve for non-SDEs can be a little steeper than VMs.
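Tainting works as an opt-in: once the on-prem nodes carry a taint, only pods that declare a matching toleration can be scheduled there. The Python sketch below models the matching rules in simplified form; the `location=onprem` taint is a hypothetical example, this is not the scheduler's code, and it ignores `tolerationSeconds`, NoExecute eviction, and node affinity (which is what actually attracts pods to particular nodes; taints only repel).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"  # or "PreferNoSchedule", "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # or "Exists"
    value: str = ""
    effect: str = ""  # empty matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # Exists ignores the value; an empty key with Exists tolerates every taint.
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(node_taints: list[Taint], pod_tolerations: list[Toleration]) -> bool:
    """Every hard taint (NoSchedule/NoExecute) must be tolerated by some toleration;
    PreferNoSchedule is only a soft preference, so it never blocks scheduling here."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint.effect != "PreferNoSchedule"
    )


onprem = [Taint("location", "onprem")]
print(can_schedule(onprem, []))  # False: ordinary pods stay off on-prem nodes
print(can_schedule(onprem, [Toleration("location", "Equal", "onprem", "NoSchedule")]))  # True
```

An untainted cloud node (`can_schedule([], [])`) accepts any pod, which is why tainting only the new on-prem workers leaves existing workloads where they were.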
Comment by zug_zug 1 day ago
There's a certain type of engineer (maybe 25% of them) who does "hype-driven-development." No matter the technology, they are huge advocates for the technology. The hype may be absolutely real, complete nonsense (e.g. mongodb), or somewhere in between (ai). The vast majority of the time it's hype for a new technology that feels 90% the same from the end-user perspective (react vs vue, docker vs colima, go vs other, whatever vs whatever).
These engineers though, only care about something when it's new and trendy enough to be a differentiator. This is because they don't give any hoots about the actual usefulness of anything, they are just trying to differentiate themselves in a market by leveraging vibes rather than raw competence. I think these types of engineer drove kubernetes for companies that don't need it, but tipped the scales enough that it has critical mass.
The irony being kubernetes is way too heavy/clumsy an abstraction for most companies. The savings of packing pods onto the same node is usually a tiny fraction of the engineers' salaries who are managing it.
The other irony is now that kubernetes isn't the new sexy thing, but a standard tool that AI or a normie can do all the hard work for, the hype driven engineers are off looking for the next thing.
Comment by gib444 1 day ago
Who is going to criticise the engineer who is open to new things / appears innovative / loves MVPs / 110% supportive of the latest bs the CTO is spewing? Yeah
Comment by zbentley 1 day ago
Comment by zug_zug 1 day ago
And I do think there is a way to use kubernetes with minimal damage, but it requires making firm rules about not focusing on things that aren't needed yet (e.g. istio) and making firm hiring choices about only people who understand that such optimizations are complete wastes of time for a series A startup.
Comment by dzonga 1 day ago
Comment by vasco 1 day ago
Their identified reasons are OK though.
Comment by crefiz 1 day ago
Comment by globular-toast 1 day ago
Comment by raesene9 1 day ago
Comment by louwrentius 1 day ago
I think what you hear is never the whole story, there is much more going on.
Comment by FpUser 1 day ago