A question has been raised about the future of haproxy-dev.opencraft.hosting
which has been running in OVH region GRA1 for more than 4 years and 3 months.
I can find references to it as far back as 2018 in OC-3913, and it was used as part of the “Hosting Reliability v2” epic (SE-934). As far as I can tell, it was only used for the initial haproxy testing and setup.
Do we still need it? If so, we should be maintaining and documenting it properly. If not, it should be retired.
Does anyone have any opinions about this? Would anyone’s future workflows be disrupted if haproxy-dev were retired?
@kahlil, AFAIK it is not being used at all, so I’m okay with it being properly retired and removed from all the systems that refer to it. I’m also okay with setting it up as a dev load balancer server and starting to use it.
I think you are right @antoviaque. All the “private” stuff is hidden behind the linked content and the old security practice of keeping all infrastructure details private is not as relevant these days.
It has been used just for tests, especially in cases where testing on the haproxy stage server wouldn’t be appropriate. For instance, OC-3913 required restarting haproxy many times, killing it, changing its configuration, … If we had done those tests on Ocim stage, many stage servers could have stopped working, so it was good to have an unrelated one.
But we haven’t needed it for years and it’s probably out of date, so it can be deleted. I can’t even connect through SSH (haproxy-dev.opencraft.hosting, port 22 closed).
I guess they aren’t all related to haproxy-dev, and they need to be evaluated separately.
How hard was it to set up? Maybe we could add a “How to build a development zone” note to the ha-proxy docs?
Would creating a more general “infrastructure development zone” based on this be useful? (I’m thinking of follow-up tickets.) We’ve recently had some issues with New Relic, OpsGenie and monitoring. Such a zone might be useful for testing New Relic, OpsGenie, Grafana, Prometheus, ELK, Consul and TLS configuration. We could simulate outages without upsetting the rest of the infrastructure and send all alerts to dedicated channels. Servers could be switched off but left in place to save costs. (I assume we can do that with OVH?)
Good point! I’ve created a draft ticket, FAL-753, to chase those down.
I don’t know, since it was already set up when I found it. I guess it’s “just” running our Ansible playbook. But even that requires the virtualenvs, the right commands, some variables, a blank server or Docker VM, …
Our standard documentation about how to run playbooks should be enough. (However, you may not have access to that repository yet.)
It’s probably not worth paying for many extra VMs just in case we need to simulate something; it’s also a lot of coordination to document what each server is, who’s testing it, whether anything important depends on it, …
But we may do something like ready-to-use images of test servers. Maybe even with Vagrant, since it’s easy to store the whole status as a file and then share the file.
You can ask around for more ideas. Maybe you can even relate it to some container epic, because we also want to deploy ready-to-use edxapps.
I don’t think haproxy-dev even works anymore; it uses our outdated haproxy setup. The current Ocim version won’t work properly on it, since it relies on Consul and cert-manager instances.
I’m not even sure we’re using that setup for anything, but if we are, I’m in favor of nuking the current dev account and starting from scratch (deploy a clean dev env, update opencraft-im.env, etc.).
I think that most of these instances were created by Ocim devstacks to test some aspect of provisioning. Once we check that this is the case, we can:
Delete them, since they can be easily re-created.
Add some prefix to the Ocim defaults to signal which instances were created by the Ocim devstack.
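A minimal sketch of how such a prefix could be used, assuming a hypothetical `ocim-devstack-` prefix (Ocim does not necessarily use this today). The instance names are plain strings here; in practice the list would come from something like `openstack server list` against the dev account.

```python
# Hypothetical prefix marking instances spawned by an Ocim devstack.
DEVSTACK_PREFIX = "ocim-devstack-"


def devstack_instance_name(base_name, prefix=DEVSTACK_PREFIX):
    """Prepend the devstack marker prefix unless it is already present."""
    return base_name if base_name.startswith(prefix) else prefix + base_name


def partition_instances(names, prefix=DEVSTACK_PREFIX):
    """Split instance names into (devstack-created, everything else)."""
    devstack = [n for n in names if n.startswith(prefix)]
    other = [n for n in names if not n.startswith(prefix)]
    return devstack, other


if __name__ == "__main__":
    names = [devstack_instance_name("edxapp-1"), "haproxy-dev", "rabbitmq-dev-2"]
    safe_to_delete, needs_review = partition_instances(names)
    print("Devstack instances:", safe_to_delete)
    print("Check manually:", needs_review)
```

With a convention like this, anything outside the prefixed group is what would still need a manual check before deleting.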
Everyone’s had a chance to view and reply, and it looks like the consensus is to retire everything in the old dev account (including haproxy-dev and rabbitmq-dev-2), provided we can verify that the 23 platform instances were created by Ocim devstacks (how do we do that?) or are just old and unused. The easiest and cleanest way to terminate all the instances is simply to nuke the account. I’ll create a task to schedule that investigation and action; I’ve repurposed FAL-753 to do this.
Any future infrastructure testing environments can be spun up as and when needed using the appropriate technology, which will probably be container based and will probably arise naturally with some of the planned developments around Ocim.
There is the separate issue of whether and how we want Ocim devstacks to be able to spawn instances for testing, and how we want to manage that. I’ve created another ticket and discussion, FAL-9811, to track that.