RFC: Optimizing Instance Deployment and Maintenance

Hi all,

Looking for general comments on this first draft of the Optimizing Instance Deployment and Maintenance discovery doc. It proposes a fairly major paradigm shift for OCIM, so I recommend that everybody get in and comment. I’m particularly interested in getting reactions from our DevOps specialists, @giovannicimolin and @toxinu, but also folks with a lot of OCIM experience such as @guruprasad, @kshitij, and of course @braden and @antoviaque. (Braden, I’m particularly interested in your recent Terraform experience: any Kubernetes mixed in there, by any chance?)

By the way, @gabriel and @fox, this is also my response to SE-4049, so you’ll probably want to pitch in as well.

Ticket to log time: SE-3893


@adolfo Thank you for this discovery - I have done a pass on the document.

This looks exciting! :slight_smile: As you will read from my comments though, I think we’ll need to be careful to properly validate it first - with DevOps more than with almost anything else, the devil truly is in the details…

NB: About keeping the discussions public: I’ve made the linked document public (it was still limited to the OpenCraft team). Also, for our accelerated projects, we have decided to move discoveries to GitLab merge requests, and to use GitLab tickets instead of Jira tickets for ticket discussions. No need to change it this time, it’s too late for that, but can you do this going forward?

There have been many attempts at this in the past, with different technologies. For instance, SE-132 included using Packer to create images that could later be deployed.
I’m happy that Tutor solves some of the questions we had in the past, like how to separate assets (such as themes) that can’t be included in a base image because they vary from client to client.
I don’t know enough about it to say whether it can handle all customizations (e.g. if a client needs an extra table). I left some comments there.

I left a comment at SE-4049 asking whether this proposal will reduce maintenance costs and, if so, which ones and by how much.

OCIM will require changes if we transform it into an orchestrator of container technologies. We’d need to estimate which changes.
I think we also need several proofs of concept, for instance a deployment that happens in two steps: the first one to build the common stuff (including migrations), and the second one to add customizations on top.
The first step is then the one that gets built into the image.

Some custom Python packages our customers install include their own migrations. How would we handle these?

We’d have to separate the common migrations, which are shared by all clients (these go into the image or the DB dump), from the custom migrations that need to happen on top.
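
To make that split a bit more concrete, here is a rough sketch of how it could be computed. It assumes we can list each client's migrations as (app label, migration name) pairs. The `split_migrations` helper and the client names in the usage note are hypothetical, not something OCIM or Tutor provides.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (app_label, migration_name).
Migration = Tuple[str, str]


def split_migrations(
    client_migrations: Dict[str, List[Migration]],
) -> Tuple[Set[Migration], Dict[str, List[Migration]]]:
    """Split migrations into those shared by every client and per-client extras.

    The shared set is a candidate for baking into the image or DB dump;
    the extras would have to run on top, in their original order.
    """
    if not client_migrations:
        return set(), {}

    # A migration is "common" only if every client has it.
    common = set.intersection(*(set(m) for m in client_migrations.values()))

    # Keep each client's remaining migrations in their original order.
    custom = {
        client: [m for m in migrations if m not in common]
        for client, migrations in client_migrations.items()
    }
    return common, custom
```

For example, calling it with `{"client_a": [("auth", "0001_initial"), ("custom_app", "0001_initial")], "client_b": [("auth", "0001_initial")]}` would put `("auth", "0001_initial")` in the common set and leave `("custom_app", "0001_initial")` as a custom migration for `client_a` only. This ignores dependencies between migrations, which a real implementation would need to respect.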

Another issue is that the database doesn’t live inside the image, but in an external DB server. The image and the DB server need to be synchronized.
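
One way to think about that synchronization, as a sketch only: compare the migrations an image expects with those actually applied in the external database, and refuse to start if they differ. How the two lists are obtained is left open here. `compare_image_and_db` and `SyncReport` are hypothetical names, not existing OCIM code.

```python
from typing import Iterable, NamedTuple, Set, Tuple

# Hypothetical representation: (app_label, migration_name).
Migration = Tuple[str, str]


class SyncReport(NamedTuple):
    """Differences between what the image expects and what the DB has."""

    missing_in_db: Set[Migration]     # expected by the image, not yet applied
    unknown_to_image: Set[Migration]  # applied in the DB, not known to the image

    @property
    def in_sync(self) -> bool:
        return not self.missing_in_db and not self.unknown_to_image


def compare_image_and_db(
    image_migrations: Iterable[Migration],
    db_migrations: Iterable[Migration],
) -> SyncReport:
    """Report which migrations differ between an image and its database."""
    image = set(image_migrations)
    db = set(db_migrations)
    return SyncReport(missing_in_db=image - db, unknown_to_image=db - image)
```

A non-empty `missing_in_db` would mean the database is behind the image, and a non-empty `unknown_to_image` would mean the database is ahead of it (for example after a rollback to an older image).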

There has been brainstorming and discussion about these topics in the past, and there may have been conclusions for some of the difficult parts too.