Solved: Pros/Cons of Build Time vs Run time deployment options for Application-Only OpenShift container monitoring?

jeff_rowell · 13 Dec 2019

Is there any documentation available that describes the pros and cons of the Build Time vs Run time deployment options for Application-Only OpenShift container monitoring?

I recall having come across some older information that indicated that one drawback with the Build time approach is that it would be necessary to keep track of agent versioning and perform rebuilds when new versions came out (to maintain currency), whereas with the Run time approach currency is automatic. If I understand correctly, however, there is now a OneAgent Operator capability that allows for the automation of the agent update and roll-out of new versions. If I understand the documentation correctly, therefore, it would appear that the one "con" of the Build Time approach is removed via use of this operator. Is that correct?

More generally, what are the current pros/cons of the two approaches?

We have been performing integrations using the Run time deployment, but one of our OpenShift operations team members has expressed concern that there is a "risk of container failure at initialisation when the DT_API_URL is unavailable to respond to oneAgent download requests." So, for example, if the DT server was down or not responsive enough, or if there were network issues and the agent could not be downloaded, the containers would fail to initialize... according to the OpenShift team this would result in "either a restart or a shutdown of the application system." I am not sure whether there is any way to avoid this issue with the Run time approach (?). With the Build Time approach (and with use of the Dynatrace Operator) I am not sure whether there would be any similar operational impacts to the containers in the event that the agent could not communicate with the cluster (?).

Enrico_F · 13 Dec 2019

In our case, one specific drawback we had with build-time integration is that it made quickly disabling the agent clumsier than with runtime deployment. This is because, with the official build-time integration method, the container entrypoint is overwritten with the Dynatrace-provided script under /opt/dynatrace/oneagent/dynatrace-agent64.sh, which sets all required environment variables, including LD_PRELOAD, in the process context only. This means you can't simply remove LD_PRELOAD from the deploymentconfig and start a new deployment to disable the agent... I'm sure there are ways around that, but this is something that we hadn't thought about initially, and it caused unnecessary delay in removing the agent from a production system that was impacted by it...

With the runtime integration we can simply remove the LD_PRELOAD from the DC and redeploy the pod.
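As a rough illustration of why this is quick, here is a minimal Python sketch that models the deploymentconfig as a plain dictionary and strips one environment variable from every container. The helper name `without_env`, the image name, the `APP_MODE` variable and the library path used as the LD_PRELOAD value are all assumptions made up for this example and are not taken from this thread.

```python
import copy


def without_env(dc, name):
    """Return a copy of a DeploymentConfig-like dict with the environment
    variable `name` removed from every container in the pod template."""
    patched = copy.deepcopy(dc)
    for container in patched["spec"]["template"]["spec"]["containers"]:
        container["env"] = [
            entry for entry in container.get("env", []) if entry["name"] != name
        ]
    return patched


# Hypothetical DeploymentConfig fragment. The image name, APP_MODE and the
# LD_PRELOAD library path are illustrative assumptions only.
dc = {
    "kind": "DeploymentConfig",
    "spec": {"template": {"spec": {"containers": [{
        "name": "app",
        "image": "my-app:oneagent",
        "env": [
            {"name": "LD_PRELOAD",
             "value": "/opt/dynatrace/oneagent/agent/lib64/liboneagentproc.so"},
            {"name": "APP_MODE", "value": "prod"},
        ],
    }]}}},
}

disabled = without_env(dc, "LD_PRELOAD")
remaining = [e["name"] for e in
             disabled["spec"]["template"]["spec"]["containers"][0]["env"]]
print(remaining)
```

On a real cluster the same change would normally be made with the CLI rather than in code, for example `oc set env dc/<name> LD_PRELOAD-`, followed by a new deployment.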

Enrico_F · 13 Dec 2019

PS: Due to the above drawback, we mostly do not use the officially suggested way of build-time integration based on dynatrace-agent64.sh. Instead, we simply install the OneAgent into the base image using the default path (/opt/dynatrace/oneagent), push the OneAgent-enabled image to an output image stream, and enable the agent by setting LD_PRELOAD in the deploymentconfig accordingly.

We run OneAgent-enabled builds for all standard base images after each cluster update, tagging the output image streams accordingly.

jeff_rowell · 13 Dec 2019

Thank you for your response. The one point I'm not clear on is why you would need to disable the agent in the Runtime deployment. I.e., since there is no need to reach out to the DT server to obtain the agent at container startup, what situations would result in a need to disable it (other than discovering some sort of application issues caused by the agent)?

mreider · 13 Dec 2019

Hi there. I joined Dynatrace a few months ago, and this is my very first post in the community forum. Very exciting. First I will compare run time with build time. Then I'll talk about the operator. Finally, I'll end with a little tidbit.

Run time integration

Pros

The docker container image will be a little smaller, since the agent is downloaded later.
The docker container will always get the latest bits (if you specify 'latest' in the URL string).

Cons

If there's a network failure, a proxy issue, or something that makes it impossible to download the agent, your app won't start.
If you do not use rolling updates, this could have severe consequences. If you use rolling updates, and have a strategy that does not continue a deploy until the 'last' container is running, then there's really no problem.
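The rolling-update argument can be illustrated with a toy model. The Python sketch below is not how OpenShift implements rollouts. It only mimics the rule "do not continue until the replacement is running", and the function name `rolling_update` and the pod names are made up for the example.

```python
def rolling_update(old_pods, new_pod_starts):
    """Replace pods one at a time, stopping at the first replacement
    that fails to become ready.

    old_pods: names of the pods currently serving traffic.
    new_pod_starts: callable(index) -> bool, True if the replacement
        pod for that slot became ready.
    Returns (serving_pods, completed).
    """
    serving = list(old_pods)
    for i in range(len(old_pods)):
        if not new_pod_starts(i):
            # The replacement never became ready (for example because the
            # agent download failed), so halt the rollout and leave the
            # remaining old pods untouched.
            return serving, False
        serving[i] = f"new-{i}"
    return serving, True


# Agent download unavailable: no replacement pod ever becomes ready.
serving, completed = rolling_update(["old-0", "old-1", "old-2"], lambda i: False)
print(serving, completed)
```

In this model a failed agent download stalls the rollout, but the old pods keep serving, which is the "no problem" case described above.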

Build time integration

Pros

You avoid this network risk. But, again, you can avoid it using deployment strategies and keep using run time integrations.
Your docker container is immutable - it never changes. This can be a security requirement in certain organizations depending on how they manage docker images / repositories.

Cons

The docker container will be a little larger.
If you want to update the agent, you need to rebuild the docker image.

OneAgent Operator

You are right that the OneAgent Operator takes this burden away from you. It also does this in a more elegant way by installing the agent on the Kubernetes node, rather than in the container. This is one of the major advantages of Dynatrace Kubernetes monitoring - in that you can correlate node events / problems with application issues. By nature, this is a full-stack instrumentation, so if you are limited to application-only monitoring, you must stick with the docker container integration we discussed above 🙂

Tidbit:

It is possible to change the runtime integration script so it exits gracefully in case of a network problem. This would have a similar result to a rolling update strategy, in that your app would stay running and healthy, but it has a major downside - after the deployment you would not have any monitoring since the agents never installed. I am also unclear if we would support this model, but I am happy to look into the complexities there if you're interested.
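To make the "exit gracefully" idea concrete, here is a minimal Python sketch of the decision logic only. It is not the Dynatrace-provided integration script. `build_launch_env`, `fetch_agent` and the example path are hypothetical names, and the download is passed in as a callable so that the fallback can be shown without a network.

```python
def build_launch_env(base_env, fetch_agent):
    """Return (env, monitored) for the application process.

    fetch_agent: callable that downloads the agent and returns the path of
        the library to preload, raising OSError (which includes connection
        and timeout errors) when the download fails.
    """
    env = dict(base_env)
    try:
        agent_lib = fetch_agent()
    except OSError as exc:
        # Graceful fallback: start the app anyway, but without monitoring.
        print(f"agent download failed ({exc}); starting without monitoring")
        return env, False
    env["LD_PRELOAD"] = agent_lib
    return env, True


def unreachable():
    # Stand-in for a download attempt while the server is unreachable.
    raise ConnectionError("DT_API_URL not reachable")


env, monitored = build_launch_env({"APP_MODE": "prod"}, unreachable)
print(env, monitored)
```

The downside mentioned above shows up in the return value: when `monitored` is `False`, the app is healthy, but nothing is instrumented until the next successful deployment.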

My first post! Yay! Hope this answers your question 🙂

-M

Kubernetes beatings will continue until morale improves

jeff_rowell · 13 Dec 2019

Thank you very much for your detailed response. I found it very helpful. There is one point that I am not totally clear on though. It seems clear from your response that the OneAgent Operator requires the agent to be installed on the cluster node... this would result in Full Stack monitoring would it not? We only want to deploy in Application Only mode (largely due to licensing issues related to cluster node memory size). So I am not totally clear whether we can use the OneAgent Operator to achieve Application Only monitoring of specific containers.

mreider · 13 Dec 2019

That's right. If you are constrained by licensing issues, the OneAgent operator is out of reach. I would go with the rolling update strategy if I were you. Best of both worlds. Small container image. Latest agent. And you're safe if there's an outage 🙂
