
Combination of maintenance windows: best way to deal with it?

AntonioSousa
DynaMight Guru

I have a case where I need to define maintenance windows for a large number of hosts. The number of combinations can be huge; for a specific day, for example:

  1. 00:00 - 02:00
  2. 00:00 - 03:00
  3. 00:00 - 04:00
  4. 00:00 - 05:00
  5. 00:00 - 06:00
  6. 02:00 - 04:00
  7. 02:00 - 05:00
  8. 02:00 - 06:00
  9.   ... the list goes on

Now, it has occurred to me that if I define six one-hour maintenance windows,

  • 00:00 - 01:00
  • 01:00 - 02:00
  • 02:00 - 03:00
  • 03:00 - 04:00
  • 04:00 - 05:00
  • 05:00 - 06:00

instead of one maintenance window per combination, I would need only six, and each host would then be assigned several consecutive windows.
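As a minimal sketch of that decomposition, assuming all windows start and end on full hours, the following turns one combined window into the one-hour slots a host would be assigned. The function name `hourly_slots` is hypothetical and nothing here calls a Dynatrace API.

```python
def hourly_slots(start_hour: int, end_hour: int) -> list[str]:
    """Return the one-hour slot labels that cover [start_hour, end_hour)."""
    if not 0 <= start_hour < end_hour <= 24:
        raise ValueError("expected 0 <= start_hour < end_hour <= 24")
    # One label per full hour inside the combined window.
    return [f"{h:02d}:00 - {h + 1:02d}:00" for h in range(start_hour, end_hour)]


if __name__ == "__main__":
    # The combined window 02:00 - 05:00 becomes three consecutive slots.
    for slot in hourly_slots(2, 5):
        print(slot)
```

With hourly boundaries between 00:00 and 06:00 there are up to 21 possible contiguous windows, but only six slots to define.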

Doing this at scale might have some issues. There is a limit of 2000 maintenance windows per monitoring environment (link), which favors the second approach. But I'm not sure whether I would hit some other limit. Has anyone done this at scale?

BTW, you might be asking: why so many combinations? Patching after the second Tuesday of the month...
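For reference, the "second Tuesday" date is awkward to express as a plain recurrence, but it is easy to compute. This is only a sketch using the Python standard library; the `patch_date` helper and its `days_after` parameter are hypothetical, since the actual offset differs per host.

```python
import calendar
from datetime import date, timedelta


def second_tuesday(year: int, month: int) -> date:
    """Return the date of the second Tuesday of the given month."""
    first = date(year, month, 1)
    # Days from the 1st to the first Tuesday (0 if the 1st is a Tuesday).
    offset = (calendar.TUESDAY - first.weekday()) % 7
    return first + timedelta(days=offset + 7)


def patch_date(year: int, month: int, days_after: int) -> date:
    """Hypothetical helper: a patch day N days after the second Tuesday."""
    return second_tuesday(year, month) + timedelta(days=days_after)


if __name__ == "__main__":
    print(second_tuesday(2024, 1))   # second Tuesday of January 2024
    print(patch_date(2024, 1, 1))    # the Wednesday after it
```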

Antonio Sousa
3 REPLIES

ct_27
DynaMight Pro

"Patching after the second Tuesday of the month...". After 2 years I stopped waiting on DT for a solution. I'm running an instance of Node-RED and using the cron scheduler node to schedule these maintenance windows. When the scheduled time arrives, the hostID is passed into an API call that creates a 3-hour maintenance window in real time. A script then runs at the end of the day to clean up any maintenance windows named 'Automated Schedule' that are [Expired].

I have 500 maintenance windows. Of these, 150 are predefined as shown below, each recurring on a given day of the week and start time. Each runs for 3 hours.

[Upcoming] z:  - MW-Monday-01

[Upcoming] z:  - MW-Monday-02

[Upcoming] z:  - MW-Monday-03

...

[Upcoming] z:  - MW-Sunday-23

We then put a property on every machine that matches the name of the maintenance window aligned with its patch cycle. An automatic rule turns that property into a key:value tag. Each maintenance window includes any entity carrying that tag, for example [MaintenanceMode]xxxxxxxxx:MW-Wednesday-22
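A small sketch of how the property value could be derived from a host's patch start time. It assumes the trailing number in names like MW-Wednesday-22 is the start hour, which matches the description above but is not confirmed; `window_name` is a hypothetical helper.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def window_name(patch_start: datetime) -> str:
    """Build a name like 'MW-Wednesday-22' from a patch start time.

    Assumption: the suffix is the two-digit start hour (00-23).
    """
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return f"MW-{DAYS[patch_start.weekday()]}-{patch_start.hour:02d}"


if __name__ == "__main__":
    print(window_name(datetime(2024, 1, 10, 22, 0)))
```

The resulting string is what would be set as the property on the host, so the tagging rule and the matching maintenance window pick it up.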

This solution automatically puts machines into and out of maintenance mode as they enter and exit their patch cycles. We have been running it for almost 2 years now; it cut down our problem tickets tremendously and stopped waking people up at 3am.

------------------------

Next, we have another solution for a different set of hosts. These hosts are scheduled for patching via cron. A special script builds a daily schedule of hosts and their scheduled maintenance, generating a CSV file that contains the schedule for every day in the current and next month. A Node-RED script reads this file at 12:01 every morning. It clears out the prior day's and current day's maintenance windows, then loads a fresh set for today and the next day.
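The daily selection step could look roughly like the sketch below. The real scripts are Node-RED flows and the actual CSV layout is not shown in this thread, so the column names `host_id` and `start` and the function `windows_to_load` are assumptions for illustration only.

```python
import csv
import io
from datetime import date, datetime, timedelta


def windows_to_load(csv_text: str, today: date) -> list[tuple[str, datetime]]:
    """Pick the schedule rows that start today or tomorrow.

    Assumed CSV columns: host_id, start (ISO 8601 local time).
    """
    wanted = {today, today + timedelta(days=1)}
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.fromisoformat(row["start"])
        if start.date() in wanted:
            selected.append((row["host_id"], start))
    return selected


SAMPLE = """host_id,start
HOST-A,2024-01-09T22:00
HOST-B,2024-01-10T01:00
HOST-C,2024-01-11T03:00
HOST-D,2024-02-14T02:00
"""

if __name__ == "__main__":
    for host, start in windows_to_load(SAMPLE, date(2024, 1, 10)):
        print(host, start.isoformat(timespec="minutes"))
```

Only the selected rows would then be turned into maintenance windows, which keeps the number of windows that exist at any one time small.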

We did this because otherwise we too were hitting the maintenance window limit.

In addition, it creates a tag named NextPatch on each host, holding the datetime of the next scheduled patch. Users love this feature because they can look up the next scheduled patch date without bugging our system admins. Even the admins use the tags for reference.
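Computing the NextPatch value amounts to picking the earliest future start per host. A minimal sketch, assuming the schedule is already loaded into a dictionary of host to start times; `next_patch_tags` is a hypothetical name and the tag value format shown is an assumption.

```python
from datetime import datetime


def next_patch_tags(schedule: dict[str, list[datetime]],
                    now: datetime) -> dict[str, str]:
    """Map each host to the ISO timestamp of its next scheduled patch.

    Hosts with no upcoming patch are left out of the result.
    """
    tags = {}
    for host, starts in schedule.items():
        upcoming = [s for s in starts if s > now]
        if upcoming:
            tags[host] = min(upcoming).isoformat(timespec="minutes")
    return tags


if __name__ == "__main__":
    schedule = {
        "HOST-A": [datetime(2024, 1, 9, 22), datetime(2024, 2, 13, 22)],
        "HOST-B": [datetime(2024, 1, 5, 1)],
    }
    print(next_patch_tags(schedule, datetime(2024, 1, 10, 12)))
```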

PM me if you want to see the scripts.

--------

Finally, we have created a single maintenance window that includes any entity with the tag MM_ON. The window started sometime in 2021 and expires in 2030. We then apply an MM_ON tag to a host, manually or through external scripts, and all its Services, Process Groups, and PG Instances go into maintenance (we even figured out how to include OS Service Monitors too). The same approach works for turning off synthetics.


HigherEd

@ct_27,

Thanks for the great summary! The NextPatch tag idea is also great!

I'll have a deeper look into this. My situation is a little trickier, as the maintenance windows vary a lot. I'm also trying to look at how to simplify it.

Antonio Sousa

see PM @ct_27 🙂

Kind regards, Frans Stekelenburg                 Certified Dynatrace Associate | measure.works, Dynatrace Partner
