I know this question has been asked before here:
https://answers.dynatrace.com/questions/118885/ser...
However, I am new to this product and need a little more direction.
Here are the points from the previous post answers with my question/comments.
@Ulf Thornander – “You should also check your "Server IP Range". This is done by going to the advanced properties and changing those HTTP://YOURSERVER.COM/ATSCON and selecting "Advanced Properties Editor" and then specifying the "Accepted Server IP address range".” GJ – There is no "Server IP Range" under "Advanced Properties Editor". Besides, according to the test reports I was building, there have been no more than 11,000 rows.
============
@Pawel Brzoska – “Server limit also includes distinct urls, so that explains why it happened after reconfiguration of url parameter. Most likely there are too many values of this parameter resulting in thousands different urls populating the database. 50k is a lot, so i dont advise increasing this limit, rather take a look at how dynamic is the parameter you configured and tweak its definition to result in more aggressive aggregation of parameter values to single operations.” GJ – Where do I find this parameter you mention?
============
@Adam Piotrowicz – “Please go to http://<CAS>/modulestatus?advanced=1 page of your CAS and copy/paste here rows with Module name column set to "Advanced DB Statistics".” GJ –
@Adam Piotrowicz. - "Indeed this script is not answer which URLs are filling server cache (that is the most often reason) but only a direction which Software Services have the most sessions (that is usually related to number of URLs) so we know which one should be investigated in DMI by making very simple report with Software Service and Operation dimensions and Operations metric."
GJ - I added Server name and Server IP address to the report, hoping to increase the number of servers discovered.
Report: Software Service, Operation, Server name and Server IP address dimensions and Operations metric = 9,181 rows
"Advanced DB Statistics" URLs on servers = 100,372
============
What am I not understanding here?
Thanks in advance for any guidance you can provide.
God Bless,
Genesius
@Adam Piotrowicz
Product Name: Central Analysis Server Version: 12.3.3.29
Thanks and God bless,
Genesius
Hi Genesius.
I'm wondering about your statement:
"GJ - I added Server name and Server IP address to the report, hoping to increase the number of servers discovered."
Do you want to see more servers?
Are you using DCRUM as a Discovery tool to find out what is communicating on your network?
It "can" work in such a manner, but it's not its primary function. As Adam is alluding to, more than just the server IP address is at play when the "Server Cache Limit" is reached. With that in mind, please understand that having the warning is NOT a desired state, as it can create incoherent data, and the problem should be solved as quickly as possible.
However, if you still want to examine and analyse the network and you bump into the "Server Cache Limit" every so often, I'd consider expanding into a cluster and utilizing a second and/or a third CAS, depending on the nature of your traffic.
Hi Ulf,
I want to "increase the number of servers discovered" so I can determine why we have to continually increase the cache. My understanding is we were at 20,000 a month or so back (before I arrived). Now we have gone from 80,000 (after I arrived) to a setting of 200,000 so the error would disappear. I don't want to place a bandage on the problem, but solve it. Therefore, I want to see ALL the servers that Advanced DB Statistics indicates exist.
Thanks and God bless,
Genesius
Answer by Genesius J. ·
I ran the script you provided and received the following error.
Msg 208, Level 16, State 1, Line 1 Invalid object name 'rtmsession'.
When I perform a search in the CAS database for "rtmsession" the table is present.
Note: I am using MSSQL Server Management Studio to run it.
I also attempted running "parts" of the script and received the same error.
Thanks and God bless,
Genesius
PS I will be out on Monday and not able to reply.
Genesius,
It sounds like you're connected to SQL Server Management Studio as a user other than delta, and I admit the script is not ready for that.
Please find cache-troubleshoot-fixed.txt, which can be run as any user.
Try it and let us know.
Answer by Adam P. ·
OK, being on SP > 2 spares you many problems with the server cache. So let us try to sum up our experience with the server cache.
In general, the server cache keeps all the servers and URLs that CAS has seen during the last 10 days (by default). So the reasons for it to grow would be either too "wide" monitoring (you monitor everything) or monitoring servers that provide many unique URLs, e.g. with a session number in the operation name, so that each operation is unique and will never reappear.
Usually we start troubleshooting by identifying the top Software Services that generate the most URLs and servers. We use this SQL script to identify Software Services that we then troubleshoot in DMI: whether operations are very unique, how many servers are reported, etc. Then we revisit the configuration to find a way to aggregate the data. If that is not possible, we ask customers to limit the traffic or introduce load balancing on CAS.
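To illustrate why a session number in the operation name fills the cache, and what "aggregating" buys you, here is a minimal Python sketch. The URLs, the `session` parameter name and the `aggregate` helper are all made up for the example; this is not how DC RUM is configured, only the arithmetic behind it.

```python
import re

# Hypothetical operation names: every request carries a unique session number.
urls = [f"/shop/cart?session={n}&item={n % 3}" for n in range(1000)]

def aggregate(url: str) -> str:
    """Replace the value of the (assumed) 'session' parameter with a wildcard."""
    return re.sub(r"(session=)\d+", r"\1*", url)

distinct_raw = len(set(urls))
distinct_aggregated = len({aggregate(u) for u in urls})

print(distinct_raw)         # 1000: every URL is unique and takes a cache entry
print(distinct_aggregated)  # 3: only the 'item' value still varies
```

In this toy case, 1,000 cache entries collapse into 3 once the dynamic value is aggregated, which is the kind of reduction to look for when revisiting the configuration.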
When server cache limit is exceeded, several things happen:
When a new data file (5-min zdata package) with new servers/URLs is processed by the CAS and the cache limit is exceeded in the middle of it, servers are processed first and URLs second, so chances are that the servers will be recorded and some (randomly chosen) URLs will be ignored.
Summarizing, if you have an existing server already known to CAS and you expanded the number of URLs monitored to the extent that the cache limit has been exceeded, you should still be getting complete measurements at the server level and complete measurements for some URLs (those already known to the CAS plus, perhaps, some of the newly added URLs). Data for the remaining URLs will be rolled up into All Other Operations.
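The behaviour described above can be pictured with a small toy model in Python. It is only a sketch under my own assumptions, not the CAS implementation: the `process_package` function, the single shared set used as the cache, and the sample addresses and URLs are all hypothetical, and new URLs are taken in file order here rather than chosen randomly.

```python
ALL_OTHER = "All Other Operations"

def process_package(cache: set, limit: int, servers: list, urls: list) -> dict:
    """Return how each URL in one package is reported, updating the cache."""
    # Servers are processed first, so they get cache slots before any URL.
    for server in servers:
        if server in cache or len(cache) < limit:
            cache.add(server)
    reported = {}
    for url in urls:
        if url in cache or len(cache) < limit:
            cache.add(url)
            reported[url] = url          # measured as its own operation
        else:
            reported[url] = ALL_OTHER    # cache is full: rolled up
    return reported

cache = {"10.0.0.1", "/login"}           # entries already known to the toy cache
result = process_package(
    cache, limit=4,
    servers=["10.0.0.1", "10.0.0.2"],
    urls=["/login", "/report?id=1", "/report?id=2"],
)
for url, name in result.items():
    print(url, "->", name)
```

With a limit of 4, the new server and the first new URL still fit, the already known `/login` keeps its own measurements, and the last new URL ends up under All Other Operations.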
Again, if the server cache limit is exceeded on a version < 12.3.3, it's very likely that you are not monitoring too much data but are hitting a bug in DC RUM that does not age data correctly.
Please let us know if any aspects of this subject are unclear and we will be happy to reply.