Hi-
I have a customer (vegas.com) who needs to filter out bot traffic so that it doesn't capture meaningless sessions that consume their paid-for capture volume. I know we can filter by IP range, but what about filtering by user agent in Tealeaf, with something like this?
Drop28=reqfield HTTP_USER_AGENT contains crawl
Drop29=reqfield HTTP_USER_AGENT contains bot
Drop30=reqfield HTTP_USER_AGENT contains slurp
Drop31=reqfield HTTP_USER_AGENT contains spider
Drop32=reqfield HTTP_USER_AGENT contains jeeves
Drop33=reqfield HTTP_USER_AGENT contains Sleuth
Drop34=reqfield HTTP_USER_AGENT contains Nikto
Drop35=reqfield HTTP_USER_AGENT contains Nessus
Drop36=reqfield HTTP_USER_AGENT contains Heritrix
Drop37=reqfield HTTP_USER_AGENT contains IPCheck
Can we somehow filter these out so they don't get captured?
Thank you,
~Brendan
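As a rough illustration of what rules like these would do, here is a minimal Python sketch that parses the "DropNN=reqfield HTTP_USER_AGENT contains ..." lines and checks a request's fields against them. It is not Tealeaf's implementation. The parser, the names parse_rules and should_drop, and the case-insensitive matching default are all assumptions made for illustration.

```python
import re
from typing import Mapping

# Matches lines such as "Drop28=reqfield HTTP_USER_AGENT contains crawl".
# Hypothetical parser: only the "contains" operator is handled here.
RULE_RE = re.compile(r"^Drop\d+=reqfield\s+(\S+)\s+contains\s+(.+)$")


def parse_rules(text: str) -> list[tuple[str, str]]:
    """Return (field, substring) pairs for every well-formed Drop rule."""
    rules = []
    for line in text.splitlines():
        match = RULE_RE.match(line.strip())
        if match:
            rules.append((match.group(1), match.group(2).strip()))
    return rules


def should_drop(request_fields: Mapping[str, str],
                rules: list[tuple[str, str]],
                ignore_case: bool = True) -> bool:
    """Return True if any rule's substring occurs in the named request field."""
    for field, needle in rules:
        value = request_fields.get(field, "")
        if ignore_case:  # assumption: the real product may match case-sensitively
            value, needle = value.lower(), needle.lower()
        if needle in value:
            return True
    return False


rules = parse_rules("""Drop28=reqfield HTTP_USER_AGENT contains crawl
Drop29=reqfield HTTP_USER_AGENT contains bot
Drop33=reqfield HTTP_USER_AGENT contains Sleuth""")
googlebot = {"HTTP_USER_AGENT": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}
print(should_drop(googlebot, rules))  # True
```

A short substring such as "bot" also matches any legitimate user agent that happens to contain those letters, so broad patterns are worth checking against real traffic before they are enabled.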
Answer by Jesper G.
I also think that dealing with bots could go a bit deeper.
For example, in the Errors view, bot traffic tends to generate a lot of 404s because bots try to index pages that are no longer valid. This creates "background noise" that can drown out real problems that don't generate a strong enough signal to rise above the bot traffic.
One way to handle this would be to configure error categories similar to "HTTP 4XX Response (internal)" with an "exclude bots" attribute, or to allow user-agent mapping rules to be configured so that UA strings containing "GoogleBot" could be excluded.
That's a somewhat separate topic, so I'll open a separate RFE for it.
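To make the "exclude bots" idea concrete, here is a minimal Python sketch that counts 4xx responses per status code with and without bot hits. The marker list BOT_UA_MARKERS and the functions is_bot and count_4xx are hypothetical stand-ins for the user-agent mapping rules described above, not an existing product setting.

```python
from collections import Counter
from typing import Iterable

# Hypothetical "exclude bot" mapping rules: lowercase substrings of the
# User-Agent header that mark a hit as bot traffic.
BOT_UA_MARKERS = ("googlebot", "bingbot", "slurp", "spider", "crawl")


def is_bot(user_agent: str, markers: Iterable[str] = BOT_UA_MARKERS) -> bool:
    """Return True if the lowercased UA contains any bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in markers)


def count_4xx(hits: Iterable[tuple[int, str]], exclude_bots: bool) -> Counter:
    """Count 4xx responses per status code, optionally skipping bot hits."""
    counts: Counter = Counter()
    for status, user_agent in hits:
        if 400 <= status < 500 and not (exclude_bots and is_bot(user_agent)):
            counts[status] += 1
    return counts


hits = [
    (404, "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    (404, "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    (404, "Mozilla/5.0 Firefox/121.0"),
    (200, "Mozilla/5.0 Firefox/121.0"),
]
print(count_4xx(hits, exclude_bots=False)[404])  # 3
print(count_4xx(hits, exclude_bots=True)[404])   # 1
```

With bots excluded, the single 404 hit by a real browser is no longer outnumbered by crawler requests for stale pages.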
Answer by Reinhard W.
Brendan,
I assume bots don't eat up visit volume, since they usually don't execute JavaScript.
That said, I could imagine someone hacking something together that uses the existing browser/version "exclude" setting in the UEM configuration.
I'm thinking of replacing the User-Agent string on the web server side (a conditional Apache mod_headers configuration) with something very exotic (like Opera version 0.1) and then configuring the UEM restriction to exclude that browser.
I also tried setting the Do-Not-Track header on Apache the same way. That basically works if the signal handling is done entirely in the Java/.NET backend rather than on the web server, because of the order in which Apache modules are processed.
Reinhard
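The same rewrite-then-exclude trick can be sketched at the application layer instead of in Apache. Below is a minimal Python WSGI middleware that replaces bot User-Agent strings with a sentinel value; this swaps in WSGI for the mod_headers approach described above. The sentinel "Opera/0.1", the marker list, and the class name BotUARewriter are hypothetical, and whether UEM's browser exclusion would see a header rewritten at this layer depends on where its agent hooks in, which is the same ordering concern raised above.

```python
from typing import Callable, Iterable

# Sentinel UA assumed to be listed as an excluded browser in the UEM config
# (the "Opera version 0.1" idea); the exact string is hypothetical.
SENTINEL_UA = "Opera/0.1"
BOT_MARKERS = ("bot", "crawl", "slurp", "spider")


class BotUARewriter:
    """WSGI middleware that replaces bot User-Agent strings with a sentinel."""

    def __init__(self, app: Callable, markers: Iterable[str] = BOT_MARKERS):
        self.app = app
        self.markers = tuple(m.lower() for m in markers)

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(marker in ua for marker in self.markers):
            # Downstream code (and any agent running after this) sees the sentinel.
            environ["HTTP_USER_AGENT"] = SENTINEL_UA
        return self.app(environ, start_response)


def echo_app(environ, start_response):
    """Tiny WSGI app that echoes the User-Agent it received."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ.get("HTTP_USER_AGENT", "").encode()]


app = BotUARewriter(echo_app)
print(app({"HTTP_USER_AGENT": "Googlebot/2.1"}, lambda status, headers: None))
```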
Hi Reinhard,
Some bots, such as Google's, do eat up license volume because they execute JavaScript!
Klaus
Another use case is testing. A customer has a test environment with a small UEM volume so they can test their app with UEM (ensuring usernames are captured, and so on). Selenium tests running in that environment should not use up the UEM volume.
Right now the only workaround is to filter out the IPs that Selenium is using and hope they don't suddenly change.
Best, Roman
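The difference between the current IP workaround and a user-agent rule can be shown with a minimal Python sketch using the standard ipaddress module. The network ranges below are documentation (TEST-NET) blocks standing in for the real Selenium runners, and the marker "UEM-Selenium-Test" is a hypothetical string the test suite is assumed to add to its User-Agent; neither comes from any product configuration.

```python
import ipaddress
from typing import Iterable

# Hypothetical runner ranges (documentation TEST-NET blocks, not real hosts).
TEST_RUNNER_NETS = [ipaddress.ip_network("192.0.2.0/24"),
                    ipaddress.ip_network("198.51.100.0/24")]

# Hypothetical marker the test suite is assumed to append to its User-Agent.
TEST_UA_MARKER = "UEM-Selenium-Test"


def is_test_traffic(client_ip: str, user_agent: str,
                    nets: Iterable[ipaddress.IPv4Network] = TEST_RUNNER_NETS) -> bool:
    """True if the hit comes from a known runner range or carries the UA marker."""
    addr = ipaddress.ip_address(client_ip)
    in_known_range = any(addr in net for net in nets)
    return in_known_range or TEST_UA_MARKER in user_agent


print(is_test_traffic("192.0.2.17", "Mozilla/5.0 Chrome/120.0"))                      # True (IP match)
print(is_test_traffic("203.0.113.5", "Mozilla/5.0 Chrome/120.0 UEM-Selenium-Test"))   # True (UA match)
```

If the runners move to new addresses, the range check silently stops matching, while the user-agent check keeps working, which is why a UA-based exclusion would be more robust for this case.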
JANUARY 15, 3:00 PM GMT / 10:00 AM ET