Alert Suppression

Answered (Verified) This post has 1 verified answer | 46 Replies | 18 Followers


21 Posts
Points 48
clint posted on 06-21-2007 5:18 PM

Hello all,

I am still confused by Alert Suppression. Let me start with a simple scenario. I have the following hypothetical setup:

Orion > RouterA > RouterB > SwitchA > 25 servers connected to the switch

I want to monitor the devices RouterA, RouterB, SwitchA, and each of the servers connected to the switch. That means I will add each of these 28 devices to Orion System Manager. Let's say I have just installed Orion and have not modified the canned alerts, but I do have "Page me when a node goes down" and "Page me when an interface goes down" activated and working correctly. At this point, if a server goes down, I get a page for the server (node) going down, and the switch interface that the server was attached to goes down, so I am paged again. So I know both alerts are working correctly.

Now let's say RouterA fails. How do I set up suppression so I don't get a page for RouterB going down, SwitchA going down, and all 25 servers going down? Do I have to create separate alerts for all 27 devices and suppressions for all 27 devices?

To make it even simpler, let's say SwitchA fails. How do I set up suppression so that I don't get an alert from all 25 server nodes that I am monitoring? Do I have to set up an alert for each of the 25 server nodes and add a suppression that says if SwitchA goes down, don't page? I hope not! That would be very painful.

I can't seem to find any documentation or tutorials that show the logic behind suppressions. SolarWinds does tell me how to create a suppression but does a very poor job of explaining how this is put into action in the real world. I have not found any posts on this forum that answer this very well either.

SolarWinds, are you listening?

Answered (Verified) Verified Answer


415 Posts
Points 2,023
Answered (Verified) vhcato replied on 06-21-2007 7:45 PM
Clint,

The main thing on which to focus is where the dependencies lie. In your scenario:

All 25 servers depend on SwitchA, RouterB, and RouterA.
SwitchA depends on RouterB and RouterA.
RouterB depends on RouterA.

What you will need to do is build a separate alert for each level of dependency. In your example, this would mean creating 3 alerts, each configured to suppress triggering, depending on the state of the devices on which they are dependent. For example, the suppression for a single alert that covers all 25 servers would look similar to the following. (This is just a quickie explanation, but if you need the detail of how it really looks in the alert config window, I can put that together tomorrow… Maybe…)  :)

Suppress when:

(NodeName equal to SwitchA OR
NodeName equal to RouterB OR
NodeName equal to RouterA)

AND

Node Status not equal to Up

I use "Node Status not equal to Up" instead of "Node Status equal to Down" because, depending on where you are in the polling cycle, the routers or switch may still be in a "Warning" state when the servers are finally marked "Down".

If your primary concern is to suppress the individual server alerts when one of the three network infrastructure devices goes down, then this one alert will suffice. Naturally, if you want to suppress further up the line, then simply follow the same thought process.
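To make the logic of that suppression concrete, here is a minimal Python sketch. It is only a model of the rule as described, not Orion's alert engine or any SolarWinds API; the `Status` enum, the `UPSTREAM` tuple, and both function names are made up for illustration. It also shows why "not equal to Up" behaves differently from "equal to Down" while an upstream device is still in "Warning".

```python
from enum import Enum
from typing import Mapping


class Status(Enum):
    UP = "Up"
    WARNING = "Warning"
    DOWN = "Down"


# Devices the 25 servers depend on, taken from the example topology.
UPSTREAM = ("SwitchA", "RouterB", "RouterA")


def suppress_not_up(status: Mapping[str, Status]) -> bool:
    """Suppress when any upstream device is in a state other than Up."""
    return any(status[name] is not Status.UP for name in UPSTREAM)


def suppress_down_only(status: Mapping[str, Status]) -> bool:
    """Stricter variant: suppress only when an upstream device is already Down."""
    return any(status[name] is Status.DOWN for name in UPSTREAM)


# Mid-poll snapshot: RouterB has failed but is still shown as Warning.
snapshot = {
    "SwitchA": Status.UP,
    "RouterB": Status.WARNING,
    "RouterA": Status.UP,
}

print(suppress_not_up(snapshot))     # True: server pages are held back
print(suppress_down_only(snapshot))  # False: server pages would still go out
```

With the stricter check, the server alerts could fire during the window before RouterB is marked "Down", which is the case the "not equal to Up" condition is meant to cover.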

I hope this gives you something to go on. If not, too bad... :) No, really, I'll try to come up with something more specific tomorrow.

Vic

Vic Cato
Technical Lead - Network Operations
Koch Business Solutions - Enterprise Technical Services

  • | Post Points: 5

All Replies


415 Posts
Points 2,023
vhcato replied on 06-22-2007 7:03 PM
Clint, I noticed your question was marked as "answered", but if you still want me to post more detail, just let me know. :)

Vic Cato
Technical Lead - Network Operations
Koch Business Solutions - Enterprise Technical Services

  • | Post Points: 3

21 Posts
Points 48
clint replied on 06-25-2007 1:52 PM

Thank you.

I have played with the alert suppressions for a "node going down" alert and think I have this manual process down, although the product seems kind of brain dead in this area. There must be a better method to set up suppressions. You must agree that creating a new alert for each network device (switch, router) and then adding the dependent upstream network devices to the suppressions tab is cumbersome. Maybe this can be improved by adding features found in other SW products to the Orion discovery. The SW Toolset has port mapper and trace route, so why can't "DEPENDENCIES" be added to the discovery process using these tools? (Forward that one to the SW developers.)

Also, is it necessary to suppress alerts when an INTERFACE goes down? I am asking because the status of the interface is determined by an SNMP get. If the poller cannot get a response to an SNMP get, the poller marks the interface as unknown and does not alert. If an upstream neighbor is completely down (power failure, catastrophic failure), then the SNMP get would never reach the downstream neighbor, and Orion would mark the interface as UNKNOWN and never fire an alert. So, using our example, we are monitoring a switch port, let's say port 10, on SwitchA, and Orion reports the port UP. RouterB then dies and we lose connectivity to SwitchA. The poller cannot receive the reply to the SNMP get checking the status of SwitchA Port10 because RouterB is dead. Is the "interface down" alert ever fired for SwitchA, since the SNMP reply was never received by the poller?

  • | Post Points: 3

415 Posts
Points 2,023
vhcato replied on 06-26-2007 7:26 AM

I agree that manually configuring alert suppression, especially in larger environments, is cumbersome at best. On the other hand, I also have a basic understanding of the potential complexities involved with building an intelligent dependency discovery and tracking mechanism. Maybe someday this will be included in Orion, but I wouldn't look for it any time soon. Remember, it's only relatively recently that we've begun to see some of what I would call more basic features included in the product, such as syslog / trap handling and custom MIB support, and I feel like there is still a lot of work to be done in those areas to make those components more flexible and user friendly.

In my opinion, SW has really stepped up recently and begun to add some much needed functionality, but it has also come at a price. As long as new features can be added as modules, the base product remains more accessible to a larger customer base, but the more features that are integrated into the base product, the more expensive it becomes. I'm sure this is something that's very much on the minds of the folks at SW. A truly functional event correlator, along with business impact determination and rule enforcement, is like the Holy Grail in the monitoring world, and would very likely come at a price. Something more basic might be more easily derived, but we will have to wait and see.

As for the question of suppressing alerts for interface events in the given scenario, if your interface alert is configured to trigger when an interface goes "Down", then it should not trigger if the state of the interface is "Unknown". If you're alerting on all interface "Down" events for all interfaces along this entire path (without interface alert suppression) and suppressing alerts for devices behind RouterB, then you should only see two alerts: one stating that RouterB is "Down", and one stating that the interface on RouterA that goes to RouterB is also "Down". To take it a step further, you could get even more granular by suppressing node "Down" alerts via suppression configuration on an interface, but that adds an even greater level of complexity to the mix.
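The two-alert outcome can be traced with a small Python model. This is a sketch of the reasoning only, assuming node alerts are suppressed whenever an upstream device is not "Up" and interface alerts trigger strictly on "Down"; the function name, the dictionaries, the `Server01` node, and the interface label `to-RouterB` are hypothetical and do not come from Orion.

```python
from typing import Dict, List, Tuple


def fired_alerts(
    node_status: Dict[str, str],
    interface_status: Dict[Tuple[str, str], str],
    upstream: Dict[str, List[str]],
) -> List[str]:
    """Return the alerts that would page, given polled statuses."""
    alerts = []
    for node, status in node_status.items():
        if status != "Down":
            continue
        # A node alert is suppressed when any device it depends on is not Up.
        if any(node_status[parent] != "Up" for parent in upstream.get(node, [])):
            continue
        alerts.append(f"node {node} is Down")
    for (node, port), status in interface_status.items():
        # The interface alert triggers on Down only; Unknown never matches.
        if status == "Down":
            alerts.append(f"interface {node}/{port} is Down")
    return alerts


# RouterB has died: everything behind it is unreachable.
node_status = {"RouterA": "Up", "RouterB": "Down", "SwitchA": "Down", "Server01": "Down"}
interface_status = {
    ("RouterA", "to-RouterB"): "Down",  # polled directly from RouterA
    ("SwitchA", "Port10"): "Unknown",   # SNMP get never reaches SwitchA
}
upstream = {
    "RouterB": ["RouterA"],
    "SwitchA": ["RouterB", "RouterA"],
    "Server01": ["SwitchA", "RouterB", "RouterA"],
}

print(fired_alerts(node_status, interface_status, upstream))
# ['node RouterB is Down', 'interface RouterA/to-RouterB is Down']
```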

I think the trick to getting the most useful alerts from any alerting engine where suppression is possible is to rationalize what makes the most sense for your given environment. To reduce the administration burden, target suppression only in the areas that make the most sense. Aggregation points such as core, distribution, and server farm switches and routers are likely targets. It all depends on the architecture of your particular environment and how much time and effort you're willing to spend to de-duplicate your alerts.

Vic Cato
Technical Lead - Network Operations
Koch Business Solutions - Enterprise Technical Services

  • | Post Points: 1

99 Posts
Points 580
WINNT replied on 07-16-2007 10:35 PM

Three years ago SolarWinds support gave me a special suppression dll that allows for suppressing a node based on the status of a node specified in a custom property.  It was extremely helpful in our environment since we were monitoring 1000 routers and the devices behind them.  I was going to explain how to configure the suppression, but noticed that it has been wiped out in 8.1.  I opened a ticket with support and hopefully it will be simple to re-register the dll. 


415 Posts
Points 2,023
vhcato replied on 07-17-2007 6:42 AM

WINNT:

Three years ago SolarWinds support gave me a special suppression dll that allows for suppressing a node based on the status of a node specified in a custom property.

OK... If you're saying what I think you're saying, I've been trying to figure out how to do this very thing. Are you saying that this mechanism allows you to create one alert (covering all nodes) that triggers (or suppresses) based on the status of the node listed in custom property x? This would mean you only have to create one alert for each level of node dependency (not each location), and simply populate the custom property field with the appropriate parent node.

This is something that really needs to be included in the base product, and should be quite simple to implement. Ideally, it would also be able to suppress based on the status of more than one device (router x AND router y), thus encompassing situations where redundancy is present.
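As a rough illustration of the idea (one rule for all nodes, driven by a per-node custom property), here is a Python sketch. It is not the DLL's behavior or an Orion feature; the `parents` mapping stands in for a hypothetical custom property listing each node's parent devices, and the node names are invented. With redundant parents, the node is suppressed only when every listed parent is not "Up".

```python
from typing import Dict, List


def should_suppress(node: str, parents: Dict[str, List[str]], status: Dict[str, str]) -> bool:
    """Suppress a node's alert only when all of its listed parents are not Up."""
    listed = parents.get(node, [])
    # A node with no parents listed is never suppressed.
    return bool(listed) and all(status[parent] != "Up" for parent in listed)


# Hypothetical custom property values: one entry per monitored node.
parents = {
    "StoreServer": ["RouterX", "RouterY"],  # redundant uplinks
    "BranchServer": ["RouterZ"],            # single uplink
}

status = {"RouterX": "Down", "RouterY": "Up", "RouterZ": "Warning"}

print(should_suppress("StoreServer", parents, status))   # False: RouterY still carries traffic
print(should_suppress("BranchServer", parents, status))  # True: its only parent is not Up
```

Adding a device then means filling in its property value rather than building another alert.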

Any thoughts from the folks at SW?

Vic Cato
Technical Lead - Network Operations
Koch Business Solutions - Enterprise Technical Services

  • | Post Points: 5

35 Posts
Points 90
JMP replied on 07-17-2007 8:02 AM

Yes, please post the results of that ticket.  It has been cumbersome at best for us to configure alert suppressions.  This would be a huge help.

  • | Post Points: 1

99 Posts
Points 580
WINNT replied on 07-17-2007 9:48 AM

Yes, you could add multiple custom properties and then create a suppression for each one.  Here is an old screen shot:

So since the alert suppression is first in the list, none of the alerts following will be executed.  If my e-mail alert was first, then it would be sent out.

When the router listed in the custom property "StoreRouter" is not up, then alerts will be suppressed.  Hopefully SolarWinds will be able to add this feature to v8.
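The ordering effect can be pictured with a toy Python model. This is only a guess at the general shape of an ordered action list, not how the suppression DLL or Orion actually works; `run_actions`, the action functions, and the `parent_status` key are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Action = Callable[[Dict[str, str]], bool]  # returns False to stop the list


def suppress_if_parent_not_up(context: Dict[str, str]) -> bool:
    """Stop processing when the node named in the custom property is not Up."""
    return context["parent_status"] == "Up"


def send_email(context: Dict[str, str]) -> bool:
    """Stand-in for an e-mail action; always lets processing continue."""
    return True


def run_actions(actions: List[Tuple[str, Action]], context: Dict[str, str]) -> List[str]:
    """Run actions in order and return the names of those that executed."""
    executed = []
    for name, action in actions:
        executed.append(name)
        if not action(context):
            break  # everything after this point is skipped
    return executed


context = {"parent_status": "Down"}

# Suppression first: the e-mail never runs.
print(run_actions([("suppress", suppress_if_parent_not_up), ("email", send_email)], context))
# ['suppress']

# E-mail first: it goes out before suppression is evaluated.
print(run_actions([("email", send_email), ("suppress", suppress_if_parent_not_up)], context))
# ['email', 'suppress']
```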


2 Posts
Points 6
Astaroba replied on 09-22-2007 12:06 AM

Hi. I need your help. In Orion 8.1, how do I build this as an "Advanced Alarm"?

Suppress when:

NodeName equal to SwitchA OR
NodeName equal to RouterB OR
NodeName equal to RouterA
AND
Node Status not equal to Up 

Thanks!!
  • | Post Points: 3

26 Posts
Points 64
arielik replied on 09-26-2007 9:00 AM

Hi, I don't have the option to suppress an alert with an action.

  • | Post Points: 1

280 Posts
Points 728
SamuelB replied on 09-26-2007 9:58 AM

 The ability to suppress alerts based upon the property of another node(s) would be very helpful to me. Why was this thought of and developed in the past but now is forgotten?

  • | Post Points: 3

2,686 Posts
Points 7,704
Moderator
SolarWinds Employee
denny.lecompte replied on 09-26-2007 10:26 AM

It's on our roadmap. 

Denny LeCompte
Sr. Product Manager, Orion
SolarWinds
Austin, TX

  • | Post Points: 5

280 Posts
Points 728
SamuelB replied on 09-26-2007 11:16 AM

Thanks Denny, great to hear!