Alerting Concerns in Orion NPM 9.5

Page 1 of 3 (33 items) 1 2 3 Next > | RSS

rated by 0 users
Answered (Verified) This post has 1 verified answer | 32 Replies | 11 Followers | 3,648 Views


12 Posts
Points 50
dudds replied on Sun, Jun 14 2009 7:32 PM
rated by 0 users

Hi All,

We purchased Orion NPM about a month ago and have been very happy with it to date. I have set up node UP/DOWN alerts for all our monitored nodes, so our NOC receives an email alert whenever a node goes up or down. This has been in place for the last week, running in parallel with Statseeker. We bought Orion NPM to replace Statseeker.

However, overnight I received a number of emails from Statseeker saying that certain nodes had gone down and some had come back up, but I received no emails from Orion NPM. This is a concern. Had it not been for Statseeker, our comms team would have had no idea there were issues in some of our overseas offices.

I logged into our web console and, sure enough, NPM was showing 4 nodes down, yet I had received no alert notification from NPM. I thought perhaps it was an issue with the mail server (NPM and Statseeker use the same one). I then tried a test fire of an alert, but did not receive an email, and I also noticed that there was no output in the "Test Fire Alert" screen. I then opened the Orion Service Manager and checked the status of all processes. Everything was running. I then restarted the SolarWinds Alerting Engine, and as soon as it restarted I received a flood of emails containing the up/down alerts that I had missed. The test fire window also showed its normal output again.

For those of you who have been using NPM for a while, have you experienced these problems before? Does it happen frequently? I am worried that there is some sort of bug in the Alerting service.

Thanks

Answered (Verified) Verified Answer


3,415 Posts
Points 15,700
Moderator
SolarWinds Certified Professional
SolarWinds Employee
Answered (Verified) bshopp replied on Fri, Jun 19 2009 8:39 AM
rated by 0 users

Please apply SP2 to fix this issue, see here

Brandon Shopp - Product Manager Orion NPM, Network Atlas and LANSurveyor
SolarWinds SCP Certified 

  • | Post Points: 21

All Replies


245 Posts
Points 873
SolarWinds Certified Professional
the_toilet replied on Mon, Jun 15 2009 1:15 AM
rated by 0 users

I have seen this problem on at least 2 of my sites. I have around 10-12 Orion installs, and 2 of them have shown this problem over the last 12 months. Because it was so random, I did not log it with SolarWinds.

I have considered restarting the alerting service each night, but I am not sure that is satisfactory. I also do not know how the engine works, so a restart might generate an alert for every DOWN node, which would result in duplicate alerts over time.

anyone else?

  • | Post Points: 1
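The nightly restart considered above could be scripted roughly as follows. This is only a sketch: it assumes the name shown in Orion Service Manager, "SolarWinds Alerting Engine", is also the Windows service display name (not confirmed anywhere in this thread), and the function names are made up for illustration. It does nothing about the duplicate-alert concern raised in the post.

```python
import subprocess

# Assumption: the name shown in Orion Service Manager is also the Windows
# service display name. Check the Services console before relying on it.
SERVICE_NAME = "SolarWinds Alerting Engine"


def restart_commands(service_name):
    """Return the stop and start commands in the order they should run."""
    return [["net", "stop", service_name], ["net", "start", service_name]]


def restart_service(service_name=SERVICE_NAME, runner=subprocess.run):
    """Stop and then start the service; raise if either command fails."""
    for command in restart_commands(service_name):
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(command, check=True)
```

Calling `restart_service()` from a Windows scheduled task running with administrative rights would perform the restart. The `runner` parameter exists only so the command sequence can be checked without touching a real service.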

7 Posts
Points 21
cvinka replied on Mon, Jun 15 2009 4:03 PM
rated by 0 users

I currently have a case open for this: 100027.  I've supplied Diag files and event viewer info.  Seems to be a random halt in the Alerting engine or some failure accessing the Database stopping the alerts.  I haven't heard from support since 7am so I hope they are busy working on a resolution.

  • | Post Points: 3

320 Posts
Points 1,076
SolarWinds Employee
Elisabeth Zakes replied on Mon, Jun 15 2009 4:49 PM
rated by 0 users

Yes, they're working on it. Your technician is in the Cork (Ireland) office, so you'll likely hear from him in his morning.

Elisabeth Zakes

  • | Post Points: 1

23 Posts
Points 81
breener96 replied on Tue, Jun 16 2009 9:56 AM
rated by 0 users

I also had issues with alerting after upgrading to 9.5.  I had to reinstall the alerting service. 

  • | Post Points: 3

354 Posts
Points 1,322
SolarWinds Certified Professional
borgan replied on Tue, Jun 16 2009 10:33 AM
rated by 0 users

I had also been having difficulty getting even basic alerts to work correctly. I found that test firing did not seem to function.

I looked at the alert log and discovered an intermittent database connection error. I performed a repair of the SW alerting service and that appeared to solve the problem. However, it seems to be back again after only a day.

Here is the current text from the log:

2009-06-16 10:01:02,750 [MainTaskThread] INFO  All - Alert Engine Starting. Running Version 9.5.0.0.
2009-06-16 10:18:40,140 [MainTaskThread] ERROR All - Error in SetupDBConnection System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
   at System.Data.SqlClient.SqlDataReader.get_MetaData()
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
   at AlertingEngine.SWAlertingEngine.SetupDBConnection(SqlConnection& DBConnection, SqlCommand& DBCommand, SqlDataReader& DBReader)
2009-06-16 10:18:40,203 [MainTaskThread] ERROR All - Exception in MainTask Loop - Unable to create DB connection
2009-06-16 10:18:48,171 [MainTaskThread] INFO  All - Alert Engine Stopping
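
The log excerpt above ends with "Alert Engine Stopping" right after the database timeout, so one way to notice the halt is to scan the log for the last start/stop message. The sketch below assumes every entry follows the same layout as the lines quoted here; the regular expression and function names are illustrative and not part of Orion.

```python
import re

# Matches entries laid out like the ones quoted in this post, for example:
# 2009-06-16 10:18:48,171 [MainTaskThread] INFO  All - Alert Engine Stopping
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\] (?P<level>\w+)\s+\S+ - (?P<message>.*)$"
)


def engine_state(lines):
    """Return (state, timestamp) based on the last start/stop message.

    state is "running", "stopped", or "unknown" if neither message appears.
    """
    state, when = "unknown", None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # stack-trace continuation lines carry no timestamp
        message = match.group("message")
        if message.startswith("Alert Engine Starting"):
            state, when = "running", match.group("timestamp")
        elif message.startswith("Alert Engine Stopping"):
            state, when = "stopped", match.group("timestamp")
    return state, when


def error_entries(lines):
    """Return (timestamp, message) for every ERROR-level entry."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            entries.append((match.group("timestamp"), match.group("message")))
    return entries
```

Run against the excerpt above, `engine_state` would report the engine as stopped at the time of the final entry, even though the Windows service itself may still show as running.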

SolarWinds Certified Professional

Technology Consultant/Trainer

Corona Technical Services

918-398-8052

  • | Post Points: 1

354 Posts
Points 1,322
SolarWinds Certified Professional
borgan replied on Tue, Jun 16 2009 10:42 AM
rated by 0 users

Yeah, something is messed up indeed. I show the alerting service is running, but a simple node down alert will not fire until I stop and start the service. Then, the alert immediately shows up in System Manager.

SolarWinds Certified Professional

Technology Consultant/Trainer

Corona Technical Services

918-398-8052

  • | Post Points: 1

7 Posts
Points 21
cvinka replied on Tue, Jun 16 2009 10:48 AM
rated by 0 users

Since my tech seems to be working only in his own timezone and answering requests while I am out of the office, this morning I did a full repair from Add/Remove Programs and re-installed SP1. This seems to have helped with the communication errors in the logs, and with the alerting.

Maybe the SW mod can have my ticket re-assigned to someone in the states so that I can work with someone with the same work hours?

  • | Post Points: 3

337 Posts
Points 3,942
Thwack MVP
r0berth1 replied on Tue, Jun 16 2009 10:53 AM
rated by 0 users

I had the same problem until I installed SP1 for 9.5.

Hancock Bank

Network Engineer

  • | Post Points: 3

3,415 Posts
Points 15,700
Moderator
SolarWinds Certified Professional
SolarWinds Employee
bshopp replied on Tue, Jun 16 2009 11:04 AM
rated by 0 users

Please apply SP1, and if it is still an issue, please open a support case.

Brandon Shopp - Product Manager Orion NPM, Network Atlas and LANSurveyor
SolarWinds SCP Certified 

  • | Post Points: 3

320 Posts
Points 1,076
SolarWinds Employee
Elisabeth Zakes replied on Tue, Jun 16 2009 11:26 AM
rated by 0 users

cvinka:
Maybe the SW mod can have my ticket re-assigned to someone in the states so that I can work with someone with the same work hours?

cvinka, I'll send you an e-mail about this.

 

Elisabeth Zakes

  • | Post Points: 1

275 Posts
Points 791
freemen replied on Tue, Jun 16 2009 11:29 AM
rated by 0 users

Brandon,

Re-installing and applying SP1 seems to have fixed it again, but the question is: will it remain fixed? :)

I tested again using my advanced node down alert. The alert eval frequency is set to 1 minute. I took down a node manually and got two emails 2 minutes apart. Will I continue to get emails 1 or 2 minutes apart as long as the node stays down? Is that the way the alert eval works?

  • | Post Points: 1

14 Posts
Points 187
mt1299 replied on Tue, Jun 16 2009 11:32 AM
rated by 0 users

I worked with technical support yesterday, which included a "repair" of 9.5 and adding SP1. We experienced the same issue beginning again at 3:45 this morning. Kind of frustrating.

  • | Post Points: 1

14 Posts
Points 187
mt1299 replied on Tue, Jun 16 2009 7:37 PM
rated by 0 users

Technical support indicated a code fix is in the works.

  • | Post Points: 1

7 Posts
Points 21
cvinka replied on Wed, Jun 17 2009 10:57 AM
rated by 0 users

Does anyone know if the issue has been identified, and is there an ETA on the fix? Right now the software can't be trusted to monitor, and that's a very bad thing. I've been asked to back out this version if we don't see a fix in the near future, and I know that's not going to make me any happier.

  • | Post Points: 9

© 2003 - 2010 SolarWinds, Inc. All Rights Reserved.
