Alerting Concerns in Orion NPM 9.5

Page 2 of 3 (33 items)

Answered (Verified) This post has 1 verified answer | 32 Replies | 11 Followers | 2,540 Views


12 Posts
Points 48
dudds posted on Sun, Jun 14 2009 7:32 PM

Hi All,

We purchased Orion NPM about a month ago and have been very happy with it so far. I have set up node UP/DOWN alerts for all our monitored nodes, so that our NOC receives an email alert whenever a node goes up or down. This has been in place for the last week, running in parallel with Statseeker, which we bought Orion NPM to replace.

However, overnight I received a number of emails from Statseeker saying that certain nodes had gone down and some had come back up, but no emails from Orion NPM. This is a concern: had it not been for Statseeker, our comms team would have had no idea there were issues in some of our overseas offices.

I logged into our web console and, sure enough, NPM was showing 4 nodes down, yet I had received no alert notification from NPM. I thought perhaps it was an issue with the mail server (NPM and Statseeker use the same one). I then tried a test fire of an alert, but did not receive an email, and I also noticed that I got no output in the "Test Fire Alert" screen. I then opened the Orion Service Manager and checked the status of all processes. Everything was running. I then restarted the SolarWinds Alerting Engine, and as soon as it restarted I received a flood of emails containing the up/down alerts that I had missed. The test fire window also showed its normal output again.

For those of you who have been using NPM for a while: have you experienced these problems before? Does it happen frequently? I am worried that there is some sort of bug in the Alerting service.

Thanks

Answered (Verified) Verified Answer


2,601 Posts
Points 11,172
Moderator
SolarWinds Employee
Answered (Verified) bshopp replied on Fri, Jun 19 2009 8:39 AM

Please apply SP2 to fix this issue, see here

Brandon Shopp - Product Manager Orion NPM, Network Atlas and LANSurveyor

  • | Post Points: 21

All Replies


253 Posts
Points 626
SolarWinds Employee
dperdue replied on Wed, Jun 17 2009 11:22 AM

We have identified the problem and are working on a fix. 

The problem happens when a connection to the database is lost temporarily.   The work-around for now is to restart the alert service after database connectivity is restored.
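As a rough illustration of that work-around, the sketch below triggers a restart action once each time database connectivity goes from lost to restored. It is only a sketch under assumptions: `RecoveryWatcher`, `observe`, and the `on_recovery` callback are hypothetical names, and how connectivity is sampled and how the alert service is actually restarted are left to the caller.

```python
from typing import Callable


class RecoveryWatcher:
    """Calls `on_recovery` once each time the database goes from unreachable to reachable."""

    def __init__(self, on_recovery: Callable[[], None]) -> None:
        # Hypothetical callback: in practice it would restart the alert service.
        self._on_recovery = on_recovery
        self._was_up = True  # assumption: the database is reachable at start-up

    def observe(self, db_up: bool) -> bool:
        """Record one connectivity sample; return True if a restart was triggered."""
        recovered = db_up and not self._was_up
        self._was_up = db_up
        if recovered:
            self._on_recovery()
        return recovered


if __name__ == "__main__":
    restarts = []
    watcher = RecoveryWatcher(on_recovery=lambda: restarts.append("restart"))
    # One outage (two failed samples) followed by recovery.
    for sample in [True, False, False, True, True]:
        watcher.observe(sample)
    print(f"restarts triggered: {len(restarts)}")
```

Restarting only on the down-to-up transition avoids restarting the service repeatedly while the database is still unreachable.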

David Perdue
Director of Product Development
SolarWinds
Austin, TX

  • | Post Points: 5

321 Posts
Points 3,574
Thwack MVP
Answered (Not Verified) r0berth1 replied on Thu, Jun 18 2009 8:43 AM
Suggested by Elisabeth Zakes

I had a few alerting problems with 9.5, but they went away once I installed SP1. The difference may be that Statseeker alerts instantly when the circuit goes down, whereas NPM by default waits 1 minute before it alerts. I have mine set to alert after the circuit has been in a down state for 3 minutes, and it has been accurate so far. I have been calling my locations as soon as I get the alerts to verify the times, and they were correct. I also have the locations calling me as soon as they go down, so I know I am getting all of the alerts. I have one more week of operating like this, just to be sure Orion is working properly since the upgrade, before we go back to the locations not calling anymore. So far, so good.
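To make the effect of an alert delay concrete, here is a minimal sketch of the general idea: an alert fires only after a node has been continuously down for a set time, so short flaps never alert. This is not NPM's actual implementation; `DownAlertDelay` and its parameters are hypothetical, and the 60-second poll interval in the demo is an assumption.

```python
from typing import Optional


class DownAlertDelay:
    """Fires one alert after a node has been continuously down for `delay_seconds`."""

    def __init__(self, delay_seconds: float) -> None:
        self.delay_seconds = delay_seconds
        self._down_since: Optional[float] = None
        self._alerted = False

    def update(self, is_up: bool, now: float) -> bool:
        """Feed one poll result; return True when a down alert should fire."""
        if is_up:
            # Recovery clears the timer, so the next outage starts from zero.
            self._down_since = None
            self._alerted = False
            return False
        if self._down_since is None:
            self._down_since = now
        if not self._alerted and now - self._down_since >= self.delay_seconds:
            self._alerted = True
            return True
        return False


if __name__ == "__main__":
    delay = DownAlertDelay(delay_seconds=180)  # the 3-minute setting described above
    polls = [(0, True), (60, False), (120, False), (180, True),
             (240, False), (300, False), (360, False), (420, False)]
    for t, up in polls:
        if delay.update(up, t):
            print(f"alert at t={t}s")
```

In the demo, the two-minute flap between t=60 and t=180 never alerts, while the outage that starts at t=240 alerts once it has lasted three minutes. An instant-alerting tool would have reported both, which is one way two monitors can disagree without either being wrong.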

Hancock Bank

Network Engineer

  • Post Points: 1

263 Posts
Points 747
freemen replied on Thu, Jun 18 2009 11:03 AM

dperdue,

This is perhaps a little off topic, but can we not test the SQL database connection using APM? If so, which monitor or application specifically would be used? Thanks.

  • | Post Points: 3

225 Posts
Points 1,121
SolarWinds Employee
Sham Chauthani replied on Thu, Jun 18 2009 11:15 AM

The SQL User Experience Monitor is the best suited for this task.

See http://www.solarwinds.com/NetPerfMon/SolarWinds/wwhelp/wwhimpl/js/html/wwhelp.htm#href=OrionAPMPHComponentTypesSqlQA.htm

You will want to use the monitor to connect to Orion DB and run a query (perhaps on the nodes table) to ensure connectivity to the DB.
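Outside of APM, the same kind of check can be sketched in a few lines: connect, run a small query, and treat a failure or a slow response as "database not reachable". This is an illustration, not the SQL User Experience Monitor itself. `probe_database` is a hypothetical helper, and the demo uses an in-memory SQLite database with a stand-in `Nodes` table so that it runs anywhere; a real check would pass a `connect` callable for the actual Orion database.

```python
import sqlite3
import time
from typing import Any, Callable, Tuple


def probe_database(connect: Callable[[], Any], query: str,
                   max_seconds: float) -> Tuple[bool, str]:
    """Run `query` on a fresh DB-API connection; report (ok, detail)."""
    start = time.monotonic()
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute(query)
            cur.fetchone()
        finally:
            conn.close()
    except Exception as exc:  # any failure counts as "database not reachable"
        return False, f"query failed: {exc}"
    elapsed = time.monotonic() - start
    if elapsed > max_seconds:
        return False, f"query took {elapsed:.2f}s"
    return True, "ok"


def demo_connect() -> sqlite3.Connection:
    # Stand-in for the real database: an in-memory table named Nodes (assumed schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Nodes (NodeID INTEGER PRIMARY KEY)")
    return conn


if __name__ == "__main__":
    print(probe_database(demo_connect, "SELECT COUNT(*) FROM Nodes", max_seconds=5.0))
```

Opening a fresh connection for every probe is deliberate: a long-lived connection can hide exactly the kind of temporary connectivity loss discussed in this thread.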

Sham Chauthani
Development Manager
SolarWinds

  • | Post Points: 1

336 Posts
Points 1,190
SolarWinds Certified Professional
borgan replied on Fri, Jun 19 2009 9:12 AM

Initial testing of the hotfix looks good. I stopped and then restarted the SQL instance on my server and then tested a simple node down alert. All functionality seems fine. Will test a few more times today.

SolarWinds Certified Professional

Technology Consultant/Trainer

Corona Technical Services

918-398-8052

  • | Post Points: 1

23 Posts
Points 81
breener96 replied on Fri, Jun 19 2009 9:54 AM

I agree that I cannot trust SolarWinds 9.5 right now... every day my alerting just stops working and I have to restart the alerting service. This is a HUGE problem and is unacceptable for an enterprise application. I have also had other issues, which were fixed, but not without a lot of time dedicated to fixing them.

I also do not want to revert to 9.1 SP5, but I might have to soon if this is not fixed.

I am terribly disappointed with this release, as I have had multiple issues since the upgrade. I care more about the reliability of the application than about the addition of new features. I know that I will never upgrade NPM during its early stages again.

Some advice to SolarWinds: do not rush your releases, and be sure that they are reliable. Perform better testing... reliability is sooooo much more important than new features! If I weren't dependent on this application right now, I would be reviewing alternative products.

  • | Post Points: 3

253 Posts
Points 626
SolarWinds Employee
dperdue replied on Fri, Jun 19 2009 10:02 AM

breener96,

Have you applied the hotfix referenced above?  This should solve the alerting issue.

David Perdue
Director of Product Development
SolarWinds
Austin, TX

  • | Post Points: 1

5 Posts
Points 19
cvinka replied on Fri, Jun 19 2009 11:01 AM

I loaded the hot fix as part of my case ticket update yesterday afternoon.  I think I prefer doing the scheduled stop/start of the service.  The fix seems to be keeping the service from stopping now.

One scare I had last night: I logged in to the web console to verify everything was OK and saw in the event summary that the total for devices removed equaled the number of devices we have in the system! I'm not sure why or how this could be reported; I'm just happy to say everything was still in the system and being polled.

I have noticed that, with the fix, the Business layer seems to be debugging more, and I still see Syslog and Trap DB errors in the event log. I really hope SP2 comes out Monday and resolves all these issues. Management is really unhappy, and I'm spending way too much time on this.

  • | Post Points: 3

320 Posts
Points 1,140
SolarWinds Employee
Elisabeth Zakes replied on Fri, Jun 19 2009 2:27 PM

cvinka, I'm glad the hotfix is working for you on the service staying up!

I believe your support technician is working with you on the syslog/trap issue since it's not related to the Alert hotfix. And SP2 is still due to be out very shortly! Watch for that announcement on the forum soon.

Thanks to everyone who's been posting about the alerts and giving input so we could get this resolved quickly for you!

Elisabeth Zakes

  • | Post Points: 1

38 Posts
Points 923
thurgoodj187 replied on Fri, Jun 19 2009 4:48 PM

I had this problem in 8.5, but do not in 9.1. It throws a bunch of exception errors in the solarwinds.net evt log file. Restarting the alerting service corrected the problem for a couple of weeks.

  • | Post Points: 1

321 Posts
Points 3,574
Thwack MVP
r0berth1 replied on Tue, Jun 23 2009 1:28 PM

I know that I said above that I quit having problems once I installed SP1, but I have a correction or two to add to that statement. The alerting problem I was having before SP1 was that the alerts were not running. Now they are working fine, but I had my first major outage last night and found that about 50% of the alerts were accurate; the rest either sent out an alert when the circuit never went down, or the circuit went down but I didn't get an alert. So now I am unsure whether I can trust Orion as much as I did pre-9.5. Unfortunately for me, I don't have any other software to rely on for the alerts. So SW needs to step up their testing efforts with SP2 or SP3.
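For anyone wanting to quantify this kind of mismatch after an outage, a small sketch follows. It sorts circuits into correct, missed, and false alerts given two sets: the circuits that actually went down and the circuits that were alerted on. The function name `reconcile` and the circuit names are made up for illustration and are not data from this thread.

```python
from typing import Dict, Set


def reconcile(actual_down: Set[str], alerted: Set[str]) -> Dict[str, Set[str]]:
    """Split circuits into correct alerts, missed outages, and false alerts."""
    return {
        "correct": actual_down & alerted,  # went down and an alert was sent
        "missed": actual_down - alerted,   # went down but no alert arrived
        "false": alerted - actual_down,    # alert sent but the circuit never went down
    }


if __name__ == "__main__":
    # Hypothetical example data.
    actual = {"branch-a", "branch-b", "branch-c"}
    alerts = {"branch-a", "branch-d"}
    for kind, circuits in reconcile(actual, alerts).items():
        print(kind, sorted(circuits))
```

A breakdown like this is also useful to attach to a support case, since missed alerts and false alerts may have different causes.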

Hancock Bank

Network Engineer

  • | Post Points: 3

2,601 Posts
Points 11,172
Moderator
SolarWinds Employee
bshopp replied on Tue, Jun 23 2009 1:49 PM

r0berth1, did you apply the hotfix?

Brandon Shopp - Product Manager Orion NPM, Network Atlas and LANSurveyor

  • | Post Points: 3

321 Posts
Points 3,574
Thwack MVP
r0berth1 replied on Tue, Jun 23 2009 2:52 PM

Yes, I did. I applied it the day it came out.

Hancock Bank

Network Engineer

  • | Post Points: 3

2,601 Posts
Points 11,172
Moderator
SolarWinds Employee
bshopp replied on Tue, Jun 23 2009 3:02 PM

OK, just making sure. Please open a support case so we can investigate this further.

Brandon Shopp - Product Manager Orion NPM, Network Atlas and LANSurveyor

  • | Post Points: 1
© 2003 - 2010 SolarWinds, Inc. All Rights Reserved.
