Myreader.co.uk  
uk news, chat and community
date: Sat, 31 Oct 2009 22:39:04 +0000,    group: uk.net.providers.aaisp
[Status] [Update #5] [open] Major issue (not BT)   
Posted at 2009-10-31 22:12 GMT by RevK
Update #5: 2009-10-31 22:39 GMT

  Something is up. Not a crash but we are investigating now. Looks like
  there may be an issue with our RADIUS.
  
  Update: Looks like some sort of issue started a little before 9pm and
  has resulted in some lines going off line after 10pm.
  
  Update: We can confirm the LNS did not crash. Lines are coming back
  now, and we are trying to find the underlying cause of the issue.
  Looking at graphs it seems most lines that have been affected were only
  off for a couple of minutes.
  
  Update: This looks more complex. We have seen lots of LCP timeouts as
  well.
  
  OK, the issue looks like one of RADIUS accounting stopping, mostly
  around 9pm, and it is not clear why. We have restarted the RADIUS
  servers, and it seems to be working at the moment. We are checking logs
  carefully to find how this happened.
  
  Update: We are clearly still seeing sessions stuck on 20CN RASs, but it
  looks like we have most people back on line.

URL: http://aaisp.blogspot.com/2009/10/open-major-issue.html

-- 
AAISP Status Blog
URL:http://aaisp.blogspot.com/
date: Sat, 31 Oct 2009 22:39:04 +0000   author:   RevK

[Status] [Update #6] [closed] Major issue (not BT)   
Posted at 2009-10-31 22:12 GMT by RevK
Update #6: 2009-10-31 22:46 GMT

  Update: It looks like 21CN lines had a brief outage but 20CN lines a
  much longer one, as we have had to go through clearing stuck sessions
  in BT. This is something that was meant to have been fixed some time
  ago. We'll chase this with BT.

URL: http://aaisp.blogspot.com/2009/10/open-major-issue.html

date: Sat, 31 Oct 2009 22:46:00 +0000   author:   RevK

[Status] [Update #7] [closed] Major issue (not BT)   
Posted at 2009-10-31 22:12 GMT by RevK
Update #7: 2009-11-01 16:54 GMT

  Update: Sunday. It has taken a bit of investigation, but we think we
  have identified the cause. It was to do with work for the new test LNS
  on Saturday night: very minor changes that were not right and caused
  some authentication requests to go unanswered. On their own they would
  not have had this effect, but the LNS is a tad inefficient when it does
  not get a RADIUS response (an issue we have been working on anyway),
  and this, combined with a problem in RADIUS accounting, meant that
  several hours later things went wrong. The action taken at the time
  problems started was to revert all changes in source control
  immediately as a precaution, and this solved the problem. BT having
  stuck sessions caused further knock-on effects, as usual. Only now are
  the pieces fitting together to explain how such a minor change had a
  knock-on effect. The accounting issue has been sorted. The
  authentication issue has been sorted. The fix for the LNS inefficiency
  is in development for the next LNS release. So all should be fine now.
  We do apologise for the inconvenience, and yes, this time it was not
  entirely BT to blame!

URL: http://aaisp.blogspot.com/2009/10/open-major-issue.html

date: Sun, 01 Nov 2009 16:54:47 +0000   author:   RevK
