Myreader.co.uk  
uk news, chat and community
date: Fri, 03 Jul 2009 18:07:09 +0100,    group: uk.net.providers.aaisp
[Status] [info] bug hunting   
Posted at 2009-07-03 17:52 BST by AAISP

We are still trying to sort a network issue affecting 21CN services. We
have clear evidence of some sort of network storm on the LAN in
Telehouse. We have been unable to identify the source. We suspect a
switch issue, and we are planning to replace the switches with some shiny
new Allied Telesis models. However, our equipment should not fail in light
of this occasional nuisance. We have worked through a number of
techniques for addressing the problem, which is fundamentally the same
as handling a denial of service attack. We keep pushing the problem to
the next bit of code. In the past we have managed attacks well and we
should be able to sort this.

However, we have the development team (including me) working over the
weekend, and with the testers on the test LNS, to get to the bottom of
this. We are putting in even more diagnostics and even more rate
controls at different levels in the system to ensure we can handle the
problem without a restart. We are determined to get to the bottom of
the issue.

Having issued a "stability" guarantee from 1st July, we have "paid out"
by applying two extra 1GB peak usage top-ups to affected 21CN logins. I
am sure this will not seem much to some of you, but it is very costly
for us: a surprising number of people pay for top-up usage each month, so
this can equate to many thousands of pounds. We don't really need this
incentive - the quality and stability of our service is paramount and
we have a reputation to regain for reliability.

This is made even more important by the fact we hope to move all lines
to the new 21CN links to us some time next month. A big task, and one
that comes over a year later than we expected. It is essential we have 100%
reliable LNSs for our service now and in the future. Do not worry - we
will delay this if we do not have stable LNSs.

We do apologise, most sincerely, for the issues. We know that for many
this is a matter of seconds of outage every few days. We know a few of
you have had minutes of outage (an issue we are taking up with BT, as it
should recover instantly; our end does!). We know that even a few
seconds is not acceptable.

Testers on the test LNS (simply prefix your login with test- on 21CN)
are invaluable in confirming the beta releases of new LNS code are
stable and working. All your usage counts at 50% for billing
purposes. Feel free to join in if you wish. You can change login back
at any time.

I'll post follow-ups to this with details of the progress we are
making.

Please do bear with us. If you are fed up, please do call or irc or
email the specially set up mailing address we posted. If you leave,
please do look again in a few weeks. You may find that you should have
stayed, and we'll welcome you back.

And thank you to all of our customers. We appreciate your support in
difficult times.

URL: http://aaisp.blogspot.com/2009/07/info-bug-hunting.html

-- 
AAISP Status Blog
URL:http://aaisp.blogspot.com/
date: Fri, 03 Jul 2009 18:07:09 +0100   author:   AAISP

Re: [Status] [info] bug hunting   
Posted at 2009-07-04 12:25 BST by AAISP

Just to update you - there are two of us working on this at present. We
are pretty sure we have it sussed, but we have said that before! We
have a lot of checking to do. We'll almost certainly load new code
again tonight, so expect a PPP restart between 2am and 3am.

URL: http://aaisp.blogspot.com/2009/07/info-bug-hunting.html?showComment=1246706725552#c5658789013258511932

-- 
AAISP Status Blog
URL:http://aaisp.blogspot.com/
date: Sat, 04 Jul 2009 12:25:25 +0100   author:   AAISP

Re: [Status] [info] bug hunting   
Posted at 2009-07-04 19:07 BST by AAISP

We still hope to release new code for initial testing later tonight.
A lot of checking, code inspection and testing is being done first. We
are really keen to get to the bottom of
this. We have identified two likely causes for the failure and
addressed both, but are still working on this to be sure. Two
developers are still working on this even now.

URL: http://aaisp.blogspot.com/2009/07/info-bug-hunting.html?showComment=1246730850054#c6004253838542575110

-- 
AAISP Status Blog
URL:http://aaisp.blogspot.com/
date: Sat, 04 Jul 2009 19:07:30 +0100   author:   AAISP

Re: [Status] [info] bug hunting   
Posted at 2009-07-05 02:13 BST by AAISP

We have identified a further small number of problems which are likely
to have contributed to the recent instability. We want to perform more
testing before switching users to the latest software, so we shall not
now be updating the main 21CN LNS overnight Sat/Sun. Testing and
deployment of new software will continue tomorrow (Sunday).

URL: http://aaisp.blogspot.com/2009/07/info-bug-hunting.html?showComment=1246756433606#c7199374741532337098

-- 
AAISP Status Blog
URL:http://aaisp.blogspot.com/
date: Sun, 05 Jul 2009 02:13:53 +0100   author:   AAISP

Re: [Status] [info] bug hunting   
Posted at 2009-07-05 10:01 BST by AAISP

Further work overnight means we now have a new version of code. There
is some more checking going on today to be sure we have not missed
anything. In the meantime this is being loaded on the test LNS now and
will be put in live use later today with lines moving over to it
overnight.

It is looking like the fix we did to handle high packet load was OK all
along and that the problem was a slightly different one which we
happened to pick up separately on Friday anyway. The symptoms were the
same. This explains why it continued.

We have also identified a couple of very unlikely race conditions
which we have not seen in practice, but which have now been addressed.

Whilst we expect the new code to be stable, we are being cautious. The
guarantee still stands.

I'll update later today with progress.

Thank you all for your patience.

URL: http://aaisp.blogspot.com/2009/07/info-bug-hunting.html?showComment=1246784511954#c8838144319405825782

-- 
AAISP Status Blog
URL:http://aaisp.blogspot.com/
date: Sun, 05 Jul 2009 10:01:51 +0100   author:   AAISP

Re: [Status] [info] bug hunting   
Posted at 2009-07-05 15:22 BST by AAISP

We are now running new code on b.gormless, and lines that reconnect
will go on to this LNS. We will switch lines over around 2am.

URL: http://aaisp.blogspot.com/2009/07/info-bug-hunting.html?showComment=1246803727776#c4439553005598142423

-- 
AAISP Status Blog
URL:http://aaisp.blogspot.com/
date: Sun, 05 Jul 2009 15:22:07 +0100   author:   AAISP
