I have a strange problem that I hope you can help solve.
I have 18 web servers running LiteSpeed 3.1.1 Standard (32-bit). All web servers are started and stopped at the same time via a Capistrano recipe.
My Rails application emails me any uncaught exceptions. About every 20 to 30 minutes I get a mass of exceptions reporting "Mysql::Error: Lost connection to MySQL server during query", or something very similar.
On the MySQL servers I get this:
Aborted connection 11221 to db: 'xxxxx' user: 'xxxxx' host: 'xxx.xxxxxxx.internal' (Got timeout reading communication packets)
I was getting these messages constantly until I raised the following values:
LSAPI_MAX_REQS=5000
LSAPI_MAX_IDLE=180
I now just get them every 20 to 30 minutes or so.
The really strange thing is that all the disconnects happen at pretty much the same time on all of the servers, and I think I am getting 1 error for every fcgi process, though I am not sure how to confirm this.
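One way I could confirm the one-error-per-process guess is to tally the "Aborted connection" lines per web server host and compare the counts with the number of fcgi processes each server runs. A rough Ruby sketch, assuming the log lines look like the sample above (`aborted_by_host` is just a name I made up, not part of any library):

```ruby
# Tally "Aborted connection" lines from a MySQL error log by client host.
# Assumption: every line follows the format of the sample shown above.
ABORTED = /Aborted connection (\d+) to db: '([^']*)' user: '([^']*)' host: '([^']*)'/

def aborted_by_host(lines)
  counts = Hash.new(0)
  lines.each do |line|
    m = ABORTED.match(line)
    counts[m[4]] += 1 if m # m[4] is the client host
  end
  counts
end

# Usage (the log path depends on the MySQL setup):
#   aborted_by_host(File.readlines(path_to_error_log)).each do |host, n|
#     puts "#{host}: #{n}"
#   end
```

If the per-host count in each burst matches the number of fcgi processes on that host, that would support the theory.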
I am guessing that LiteSpeed is reaping the fcgi processes at some interval, and that they are not closing their MySQL connections properly. That would explain why it happens on all the servers at the same time: remember that all servers are restarted together by a script when we push code.
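As a stopgap while I track this down, I am considering retrying a query once after reconnecting when this error shows up. This is only a sketch and does not fix the underlying cause; `with_reconnect` is a hypothetical helper of my own, and the reconnect step is passed in as a lambda so the sketch does not depend on any particular database API:

```ruby
# Hypothetical helper: run a block, and if it fails with a lost-connection
# error, call the supplied reconnect lambda and try the block again.
LOST_CONNECTION = /Lost connection to MySQL server|MySQL server has gone away/

def with_reconnect(reconnect, retries = 1)
  attempts = 0
  begin
    yield
  rescue StandardError => e
    # Re-raise anything that is not a lost connection, or if we are out of retries.
    raise unless e.message =~ LOST_CONNECTION && attempts < retries
    attempts += 1
    reconnect.call
    retry
  end
end
```

In the Rails app I would assume something like `with_reconnect(lambda { ActiveRecord::Base.connection.reconnect! }) { ... }` around the query, but I have not verified that against my Rails version.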
The result is that we get lots of reports from users and customers that they see our default error page.
I have done a lot of testing, at first assuming this was a database issue. Finally, yesterday I ran some long-running tests on one server, targeting one database. I then stopped LiteSpeed, fired up 4 mongrels, and repeated the tests (about 2 hours, hitting each mongrel once per second). During the mongrel test I received no errors. So, at this point I presume it must have something to do with LiteSpeed and the fcgi processes.
Any information or help is much appreciated.
Tim