lsapi processes not being used, build up to > max_connections

fantasydreaming · Nov 20, 2006

Even more alarmingly, the status output shows all 10 of my lsapi processes as 'in use' (on a fairly busy server), but when I strace them, they're all sitting in select(11...).

Scope  Type   Name        Max CONN  Eff Max  Pool  In Use  Idle  WaitQ  Req/Sec
ap     LSAPI  Rails:ap:/  10        10       10    10      0     8      14

Scope  Type   Name        Max CONN  Eff Max  Pool  In Use  Idle  WaitQ  Req/Sec
ap     LSAPI  Rails:ap:/  10        10       10    10      0     27     0

From what I can tell with strace, only one of the lsapi processes appears to be doing anything.

Additionally, maxconns is 10, but there are 22 running:
[root@rhyme data]# ps auxww | grep RailsRunner | wc -l
22

Any ideas? It's generally very fast, but requests can build up and slow down during busy times. I've recently upgraded to litespeed from lighttpd and am overall very happy, but worried about this.

MySQL shows no slow queries, and server load is only at 0.4.

Thank you,
Kevin

fantasydreaming · Nov 20, 2006

I should add that one of them seems stuck in nanosleep() rather than the expected select():

nanosleep({0, 10000000}, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0

fantasydreaming · Nov 21, 2006

Two other kinds of sleep:

Fresh processes (these still look right):
select(1, [0], NULL, NULL, {0, 433000}) = 0 (Timeout)
kill(27359, SIG_0) = 0
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
kill(27359, SIG_0) = 0
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)

Another one. I'm not sure what this is, but perhaps it's just something inside my application. There are more gettimeofday() calls going on.

gettimeofday({1164067094, 613573}, NULL) = 0
gettimeofday({1164067094, 613617}, NULL) = 0
select(8, [3 7], [], [], {0, 999956}) = 0 (Timeout)
gettimeofday({1164067095, 612726}, NULL) = 0
select(8, [3 7], [], [], {0, 847}) = 0 (Timeout)
gettimeofday({1164067095, 613680}, NULL) = 0
select(8, [3 7], [], [], {0, 0}) = 0 (Timeout)
kill(27360, SIG_0) = 0
gettimeofday({1164067095, 613777}, NULL) = 0
gettimeofday({1164067095, 613805}, NULL) = 0
select(8, [3 7], [], [], {0, 999971} <unfinished ...>

When I ran lswsctrl restart before, it wouldn't kill off the lsapi processes that sit in select() without the kill() call every few msec. They must be more or less crashed, but the 'max workers' checker isn't picking up on it, and I need to kill -5 them to make them go away.

It doesn't seem to happen for a while after starting the server. Restarting the server fairly early in its life results in all the processes being killed and re-created as expected.

mistwang · Nov 23, 2006

Is the server doing OK now?

fantasydreaming · Nov 23, 2006

nope

They apparently all crashed this morning resulting in some short downtime.

My server error log has things like this in it, though I think they're normal:

2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] Connection idle time: 16 while in state: 5 watching for event: 25,close!
2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] Content len: 0, Request line:
GET /poem/add HTTP/1.1
2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] Redirect: #1, URL: /dispatch.cgi
2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] HttpExtConnector state: 8, request body sent: 0, response body size: 0, response body sent:0, left in buffer: 0, attempts: 0.
2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] Connection idle time: 16 while in state: 5 watching for event: 25,close!
2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] Content len: 1555097, Request line:
POST /user/face HTTP/1.1
2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] Redirect: #1, URL: /dispatch.cgi
2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] HttpExtConnector state: 10, request body sent: 131072, response body size: 0, response body sent:0, left in buffer: 0, attempts: 0.

Also, the 'graceful restart' sends the iowait on my server through the roof, with load going up to 30. I'm not sure if perhaps I have a bad SCSI cable or something, but it may have to do with the rails processes stuck in the 'bad' select(). If I killall -5 ruby & lshttpd, then start the server, it's quick and painless... Likewise, a graceful restart fairly quickly after a fresh start (i.e. no 'bad' select() calls yet) seems to work fine.

Note: I'm running this as a user other than 'nobody' - not sure if that could have an impact at all.

Please let me know if there's any debug output or log details I can give you that would help!

mistwang · Nov 23, 2006

You need to find out what exactly causes the bad select(). I think it is in ruby or your rails app, not in the LSAPI code.

You can try "lsof", or start "strace" at the beginning of a request.

Ruby always resumes a function call if it is interrupted (EINTR) by a signal, so sometimes it becomes very difficult to kill a ruby process in the normal way.

fantasydreaming · Dec 1, 2006

This was caused by a search using the ruby-google module... apparently http-access and http-access2 can both get stuck waiting 'forever' for a google (or anywhere, likely) response.

The solution for me was to wrap the call in a Timeout::timeout(5) do ... end block.
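For anyone finding this thread later, here is a minimal sketch of that kind of wrapper. The helper name with_deadline and the slow_search stand-in are made up for illustration; the real call was a ruby-google search, which isn't reproduced here. It just shows a block being abandoned once the deadline passes so the worker is freed again.

```ruby
require 'timeout'

# Hypothetical helper (not from the original app): run the block, but give up
# after `seconds` and return `fallback` instead of blocking the worker forever.
def with_deadline(seconds, fallback = nil)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  fallback
end

# Stand-in for a remote call that hangs; sleep simulates the stuck socket read.
slow_search = -> { sleep 10; [:results] }

# The caller gets the fallback after about 0.2 seconds instead of waiting 10.
results = with_deadline(0.2, []) { slow_search.call }
puts "results: #{results.inspect}"
```

Timeout works by raising an exception in the running block from another thread, so it is worth also setting the HTTP client's own connect/read timeouts where the library offers them.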

I was able to debug it using gdb, following the very helpful instructions here: http://eigenclass.org/hiki.rb?ruby+live+process+introspection

mistwang · Dec 1, 2006

Cool! I will add that to our Wiki. Thanks!

fantasydreaming · Dec 1, 2006

Glad I could help

One other place that I had problems with it hanging was wherever I was attempting to resolve IP addresses to DNS names... maybe it was just being glacial, but I'd sometimes get an execution timeout expired error as well.
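A rough sketch of how the reverse lookups could be bounded as well, assuming the standard resolv library is used for them (the original code isn't shown, so reverse_lookup and the 2-second default are illustrative only). Resolv::DNS lets you set per-query timeouts, and a failed or timed-out lookup is turned into nil here so the page can fall back to showing the raw IP.

```ruby
require 'resolv'

# Hypothetical helper: reverse-resolve an IP address, waiting at most roughly
# `seconds` per nameserver. Returns the hostname as a String, or nil when the
# lookup fails, times out, or the input is not an IP address.
def reverse_lookup(ip, seconds = 2)
  Resolv::DNS.open do |dns|
    dns.timeouts = seconds          # bound each query instead of the default retry schedule
    dns.getname(ip).to_s
  end
rescue Resolv::ResolvError, Resolv::ResolvTimeout
  nil
end
```

A call such as reverse_lookup(request_ip) || request_ip (request_ip being whatever variable holds the address) would then never hold a worker for long, whatever the DNS server is doing.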
