lsapi processes not being used, build up to > max_connections

#1
Even more alarmingly, it shows all 10 of my lsapi processes as 'in use' (on a fairly busy server), but when I strace them, they're all in select(11...)

Scope  Type   Name         Max CONN  Eff Max  Pool  In Use  Idle  WaitQ  Req/Sec
ap     LSAPI  Rails:ap:/   10        10       10    10      0     8      14

Scope  Type   Name         Max CONN  Eff Max  Pool  In Use  Idle  WaitQ  Req/Sec
ap     LSAPI  Rails:ap:/   10        10       10    10      0     27     0

From what I can tell with strace, it looks like only one of the lsapi processes is doing anything.

Additionally, maxconns is 10, but there are 22 running:
[root@rhyme data]# ps auxww | grep RailsRunner | wc -l
22
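
One caveat on the count above: a `ps | grep` pipeline usually lists the grep process itself, so 22 may overstate the number of workers by one. Below is a minimal Ruby sketch of a count that skips grep. `count_workers` is a hypothetical helper, not part of LiteSpeed or LSAPI, and it takes the `ps` text as an argument so it can be checked against canned input.

```ruby
# Count lines of `ps` output whose command line mentions `pattern`,
# skipping the `grep` helper that would otherwise match itself.
# (count_workers is a hypothetical name used only for this sketch.)
def count_workers(ps_output, pattern)
  ps_output.each_line.count do |line|
    line.include?(pattern) && !line.match?(/\bgrep\b/)
  end
end

# Possible usage on the server itself:
#   puts count_workers(`ps auxww`, "RailsRunner")
```

If the result is still well above Max CONN, the extra processes are real and not an artifact of the pipeline.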

Any ideas? It's generally very fast, but it can get slow during the busy times. I recently upgraded from lighttpd to LiteSpeed and am very happy overall, but I'm worried about this.

MySQL shows no slow queries, and server load is only 0.4.

Thank you,
Kevin
 
#2
I should add that one of them seems stuck in nanosleep() rather than the expected select():

nanosleep({0, 10000000}, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0
 
#3
Two different other kinds of sleep:

Fresh processes (looks still right)
select(1, [0], NULL, NULL, {0, 433000}) = 0 (Timeout)
kill(27359, SIG_0) = 0
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
kill(27359, SIG_0) = 0
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)



Another one. I'm not sure what this is, but perhaps it's just something inside my application. There's more gettimeofday() going on.

gettimeofday({1164067094, 613573}, NULL) = 0
gettimeofday({1164067094, 613617}, NULL) = 0
select(8, [3 7], [], [], {0, 999956}) = 0 (Timeout)
gettimeofday({1164067095, 612726}, NULL) = 0
select(8, [3 7], [], [], {0, 847}) = 0 (Timeout)
gettimeofday({1164067095, 613680}, NULL) = 0
select(8, [3 7], [], [], {0, 0}) = 0 (Timeout)
kill(27360, SIG_0) = 0
gettimeofday({1164067095, 613777}, NULL) = 0
gettimeofday({1164067095, 613805}, NULL) = 0
select(8, [3 7], [], [], {0, 999971} <unfinished ...>

When I did lswsctrl restart before, it wouldn't kill off the lsapi processes that were stuck in select() without the kill() every few msec. They must be kind of crashed, but the 'max workers' checker isn't picking up on it, and I need to kill -5 them to get them to ever go away.

It doesn't seem to happen for a while after starting the server. Restarting the server fairly early in its life results in all the processes being killed and re-created as expected.
 
#5
nope

They apparently all crashed this morning resulting in some short downtime.

My server error log has things like this in it, though I think they're normal:

2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] Connection idle time: 16 while in state: 5 watching for event: 25,close!
2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] Content len: 0, Request line:
GET /poem/add HTTP/1.1
2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] Redirect: #1, URL: /dispatch.cgi
2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] HttpExtConnector state: 8, request body sent: 0, response body size: 0, response body sent:0, left in buffer: 0, attempts: 0.
2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] Connection idle time: 16 while in state: 5 watching for event: 25,close!
2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] Content len: 1555097, Request line:
POST /user/face HTTP/1.1
2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] Redirect: #1, URL: /dispatch.cgi
2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] HttpExtConnector state: 10, request body sent: 131072, response body size: 0, response body sent:0, left in buffer: 0, attempts: 0.

Also, the 'graceful restart' drives iowait on my server crazy and pushes load up to 30. I'm not sure if perhaps I have a bad SCSI cable or something, but it may have to do with the Rails processes stuck in the 'bad' select(). If I killall -5 ruby & lshttpd, then start the server, it's quick and painless... Likewise, a graceful restart fairly quickly after a fresh start (i.e. no 'bad' select()s yet) seems to work fine.

Note: I'm running this as a user other than 'nobody'. I'm not sure if that could have an impact at all.

Please let me know if there's any debug output or log details I can give you that would help!
 

mistwang

LiteSpeed Staff
#6
You need to find out what exactly causes the bad select(). I think it is in Ruby or your Rails app, not in the LSAPI code.

You can try "lsof" or start "strace" at beginning of a request.

Ruby always resumes a function call if it is interrupted (EINTR) by a signal, so sometimes it becomes very difficult to kill a Ruby process in the normal way.
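
To illustrate the point about signals, here is a minimal Ruby sketch for Linux. It assumes the process has a TERM handler installed that does not exit; the empty `trap` block below stands in for that assumption, and whether LSAPI or the app installs such a handler is not established in this thread. `survives_term?` is a hypothetical name. The child blocks in select(), receives SIGTERM, runs the handler, and goes back to waiting, so only SIGKILL removes it.

```ruby
# Returns true if a child blocked in select() is still alive after SIGTERM.
# Assumption for this demo: the child's TERM handler does nothing.
def survives_term?
  ready_r, ready_w = IO.pipe   # child -> parent: "handler installed"
  idle_r, idle_w = IO.pipe     # never written to, so select() just waits

  pid = fork do
    ready_r.close
    trap("TERM") { }           # handler runs, then the blocked call resumes
    ready_w.write("x")
    ready_w.close
    IO.select([idle_r], nil, nil, 10)
    exit!(0)
  end

  ready_w.close
  ready_r.read(1)              # wait until the child has installed its handler
  sleep 0.2                    # give the child time to enter select()
  Process.kill("TERM", pid)
  sleep 0.5
  # waitpid with WNOHANG returns nil while the child is still running
  Process.waitpid(pid, Process::WNOHANG).nil?
ensure
  if pid
    Process.kill("KILL", pid) rescue nil   # SIGKILL cannot be trapped
    Process.wait(pid) rescue nil
  end
  [ready_r, ready_w, idle_r, idle_w].each { |io| io.close if io && !io.closed? }
end
```

Without the `trap` line, Ruby's default TERM handling would end the child instead, so a process that ignores a normal kill is a hint to look at which handlers are installed.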
 

fantasydreaming

Well-Known Member
#9
Glad I could help :)

One other place that I had problems with it hanging was wherever I was attempting to resolve IP addresses to DNS names... maybe it was just being glacial, but I'd sometimes get an execution timeout expired error as well.
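
For anyone hitting the same thing, a minimal sketch of bounding a reverse lookup so a slow DNS server cannot hold a worker indefinitely. It uses Ruby's standard `resolv` and `timeout` libraries; `hostname_or_ip`, the 2-second default, and the injectable `lookup` parameter are assumptions made for illustration, not what my app actually ran.

```ruby
require "resolv"
require "timeout"

# Reverse-resolve an IP address, giving up after `seconds` and falling back
# to the bare IP. `lookup` is injectable so the slow path can be exercised
# without touching the network. (All names here are hypothetical.)
def hostname_or_ip(ip, seconds: 2, lookup: Resolv.method(:getname))
  Timeout.timeout(seconds) { lookup.call(ip) }
rescue Timeout::Error, Resolv::ResolvError
  ip
end

# Possible usage inside a request:
#   name = hostname_or_ip(request_ip)
```

Falling back to the IP keeps the request moving; caching the results or resolving names outside the request cycle would be other options.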
 