Hi there,
As you may know, I run a CentOS-based web cluster that uses LiteSpeed on all of our backend nodes. A couple of weeks ago I upgraded all of our LiteSpeed servers from 4.0.19 to 4.1.
First, the minor issue: 4.1 keeps informing me that a new version is available (4.0.20).
Next, the major issue: around the same time as the upgrade, all of our web servers began returning errors to our load balancers at random. I have spent the past two weeks tracking this issue, but I can't identify the cause.
A few hours, or sometimes as much as a day, after LiteSpeed is restarted, the load balancer begins reporting the error "Read failed: Connection reset by peer" for requests to that server. Sometimes the Simple HTTP monitor picks up on it too. During each event the issue is present on one or more of our web servers. The error logs and the other metrics I use to check the health of each server show no problems.
The other night I switched all but one of our web servers to LiteSpeed 4.0.20. The only server still showing this issue is the one running 4.1. The rest have stopped giving us problems.
Can you help me track this problem down? As stated above, the error logs show no issues at all, even with debugging enabled.
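In case it helps with reproducing this, here is a minimal sketch of the kind of probe that should catch the same resets independently of the load balancer. It is not what the load balancer or the Simple HTTP monitor actually does. The function names `probe` and `watch` and the outcome labels are made up for illustration, and it assumes the backend answers a plain HTTP/1.0 GET and then closes the connection.

```python
import socket
import time


def probe(host, port, path="/", timeout=5.0):
    """Send one HTTP/1.0 GET and return the status line or a short failure label."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    chunks = []
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            # Read until the server closes the connection.
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except ConnectionResetError:
        # The peer sent a TCP RST: the "Connection reset by peer" case.
        return "reset"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
    status_line = b"".join(chunks).split(b"\r\n", 1)[0].decode("latin-1")
    return status_line or "empty"


def watch(host, port, interval=1.0, count=60):
    """Probe repeatedly and print a timestamp for every non-HTTP outcome."""
    for _ in range(count):
        outcome = probe(host, port)
        if not outcome.startswith("HTTP/"):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), host, outcome)
        time.sleep(interval)
```

Running something like `watch("10.0.0.11", 80)` (a placeholder address) against the 4.1 node and against one of the 4.0.20 nodes at the same time should show whether the resets happen on direct connections as well, or only through the load balancer.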