Calvinthesneak
Joined: 15 Nov 2010 Posts: 14
|
Posted: Mon Nov 15, 2010 1:12 Post subject: watchdog.sh |
|
|
Hi all... I've been having issues with nwserver hanging (probably due to module issues). Anyhow, I've been playing with a watchdog process.
Does anyone have constructive feedback on this, or suggestions on how I should improve it?
Code: |
#!/bin/bash
# NWServer process monitor

# Go to the right directory
NWN_DIR="/mynwndirectory"
cd "$NWN_DIR" || exit 1
# Command to start the server
START="/mynwnfolderpath/nwn/nwservctl.sh start"
# pgrep command
PGREP="/usr/bin/pgrep"
# Daemon name
DAEMON="nwserver"
# CPU usage (percent) above which the server counts as hung
LIMIT=90
# Command to kill nwserver so it can be started again
KILL="killall -9 $DAEMON"

# Print the integer CPU usage of the first nwserver process, or 0 if none is
# listed. Assumes the default top layout: %CPU in field 9, COMMAND last.
cpu_usage() {
    top -b -n1 | awk -v name="$DAEMON" '
        $NF == name { print int($9); found = 1; exit }
        END { if (!found) print 0 }'
}

if ! "$PGREP" "$DAEMON" > /dev/null
then
    # Nothing is running: start the server
    $START
else
    # Restart only after three high readings in a row, 30 seconds apart
    OVERLOAD=0
    for CHECK in 1 2 3
    do
        PEG=$(cpu_usage)
        [ "$PEG" -gt "$LIMIT" ] || break
        OVERLOAD=$CHECK
        echo "Check $CHECK failed: CPU usage at ${PEG}%."
        [ "$CHECK" -lt 3 ] && sleep 30
    done
    if [ "$OVERLOAD" -eq 3 ]
    then
        echo "Three failed checks on CPU usage, time to restart."
        $KILL
        sleep 5
        $START
    fi
fi
|
Minor edit to fix a syntax mistake and add a bit more debugging.
Last edited by Calvinthesneak on Tue Nov 16, 2010 11:06; edited 1 time in total |
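Reading CPU usage from `top` by a fixed character or field position depends on the column layout of the installed `top`. A minimal sketch of one alternative is to locate the %CPU column from the header row first. The helper name `cpu_from_top` is hypothetical, and the sketch assumes a header row containing a `%CPU` label with the command name in the last column.

```shell
#!/bin/bash
# Sketch: read a process's CPU usage from top's batch output by finding the
# %CPU column in the header row instead of using a fixed position.

# cpu_from_top NAME
# Reads `top -b -n1` style output on stdin and prints the integer %CPU of the
# first process whose last field (COMMAND) equals NAME. Prints nothing if the
# process or the %CPU header is not found.
cpu_from_top() {
    awk -v name="$1" '
        # Until the header row is seen, look for the field labelled %CPU
        col == 0 {
            for (i = 1; i <= NF; i++) if ($i == "%CPU") col = i
            next
        }
        # Process rows follow the header row
        $NF == name { print int($col); exit }
    '
}

# Usage (assumes a process named nwserver):
#   top -b -n1 | cpu_from_top nwserver
```

An empty result means the process was not listed, so a caller should test for that before comparing against a threshold.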
|
Ravine
Joined: 26 Jul 2006 Posts: 105
|
Posted: Mon Nov 15, 2010 10:44 Post subject: |
|
|
Hmm. The same happens to me sometimes. No idea what's causing this, but I doubt it's module-related. Do you run your server in 'screen'? What version?
See this:
http://www.nwnx.org/phpBB2/viewtopic.php?t=1317
Maybe it's losing STDIN? From screen, I can write to the console, but I receive no response. Too bad I can't easily reproduce this error; it happens about once a week.
If this is the problem, running with '-quiet' should solve it, but I need the console to communicate with the players... |
|
Calvinthesneak
Joined: 15 Nov 2010 Posts: 14
|
Posted: Mon Nov 15, 2010 16:25 Post subject: |
|
|
No sir, I run the server as a daemon, via another shell script.
The script is this one here:
http://nwn.bioware.com/forums/viewcodepost.html?post=6367678
It's run in standard mode. Sometimes the nwserver process just hangs. It can be an hour after reboot, or a day. There's nothing consistent about it. We're trying to monitor its memory usage to see if there is a leak somewhere.
EDIT: I should note this correction to the script, unless you want to recompile your kernel.
Line 28 should be changed to the following:
CATCH_OUTPUT="empty -r -t 30 -b 8192 -i out.fifo" #8192 bytes is size of FIFO on linux |
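For the memory monitoring mentioned above, one low-effort option is to append the process's resident set size to a log at intervals and look for steady growth. A minimal sketch, assuming Linux with /proc mounted; `rss_kb`, `log_rss` and the log path are hypothetical names, not part of the linked script.

```shell
#!/bin/bash
# Sketch: log the resident memory of a process so growth over time is visible.

# rss_kb PID: print the resident set size of PID in kilobytes, read from the
# VmRSS line of /proc/PID/status. Prints nothing if the process is gone.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# log_rss PID LOGFILE: append one "date time rss_kb" line to LOGFILE.
# Returns 1 without writing anything if the size could not be read.
log_rss() {
    local rss
    rss=$(rss_kb "$1") || return 1
    [ -n "$rss" ] || return 1
    echo "$(date '+%Y-%m-%d %H:%M:%S') $rss" >> "$2"
}

# Usage (assumes a process named nwserver; the log path is hypothetical):
#   log_rss "$(pgrep -o nwserver)" /tmp/nwserver-rss.log
```

Called every few minutes, this gives a plain-text series in which a leak would show up as a figure that keeps climbing between restarts.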
|
Calvinthesneak
Joined: 15 Nov 2010 Posts: 14
|
Posted: Sat Nov 20, 2010 2:29 Post subject: |
|
|
No suggestions to make the script better then? |
|
Ravine
Joined: 26 Jul 2006 Posts: 105
|
Posted: Sun Mar 11, 2012 18:16 Post subject: |
|
|
Hi!
We are still running into this "server hang" bug from time to time. Has anyone found a solution?
At first I thought it was the stdin bug described here:
http://www.nwnx.org/phpBB2/viewtopic.php?t=1317
But it looks like it's not. I'm running the server with -quiet, and it still happens sometimes.
axs found something too:
http://www.nwnx.org/phpBB2/viewtopic.php?t=1314
And there's an early topic about this:
http://www.nwnx.org/phpBB2/viewtopic.php?t=236
This was mentioned in the old BW forums too (see Omnibus). Some admins claimed that this is caused by a blindness effect conflicting with a truesee/ultravision effect, but that was many years ago.
This bug pisses me off, because it mostly happens when 4-5+ players are on the server (which is a record nowadays) and I'm not even nearby to restart it.
doh. |
|
leo_x
Joined: 25 Aug 2010 Posts: 75
|
Posted: Sun Mar 11, 2012 23:12 Post subject: |
|
|
If the hang is on reset, have you tried
Code: |
/* Shut down the current process. If nForce is specified, the process will be
* force-killed in that number of seconds, in case it hangs during shutdown.
*/
void ShutdownServer (int nForce=0); |
in Acaos' nwnx_system plugin instead of the reset plugin? I had the hang-on-reset issue and this has overcome it. |
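The nForce idea quoted above (ask for a shutdown, then force-kill after a number of seconds if it hangs) can also be applied from a shell watchdog rather than going straight to SIGKILL. A minimal sketch, not the plugin's implementation: `stop_with_timeout` is a hypothetical helper name, and it assumes nwserver exits on SIGTERM, which this thread does not confirm.

```shell
#!/bin/bash
# Sketch: ask a process to stop, and force-kill it only if it does not exit
# within a time limit.

# stop_with_timeout PID SECONDS
# Sends SIGTERM, polls once per second for up to SECONDS, then sends SIGKILL.
# Returns 0 if the process was already gone or exited on its own,
# 1 if it had to be force-killed.
stop_with_timeout() {
    local pid=$1 limit=$2 waited=0
    kill -TERM "$pid" 2>/dev/null || return 0   # nothing to stop
    # kill -0 only tests whether the PID can still be signalled
    while kill -0 "$pid" 2>/dev/null
    do
        if [ "$waited" -ge "$limit" ]
        then
            kill -KILL "$pid" 2>/dev/null
            return 1                            # had to force-kill
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0                                    # exited within the limit
}

# Usage (assumes a process named nwserver):
#   stop_with_timeout "$(pgrep -o nwserver)" 30
```

The return code lets the caller log whether the forced path was needed, which is a useful hint that the server was hung and not just slow to shut down.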
|
Ravine
Joined: 26 Jul 2006 Posts: 105
|
Posted: Wed Apr 18, 2012 17:14 Post subject: |
|
|
Doh. I deleted this post, the server just stopped working, so it's not the SCO/RCO |
|