View previous topic :: View next topic |
Author |
Message |
elven
Joined: 28 Jul 2006 Posts: 259 Location: Germany
|
Posted: Sat Sep 06, 2014 19:20 Post subject: |
|
|
leo_x wrote: | There's something definitely wrong with what's in the repo at this moment. I was getting double frees at shutdown myself. Not had much chance to work on it tho...
As for the truncated character, I did see that issue before. I think it's even in the orignal resman: https://github.com/NWNX/nwnx2-linux/blob/master/plugins/resman/NWNXResMan.cpp#L84 could be wrong about that tho. If not, I probably copied it over. In this new stuff I was just going to say screw it and use std::string. |
Not sure why that line would fail. 20 is enough to hold resref + . + extension.
The extension gets written properly in any case, it's the ref that is being cut off. (resRef before the dot). So presumably it's being passed in wrongly into DemandRes ..?
The current workaround of disallowing 16-char resrefs is fugly, but not as bad as the crash bug for updated ncs. :D |
|
Back to top |
|
|
leo_x
Joined: 25 Aug 2010 Posts: 75
|
Posted: Sat Sep 06, 2014 20:02 Post subject: |
|
|
I haven't been able to test it but this is how I would think it would be fixed: https://github.com/jd28/nwnx2-linux/commit/034d819f6299dbbbae23fce2adfba97036fa9d65?diff=split Sorry, have been working a ton on my PW and just haven't had the time for this... hopefully, you get frustrated enough to restart the replacement nwserver project.
The doc for snprintf:
Quote: |
n
Maximum number of bytes to be used in the buffer.
The generated string has a length of at most n-1, leaving space for the additional terminating null character.
size_t is an unsigned integral type.
|
Seems to me both what I did and what is in the resman now is off by one. _________________ the awakening (PW Action) |
|
Back to top |
|
|
elven
Joined: 28 Jul 2006 Posts: 259 Location: Germany
|
Posted: Sat Sep 06, 2014 20:51 Post subject: |
|
|
This didnt fix it. It ate the extension (resrefWithExt was only the part before the dot in the logs). But your excerpt made me find the actual issue - you were right, of course.
Simply changing the buffers in Exists and Demand to be 21 chars long, and passing 17 into snprintf, fixed it. |
|
Back to top |
|
|
virusman
Joined: 30 Jan 2005 Posts: 1020 Location: Russia
|
Posted: Sat Sep 06, 2014 21:34 Post subject: |
|
|
Yep, 21 is 16 chars name + 1 '.' + 3 chars extension + 1 0x0 terminating byte to mark end of the string. Whoever wrote that part (dumbo or me, presumably) forgot about the terminating byte. _________________ In Soviet Russia, NWN plays you! |
|
Back to top |
|
|
elven
Joined: 28 Jul 2006 Posts: 259 Location: Germany
|
Posted: Sun Sep 07, 2014 18:35 Post subject: |
|
|
Alright, this is weirding me out.
I've recompiled nwnx + plugins with debug symbols, and the backtrace says it segfaults here:
Code: | #0 0xe2ab3667 in raise () from /lib/i386-linux-gnu/libc.so.6
#1 0xe2ab6a52 in abort () from /lib/i386-linux-gnu/libc.so.6
#2 0xe2aef98d in ?? () from /lib/i386-linux-gnu/libc.so.6
#3 0xe2af9a8a in ?? () from /lib/i386-linux-gnu/libc.so.6
#4 0xe2afb2e8 in ?? () from /lib/i386-linux-gnu/libc.so.6
#5 0xe2afe46d in free () from /lib/i386-linux-gnu/libc.so.6
#6 0xe19a428d in CNWNXResources::DemandRes (this=0xe19ea580, pResMan=0xe0618dd0, cRes=0x12c34538, resRef=...,
resType=NwnResType_NCS) at /Users/elven/code/nwnx2-linux/plugins/resources/NWNXResources.cpp:175
|
which is the free in this block:
Code: | if(cRes->m_pResource)
{
free(cRes->m_pResource);
pResMan->TotalAvailableMemory += cRes->m_nSize;
}
|
Edit: last log entry in nwnx_resources.txt with a debuglevel=4
FYI, "redislock" is a test script by me that locks the whole server in a script until all scripts have been synced in physfs. Didn't help, and I think the crash is unrelated to updating bytecode itself ..
Code: | [Sun Sep 7 16:39:44 2014] Calling CRes*::OnResourceServiced: 0826a784
[Sun Sep 7 16:39:44 2014] Resulting Structure:
[Sun Sep 7 16:39:44 2014] - m_nDemands = 1
[Sun Sep 7 16:39:44 2014] - m_nRequests = 0
[Sun Sep 7 16:39:44 2014] - m_nID = 00400A10
[Sun Sep 7 16:39:44 2014] - m_pResource = 11e69718
[Sun Sep 7 16:39:44 2014] - m_nSize = 1164
[Sun Sep 7 16:39:44 2014] - m_sName = x2_def_heartbeat
[Sun Sep 7 16:39:44 2014] - flags = 4
[Sun Sep 7 16:39:44 2014] - m_status = 2
[Sun Sep 7 16:39:44 2014] - m_pKeyEntry = df24f39c
[Sun Sep 7 16:39:44 2014] File: nw_c2_default1.ncs, Exists?: 1, mtime: 1410107322
[Sun Sep 7 16:39:44 2014] Skipping nw_c2_default1.ncs... Data: 0x11f008d8
[Sun Sep 7 16:39:44 2014] Calling CRes*::OnResourceServiced: 0826a784
[Sun Sep 7 16:39:44 2014] Resulting Structure:
[Sun Sep 7 16:39:44 2014] - m_nDemands = 1
[Sun Sep 7 16:39:44 2014] - m_nRequests = 0
[Sun Sep 7 16:39:44 2014] - m_nID = 0000062B
[Sun Sep 7 16:39:44 2014] - m_pResource = 11f008d8
[Sun Sep 7 16:39:44 2014] - m_nSize = 90390
[Sun Sep 7 16:39:44 2014] - m_sName = nw_c2_default1
[Sun Sep 7 16:39:44 2014] - flags = 4
[Sun Sep 7 16:39:44 2014] - m_status = 2
[Sun Sep 7 16:39:44 2014] - m_pKeyEntry = df2fa314
[Sun Sep 7 16:39:44 2014] File: redislock.ncs, Exists?: 1, mtime: 1410107322
[Sun Sep 7 16:39:44 2014] ID: -1: redislock
[Sun Sep 7 16:39:44 2014] Skipping redislock.ncs... Data: 0x11e3a940
[Sun Sep 7 16:39:44 2014] Calling CRes*::OnResourceServiced: 0826a784
[Sun Sep 7 16:39:44 2014] Resulting Structure:
[Sun Sep 7 16:39:44 2014] - m_nDemands = 1
[Sun Sep 7 16:39:44 2014] - m_nRequests = 0
[Sun Sep 7 16:39:44 2014] - m_nID = 00000000
[Sun Sep 7 16:39:44 2014] - m_pResource = 11e3a940
[Sun Sep 7 16:39:44 2014] - m_nSize = 2855
[Sun Sep 7 16:39:44 2014] - flags = 4
[Sun Sep 7 16:39:44 2014] - m_status = 2
[Sun Sep 7 16:39:44 2014] - m_pKeyEntry = 00000000
[Sun Sep 7 16:39:44 2014] File: msg_cc.ncs, Exists?: 0, mtime: 0
[Sun Sep 7 16:39:44 2014] ID: -1: msg_cc
[Sun Sep 7 16:39:44 2014] File: nw_c2_default9.ncs, Exists?: 1, mtime: 1410107968
[Sun Sep 7 16:39:44 2014] Loading external resouce: nw_c2_default9.ncs, mtime: 1410107968
[Sun Sep 7 16:39:44 2014] Original Structure:
[Sun Sep 7 16:39:44 2014] - m_nDemands = 0
[Sun Sep 7 16:39:44 2014] - m_nRequests = 0
[Sun Sep 7 16:39:44 2014] - m_nID = 00000638
[Sun Sep 7 16:39:44 2014] - m_pResource = 11d4f048
[Sun Sep 7 16:39:44 2014] - m_nSize = 24210
[Sun Sep 7 16:39:44 2014] - m_sName = nw_c2_default9
[Sun Sep 7 16:39:44 2014] - flags = 104
[Sun Sep 7 16:39:44 2014] - m_status = 2
[Sun Sep 7 16:39:44 2014] - m_pKeyEntry = df3ec984
[Sun Sep 7 16:39:44 2014] Free memory: 267286950
|
And nwnx_physfs, where all my ncs scripts come from:
Code: |
[Sun Sep 7 16:39:44 2014] File: redislock.ncs exists 1
[Sun Sep 7 16:39:44 2014] File: msg_cc.ncs exists 0
[Sun Sep 7 16:39:44 2014] Unable to service file: msg_cc.ncs, Exists?: 0, Required mtime: 0, Our mtime: -1
[Sun Sep 7 16:39:44 2014] File: nw_c2_default9.ncs exists 1
[Sun Sep 7 16:39:44 2014] Read: nw_c2_default9.ncs : 24210 bytes, Last Mod Time: %
|
So I guess this conks out because nw_c2_default9.ncs was updated (only the timestamp though, the file data is binary-identical).
Any ideas what's up with that ..? |
|
Back to top |
|
|
leo_x
Joined: 25 Aug 2010 Posts: 75
|
|
Back to top |
|
|
elven
Joined: 28 Jul 2006 Posts: 259 Location: Germany
|
Posted: Sun Sep 07, 2014 22:46 Post subject: |
|
|
The file is only in physfs & in the core dist.
I've added some NULLing of those fields after free()ing them to test that theory (couldn't be bothered to figure out how to print fields in gdb coredumps so trial by fire). We'll see if it still crashes. :) |
|
Back to top |
|
|
elven
Joined: 28 Jul 2006 Posts: 259 Location: Germany
|
Posted: Mon Sep 08, 2014 0:55 Post subject: |
|
|
No double frees yet. Pretty sure that was it! |
|
Back to top |
|
|
elven
Joined: 28 Jul 2006 Posts: 259 Location: Germany
|
Posted: Thu Sep 11, 2014 17:30 Post subject: |
|
|
Had just another crash, another dblfree at the same call. No joy.
Edit: Now being a bad person and running libdiehard (which catches double frees, among other things) just to make it not crash in prod. |
|
Back to top |
|
|
elven
Joined: 28 Jul 2006 Posts: 259 Location: Germany
|
Posted: Thu Sep 25, 2014 18:36 Post subject: |
|
|
Hi,
still seeing these crashes. Got any news on this? |
|
Back to top |
|
|
leo_x
Joined: 25 Aug 2010 Posts: 75
|
Posted: Fri Sep 26, 2014 12:43 Post subject: |
|
|
I'll have some time to work on it this weekend. _________________ the awakening (PW Action) |
|
Back to top |
|
|
elven
Joined: 28 Jul 2006 Posts: 259 Location: Germany
|
Posted: Fri Sep 26, 2014 14:50 Post subject: |
|
|
Alright, I've commented out the second free in line 169 here:
Code: | if(cRes->m_pResource)
{
// free(cRes->m_pResource);
// cRes->m_pResource = 0;
pResMan->TotalAvailableMemory += cRes->m_nSize;
} |
This obivously fixes the crashes. I haven't seen any memleaking yet but there isn't much going on on the server atm.
The segfault only happens when script files are updated, and apparently the pointer is already free'd by then (otherwise it wouldn't sigsegv there, I guess).
Code: |
(gdb) print fileInfo
$1 = (CResFileInfo &) @0x13536690: {size = 90346, mtime = 1411723269, latest_mtime = 1411723269, data = 0x0, res = 0x1361a4d8}
(gdb) print cRes
$2 = (CRes *) 0x1361a4d8
(gdb) print cRes->m_pResource
$3 = (void *) 0x13734d80
|
Unfortunately, fileInfo->data is already NULL at this point in this particular core, because I've done that to test. :) I'll rework it and go see if they're the same - if so, that'd explain the double free and I probably would have to NULL both fields anyways. |
|
Back to top |
|
|
elven
Joined: 28 Jul 2006 Posts: 259 Location: Germany
|
Posted: Fri Sep 26, 2014 14:56 Post subject: |
|
|
Alright, that was an easy fix I think. I've just compared fields and apparently it really was just a simple double free - they always point to the same address anyways. You were right all along and I was just not doing things quite correctly.
Removing the second free seems to work since it's already free'd in free(fileInfo.data); |
|
Back to top |
|
|
leo_x
Joined: 25 Aug 2010 Posts: 75
|
Posted: Fri Sep 26, 2014 15:53 Post subject: |
|
|
Glad to hear you got it working! I wonder if there might be a leak because fileInfo.data won't point to the data that NWN ResMan owns (because it loaded before it was in physfs or whatever) but I forget what my thought process was there other than trying to deal with that.
Maybe fileInfo.data should only be freed if and only if the CRes struct is not indexed. _________________ the awakening (PW Action) |
|
Back to top |
|
|
elven
Joined: 28 Jul 2006 Posts: 259 Location: Germany
|
Posted: Fri Sep 26, 2014 16:14 Post subject: |
|
|
leo_x wrote: | Glad to hear you got it working! I wonder if there might be a leak because fileInfo.data won't point to the data that NWN ResMan owns (because it loaded before it was in physfs or whatever) but I forget what my thought process was there other than trying to deal with that.
Maybe fileInfo.data should only be freed if and only if the CRes struct is not indexed. |
Well, in all the extensive testing I've done (start once, reload some scripts, watch output), pData always pointed to the same address as cRes->pwhatever.
In cases where no re-loads happened yet, both were NULL.
I'm not sure, would it ever point somewhere else? |
|
Back to top |
|
|
|