logo logo

 Back to main page

The NWNX Community Forum

 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
 
Resman stability and dynamic .jrl file
Goto page Previous  1, 2, 3, 4, 5, 6  Next
 
Post new topic   Reply to topic    nwnx.org Forum Index -> Linux development
View previous topic :: View next topic  
Author Message
elven



Joined: 28 Jul 2006
Posts: 259
Location: Germany

PostPosted: Sat Sep 06, 2014 19:20    Post subject: Reply with quote

leo_x wrote:
There's something definitely wrong with what's in the repo at this moment. I was getting double frees at shutdown myself. Not had much chance to work on it tho...

As for the truncated character, I did see that issue before. I think it's even in the orignal resman: https://github.com/NWNX/nwnx2-linux/blob/master/plugins/resman/NWNXResMan.cpp#L84 could be wrong about that tho. If not, I probably copied it over. In this new stuff I was just going to say screw it and use std::string.


Not sure why that line would fail. 20 is enough to hold resref + . + extension.

The extension gets written properly in any case, it's the ref that is being cut off. (resRef before the dot). So presumably it's being passed in wrongly into DemandRes ..?

The current workaround of disallowing 16-char resrefs is fugly, but not as bad as the crash bug for updated ncs. :D
Back to top
View user's profile Send private message
leo_x



Joined: 25 Aug 2010
Posts: 75

PostPosted: Sat Sep 06, 2014 20:02    Post subject: Reply with quote

I haven't been able to test it but this is how I would think it would be fixed: https://github.com/jd28/nwnx2-linux/commit/034d819f6299dbbbae23fce2adfba97036fa9d65?diff=split Sorry, have been working a ton on my PW and just haven't had the time for this... hopefully, you get frustrated enough to restart the replacement nwserver project. Very Happy

The doc for snprintf:
Quote:

n
Maximum number of bytes to be used in the buffer.
The generated string has a length of at most n-1, leaving space for the additional terminating null character.
size_t is an unsigned integral type.


Seems to me both what I did and what is in the resman now is off by one.
_________________
the awakening (PW Action)
Back to top
View user's profile Send private message
elven



Joined: 28 Jul 2006
Posts: 259
Location: Germany

PostPosted: Sat Sep 06, 2014 20:51    Post subject: Reply with quote

This didnt fix it. It ate the extension (resrefWithExt was only the part before the dot in the logs). But your excerpt made me find the actual issue - you were right, of course.

Simply changing the buffers in Exists and Demand to be 21 chars long, and passing 17 into snprintf, fixed it.
Back to top
View user's profile Send private message
virusman



Joined: 30 Jan 2005
Posts: 1020
Location: Russia

PostPosted: Sat Sep 06, 2014 21:34    Post subject: Reply with quote

Yep, 21 is 16 chars name + 1 '.' + 3 chars extension + 1 0x0 terminating byte to mark end of the string. Whoever wrote that part (dumbo or me, presumably) forgot about the terminating byte.
_________________
In Soviet Russia, NWN plays you!
Back to top
View user's profile Send private message Visit poster's website Yahoo Messenger
elven



Joined: 28 Jul 2006
Posts: 259
Location: Germany

PostPosted: Sun Sep 07, 2014 18:35    Post subject: Reply with quote

Alright, this is weirding me out.

I've recompiled nwnx + plugins with debug symbols, and the backtrace says it segfaults here:

Code:
#0  0xe2ab3667 in raise () from /lib/i386-linux-gnu/libc.so.6
#1  0xe2ab6a52 in abort () from /lib/i386-linux-gnu/libc.so.6
#2  0xe2aef98d in ?? () from /lib/i386-linux-gnu/libc.so.6
#3  0xe2af9a8a in ?? () from /lib/i386-linux-gnu/libc.so.6
#4  0xe2afb2e8 in ?? () from /lib/i386-linux-gnu/libc.so.6
#5  0xe2afe46d in free () from /lib/i386-linux-gnu/libc.so.6
#6  0xe19a428d in CNWNXResources::DemandRes (this=0xe19ea580, pResMan=0xe0618dd0, cRes=0x12c34538, resRef=...,
    resType=NwnResType_NCS) at /Users/elven/code/nwnx2-linux/plugins/resources/NWNXResources.cpp:175



which is the free in this block:

Code:
if(cRes->m_pResource)
            {
                free(cRes->m_pResource);
                pResMan->TotalAvailableMemory += cRes->m_nSize;
            }


Edit: last log entry in nwnx_resources.txt with a debuglevel=4

FYI, "redislock" is a test script by me that locks the whole server in a script until all scripts have been synced in physfs. Didn't help, and I think the crash is unrelated to updating bytecode itself ..

Code:
[Sun Sep  7 16:39:44 2014] Calling CRes*::OnResourceServiced: 0826a784
[Sun Sep  7 16:39:44 2014] Resulting Structure:
[Sun Sep  7 16:39:44 2014] - m_nDemands = 1
[Sun Sep  7 16:39:44 2014] - m_nRequests = 0
[Sun Sep  7 16:39:44 2014] - m_nID = 00400A10
[Sun Sep  7 16:39:44 2014] - m_pResource = 11e69718
[Sun Sep  7 16:39:44 2014] - m_nSize = 1164
[Sun Sep  7 16:39:44 2014] - m_sName = x2_def_heartbeat
[Sun Sep  7 16:39:44 2014] - flags = 4
[Sun Sep  7 16:39:44 2014] - m_status = 2
[Sun Sep  7 16:39:44 2014] - m_pKeyEntry = df24f39c
[Sun Sep  7 16:39:44 2014] File: nw_c2_default1.ncs, Exists?: 1, mtime: 1410107322
[Sun Sep  7 16:39:44 2014] Skipping nw_c2_default1.ncs... Data: 0x11f008d8
[Sun Sep  7 16:39:44 2014] Calling CRes*::OnResourceServiced: 0826a784
[Sun Sep  7 16:39:44 2014] Resulting Structure:
[Sun Sep  7 16:39:44 2014] - m_nDemands = 1
[Sun Sep  7 16:39:44 2014] - m_nRequests = 0
[Sun Sep  7 16:39:44 2014] - m_nID = 0000062B
[Sun Sep  7 16:39:44 2014] - m_pResource = 11f008d8
[Sun Sep  7 16:39:44 2014] - m_nSize = 90390
[Sun Sep  7 16:39:44 2014] - m_sName = nw_c2_default1
[Sun Sep  7 16:39:44 2014] - flags = 4
[Sun Sep  7 16:39:44 2014] - m_status = 2
[Sun Sep  7 16:39:44 2014] - m_pKeyEntry = df2fa314
[Sun Sep  7 16:39:44 2014] File: redislock.ncs, Exists?: 1, mtime: 1410107322
[Sun Sep  7 16:39:44 2014] ID: -1: redislock
[Sun Sep  7 16:39:44 2014] Skipping redislock.ncs... Data: 0x11e3a940
[Sun Sep  7 16:39:44 2014] Calling CRes*::OnResourceServiced: 0826a784
[Sun Sep  7 16:39:44 2014] Resulting Structure:
[Sun Sep  7 16:39:44 2014] - m_nDemands = 1
[Sun Sep  7 16:39:44 2014] - m_nRequests = 0
[Sun Sep  7 16:39:44 2014] - m_nID = 00000000
[Sun Sep  7 16:39:44 2014] - m_pResource = 11e3a940
[Sun Sep  7 16:39:44 2014] - m_nSize = 2855
[Sun Sep  7 16:39:44 2014] - flags = 4
[Sun Sep  7 16:39:44 2014] - m_status = 2
[Sun Sep  7 16:39:44 2014] - m_pKeyEntry = 00000000
[Sun Sep  7 16:39:44 2014] File: msg_cc.ncs, Exists?: 0, mtime: 0
[Sun Sep  7 16:39:44 2014] ID: -1: msg_cc
[Sun Sep  7 16:39:44 2014] File: nw_c2_default9.ncs, Exists?: 1, mtime: 1410107968
[Sun Sep  7 16:39:44 2014] Loading external resouce: nw_c2_default9.ncs, mtime: 1410107968
[Sun Sep  7 16:39:44 2014] Original Structure:
[Sun Sep  7 16:39:44 2014] - m_nDemands = 0
[Sun Sep  7 16:39:44 2014] - m_nRequests = 0
[Sun Sep  7 16:39:44 2014] - m_nID = 00000638
[Sun Sep  7 16:39:44 2014] - m_pResource = 11d4f048
[Sun Sep  7 16:39:44 2014] - m_nSize = 24210
[Sun Sep  7 16:39:44 2014] - m_sName = nw_c2_default9
[Sun Sep  7 16:39:44 2014] - flags = 104
[Sun Sep  7 16:39:44 2014] - m_status = 2
[Sun Sep  7 16:39:44 2014] - m_pKeyEntry = df3ec984
[Sun Sep  7 16:39:44 2014] Free memory: 267286950


And nwnx_physfs, where all my ncs scripts come from:

Code:

[Sun Sep  7 16:39:44 2014] File: redislock.ncs exists 1
[Sun Sep  7 16:39:44 2014] File: msg_cc.ncs exists 0
[Sun Sep  7 16:39:44 2014] Unable to service file: msg_cc.ncs, Exists?: 0, Required mtime: 0, Our mtime: -1
[Sun Sep  7 16:39:44 2014] File: nw_c2_default9.ncs exists 1
[Sun Sep  7 16:39:44 2014] Read: nw_c2_default9.ncs : 24210 bytes, Last Mod Time: %


So I guess this conks out because nw_c2_default9.ncs was updated (only the timestamp though, the file data is binary-identical).

Any ideas what's up with that ..?
Back to top
View user's profile Send private message
leo_x



Joined: 25 Aug 2010
Posts: 75

PostPosted: Sun Sep 07, 2014 21:29    Post subject: Reply with quote

This file is only in physfs or is it in the module as well? It looks like it is in the module too? If the former I don't see why you wouldn't have a fresh CRes every time and there should never be anything to free... If the latter, maybe the cache and this snippet are both attempting to free the data?

edit: Wondering if this issue isn't here: https://github.com/jd28/nwnx2-linux/blob/ta/plugins/resources/NWNXResources.cpp#L150

I'm guessing that that and this: https://github.com/jd28/nwnx2-linux/blob/ta/plugins/resources/NWNXResources.cpp#L175 point to the same thing
_________________
the awakening (PW Action)
Back to top
View user's profile Send private message
elven



Joined: 28 Jul 2006
Posts: 259
Location: Germany

PostPosted: Sun Sep 07, 2014 22:46    Post subject: Reply with quote

leo_x wrote:
This file is only in physfs or is it in the module as well? It looks like it is in the module too? If the former I don't see why you wouldn't have a fresh CRes every time and there should never be anything to free... If the latter, maybe the cache and this snippet are both attempting to free the data?

edit: Wondering if this issue isn't here: https://github.com/jd28/nwnx2-linux/blob/ta/plugins/resources/NWNXResources.cpp#L150

I'm guessing that that and this: https://github.com/jd28/nwnx2-linux/blob/ta/plugins/resources/NWNXResources.cpp#L175 point to the same thing


The file is only in physfs & in the core dist.

I've added some NULLing of those fields after free()ing them to test that theory (couldn't be bothered to figure out how to print fields in gdb coredumps so trial by fire). We'll see if it still crashes. :)
Back to top
View user's profile Send private message
elven



Joined: 28 Jul 2006
Posts: 259
Location: Germany

PostPosted: Mon Sep 08, 2014 0:55    Post subject: Reply with quote

No double frees yet. Pretty sure that was it!
Back to top
View user's profile Send private message
elven



Joined: 28 Jul 2006
Posts: 259
Location: Germany

PostPosted: Thu Sep 11, 2014 17:30    Post subject: Reply with quote

Had just another crash, another dblfree at the same call. No joy.

Edit: Now being a bad person and running libdiehard (which catches double frees, among other things) just to make it not crash in prod.
Back to top
View user's profile Send private message
elven



Joined: 28 Jul 2006
Posts: 259
Location: Germany

PostPosted: Thu Sep 25, 2014 18:36    Post subject: Reply with quote

Hi,

still seeing these crashes. Got any news on this?
Back to top
View user's profile Send private message
leo_x



Joined: 25 Aug 2010
Posts: 75

PostPosted: Fri Sep 26, 2014 12:43    Post subject: Reply with quote

I'll have some time to work on it this weekend.
_________________
the awakening (PW Action)
Back to top
View user's profile Send private message
elven



Joined: 28 Jul 2006
Posts: 259
Location: Germany

PostPosted: Fri Sep 26, 2014 14:50    Post subject: Reply with quote

Alright, I've commented out the second free in line 169 here:

Code:
  if(cRes->m_pResource)
            {
                // free(cRes->m_pResource);
                // cRes->m_pResource = 0;
                pResMan->TotalAvailableMemory += cRes->m_nSize;
            }



This obivously fixes the crashes. I haven't seen any memleaking yet but there isn't much going on on the server atm.

The segfault only happens when script files are updated, and apparently the pointer is already free'd by then (otherwise it wouldn't sigsegv there, I guess).

Code:

(gdb) print fileInfo
$1 = (CResFileInfo &) @0x13536690: {size = 90346, mtime = 1411723269, latest_mtime = 1411723269, data = 0x0, res = 0x1361a4d8}
(gdb) print cRes
$2 = (CRes *) 0x1361a4d8
(gdb) print cRes->m_pResource
$3 = (void *) 0x13734d80


Unfortunately, fileInfo->data is already NULL at this point in this particular core, because I've done that to test. :) I'll rework it and go see if they're the same - if so, that'd explain the double free and I probably would have to NULL both fields anyways.
Back to top
View user's profile Send private message
elven



Joined: 28 Jul 2006
Posts: 259
Location: Germany

PostPosted: Fri Sep 26, 2014 14:56    Post subject: Reply with quote

Alright, that was an easy fix I think. I've just compared fields and apparently it really was just a simple double free - they always point to the same address anyways. You were right all along and I was just not doing things quite correctly.

Removing the second free seems to work since it's already free'd in free(fileInfo.data);
Back to top
View user's profile Send private message
leo_x



Joined: 25 Aug 2010
Posts: 75

PostPosted: Fri Sep 26, 2014 15:53    Post subject: Reply with quote

Glad to hear you got it working! I wonder if there might be a leak because fileInfo.data won't point to the data that NWN ResMan owns (because it loaded before it was in physfs or whatever) but I forget what my thought process was there other than trying to deal with that.

Maybe fileInfo.data should only be freed if and only if the CRes struct is not indexed.
_________________
the awakening (PW Action)
Back to top
View user's profile Send private message
elven



Joined: 28 Jul 2006
Posts: 259
Location: Germany

PostPosted: Fri Sep 26, 2014 16:14    Post subject: Reply with quote

leo_x wrote:
Glad to hear you got it working! I wonder if there might be a leak because fileInfo.data won't point to the data that NWN ResMan owns (because it loaded before it was in physfs or whatever) but I forget what my thought process was there other than trying to deal with that.

Maybe fileInfo.data should only be freed if and only if the CRes struct is not indexed.


Well, in all the extensive testing I've done (start once, reload some scripts, watch output), pData always pointed to the same address as cRes->pwhatever.

In cases where no re-loads happened yet, both were NULL.

I'm not sure, would it ever point somewhere else?
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    nwnx.org Forum Index -> Linux development All times are GMT + 2 Hours
Goto page Previous  1, 2, 3, 4, 5, 6  Next
Page 3 of 6

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group