Lanthar D'Alton
Joined: 10 Feb 2005 Posts: 100
|
Posted: Mon Aug 08, 2005 2:01 Post subject: Fatal Exceptions in nwserver (probably not nwnx) |
|
|
I realize this is unrelated to nwnx.exe, but I've seen a long-standing series of fatal exceptions with the message:
Application popup: NWServer - Lanthar's Lair 08.04CEP15 - 9/64 3:13:14: nwserver.exe - Application Error : The instruction at "0x004201d6" referenced memory at "0x306f23a2". The memory could not be "read".
Note the instruction address, 0x004201d6 (the part bolded in the original post). That hex address appears in nearly 90% of the crashes on my server, and I was wondering whether the same address shows up in crashes on any other servers. To review your crash error messages: Control Panel -> Administrative Tools -> Event Viewer, then click System, sort by Source, and look through the Application Popup errors. If anyone else is hitting errors on their server I'd like to know, so that I can persuade Bioware to follow up on a memory address error. I bring it up here because it's the best concentration of persistent world admins and programmers that I know of, and I can reasonably expect some of you to know what I'm talking about...
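For anyone comparing a pile of these popup messages, a small sketch like the one below could pull the two addresses out of each message so they can be tallied. It assumes the message text has the same shape as the one quoted above; the names `CrashAddresses` and `parseCrashMessage` are made up for this example and are not part of nwserver or NWNX.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical helper type: the two addresses named in an "Application Error" popup.
struct CrashAddresses {
    std::uint32_t instruction;  // address of the faulting instruction
    std::uint32_t memory;       // address the instruction tried to access
};

// Extracts both addresses from a message shaped like the one quoted above.
// Returns false (and leaves `out` untouched) if the text does not match.
bool parseCrashMessage(const std::string& message, CrashAddresses& out) {
    static const std::regex pattern(
        R"rx(instruction at "0x([0-9a-fA-F]{1,8})" referenced memory at "0x([0-9a-fA-F]{1,8})")rx");
    std::smatch match;
    if (!std::regex_search(message, match, pattern)) {
        return false;
    }
    assert(match.size() == 3);  // whole match plus the two captured addresses
    out.instruction = static_cast<std::uint32_t>(std::stoul(match[1].str(), nullptr, 16));
    out.memory = static_cast<std::uint32_t>(std::stoul(match[2].str(), nullptr, 16));
    return true;
}
```

Feeding each Application Popup entry through `parseCrashMessage` and counting how often `instruction` equals 0x004201d6 would give a quick answer to whether a server sees the same crash.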
Then again, if that's the address of the SetLocalString function, then maybe it is nwnx... anyway, anyone else seeing this?
-Lanthar |
|
Lanthar D'Alton
Joined: 10 Feb 2005 Posts: 100
|
Posted: Wed Aug 10, 2005 9:52 Post subject: bump |
|
|
C'mon, is nobody willing to peruse their Windows event log for 5 minutes to see if the function crashes on them too? Just for fun I debugged into nwserver.exe and got the assembly code below.
Too bad I can't read assembly (had a course conflict with the assembly class in college, much to my chagrin... it really would be useful when debugging). Can anyone make sense of what it is doing? The operation at 4201d6 is movzx. Does anyone care to look at the rest of this and try to figure out what this function is doing? (Doubting anyone is bored enough to try reading assembly.)
Code: |
...
004201D2 mov ecx,dword ptr [esp+20h]
004201D6 movzx ax,byte ptr [ecx+3]
004201DB movzx cx,byte ptr [ecx+4]
004201E0 shl eax,8
004201E3 add eax,ecx
004201E5 mov edx,dword ptr [esp+18h]
004201E9 mov ecx,dword ptr [esi+8F8h]
004201EF inc ebx
004201F0 add edx,4
004201F3 add edi,2
004201F6 cmp ebx,ecx
004201F8 mov dword ptr [esp+18h],edx
004201FC jb 00420181
004201FE mov ebx,1
00420203 cmp ebp,ebx
00420205 jne 0042024F
00420207 mov edi,dword ptr [esi+904h]
0042020D inc edi
0042020E mov ecx,edi
00420210 cmp ecx,2
00420213 mov dword ptr [esi+904h],edi
00420219 mov dword ptr [esi+900h],0
...
|
|
|
Acrodania
Joined: 02 Jan 2005 Posts: 208
|
Posted: Wed Aug 10, 2005 16:45 Post subject: |
|
|
If the hex address stays the same, you probably have a bad stick of memory or a weak controller (heat damage, power spike, etc.). Just to make sure, is there someone else who can run your module for a while?
Things to try:
1) Take out one stick of ram (if you have multiples). If it still happens, put it back in and take out the other. Keep swapping until all have been cycled out.
2) Slow your system down. Use the BIOS settings to lower your bus speeds. That will take the strain off anything that is damaged. Especially helpful if your system is overclocked.
3) It's also possible that it could be caused by a piece of spyware. Make sure the system is clean.
I haven't seen any crashes with current NWNX or patch 1.66 recently. My previous string of problems turned out to be a bad CPU (probably heat-damaged) that only crashed with NWNX 2.6.1. Replaced it and everything's happy... |
|
pdwalker
Joined: 09 Aug 2005 Posts: 22
|
Posted: Wed Aug 10, 2005 19:37 Post subject: |
|
|
It is definitely not a hardware memory error.
The error reads like the nwserver is trying to reference an object that either no longer exists, or access memory it does not own.
Do you have a version of your mod from the time when it did not have this error? |
|
Lanthar D'Alton
Joined: 10 Feb 2005 Posts: 100
|
Posted: Thu Aug 11, 2005 1:09 Post subject: re exception |
|
|
Definitely not hardware. Being a C++ programmer I can guarantee that. When I managed to catch the code mid-execution at that memory address, it looked like those two lines were null-terminating the memory at those pointers... or at the very least had just copied an empty string over them (i.e. strcpy(currvar, "");), but more like:
char* pszBlah = new char[10];
char szCurrvar[20];
delete[] pszBlah;            // buffer is freed here
strcpy(szCurrvar, pszBlah);  // bug: reads freed memory (undefined behavior)
Of course, if that's the cause, I suppose there's not much I can do. Mind you though, it only happened when someone randomly decided to join my test server, and only the first time they joined. Ah well... guess nobody but maybe Papillon on here is very literate with assembly... (and I only say that b/c nwnx has a nice assembly coded chunk in the old version I first looked at)...
Guess I should get a book...
Still somewhat disappointed that no other admins have checked their server event logs yet... |
|
Acrodania
Joined: 02 Jan 2005 Posts: 208
|
Posted: Thu Aug 11, 2005 2:14 Post subject: Re: re exception |
|
|
Lanthar D'Alton wrote: |
Still somewhat disappointed that no other admins have checked their server event logs yet... |
My windows system logs show no errors.... |
|
pdwalker
Joined: 09 Aug 2005 Posts: 22
|
Posted: Thu Aug 11, 2005 16:11 Post subject: |
|
|
No errors in my logs either.
If you want a primer on assembly language, check out "The Art of Assembly Language" by Randall Hyde. See here http://webster.cs.ucr.edu/AoA/DOS/AoADosIndex.html for more details with the online version, or see here for other options: http://webster.cs.ucr.edu/AoA/index.html
The offending instruction (if I remember my asm correctly) says: take the byte at the address held in the ECX register, offset by 3 bytes, zero-extend it, and store it in the AX register.
Which really isn't all that helpful, since we do not have the higher-level context of what is actually happening and why it is faulting.
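For what it is worth, the two movzx loads together with the shl and add that follow them look like a 16-bit, high-byte-first read from offsets 3 and 4 of whatever buffer ECX points at. A rough C++ equivalent, as a sketch only: the function name, the buffer, and the length check are guesses for illustration, since the real nwserver source is not available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction of 004201D6..004201E3: combine the bytes at
// offsets 3 and 4 of a buffer into one 16-bit value, high byte first.
std::uint16_t readBigEndian16At3(const std::uint8_t* buffer, std::size_t length) {
    // No such check is visible in the listing: if the pointer is stale or
    // garbage, the very first load (movzx ax, byte ptr [ecx+3]) faults.
    assert(buffer != nullptr && length >= 5);

    const std::uint16_t high = buffer[3];  // movzx ax, byte ptr [ecx+3]
    const std::uint16_t low = buffer[4];   // movzx cx, byte ptr [ecx+4]
    return static_cast<std::uint16_t>((high << 8) + low);  // shl eax,8 / add eax,ecx
}
```

Read this way, the crash says more about where the pointer in ECX came from than about the instruction itself, which fits a read through a pointer to memory the server no longer owns.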
Is your problem repeatable? If so, can you isolate the problem to where it happens? Can you reduce it to the smallest test case possible where you can repeat the exact problem?
Do you have a version of your mod where this does not happen regularly? If so, find the differences between your test case and your mod to isolate where the problem is occurring and work from there.
Spending time tracing the assembly is not likely to be all that productive compared to the above option.
- Paul |
|
Lanthar D'Alton
Joined: 10 Feb 2005 Posts: 100
|
Posted: Fri Aug 12, 2005 4:54 Post subject: Repeatability on small scale |
|
|
Having put a breakpoint at that instruction, it only got hit when some random person joined my empty test server (oddly enough, especially since it's named Lanthar's Test Lair and the module was Talus Speech Setup. Sounds exciting, right?) Anyway, it only fired the first time he joined... Another random player also joined and it didn't fire when they did. Seeing how rarely this instruction gets executed, it is difficult to track down. It'd take only a few minutes for Bioware to read the C++ and see what is behind it. Isolating things out of my 150 meg module would be absurdly difficult... far more so than reading the assembly imho... (in direct opposition to your opinion, pdwalker)
Oh well. Guess since nobody else getting server crashes is reading these forums or cares enough to look at their server logs...
Probably easiest to just let this thread die, and let nwnx handle reloading it.
-Lanthar |
|
Papillon x-man
Joined: 28 Dec 2004 Posts: 1060 Location: Germany
|
Posted: Fri Aug 12, 2005 9:28 Post subject: |
|
|
Reading the assembly language will probably not get you anywhere. I checked the referenced address and I have never stumbled upon that function before. I know it is being called by the network layer while updating some status, so a crash while a player is joining seems plausible. The error needs to be reproducible before you can start debugging the server - and even then, you will have no way of fixing the bug.
I had a similar error once, where assigning a bird animation to an NPC (can't remember the exact context) resulted in a crashed nwserver. I sent Bioware a demo module and they fixed the crash. There is really nothing more you can do at that point... _________________ Papillon |
|
Lanthar D'Alton
Joined: 10 Feb 2005 Posts: 100
|
Posted: Fri Aug 19, 2005 0:41 Post subject: ... |
|
|
Guess I'll review all of the scripts that I run when a player logs in on my server... :/ I wonder if possibly it involves the listener... Guess I'll review its code for first-time logins too...
Thanks Papillon... that's probably all there is to be discovered from the info I have. I hate rare and unpredictable exceptions. Maybe I'll reinstall visual studio on my main server and have it there to debug the next crash to see if it's in my code.
Then again, no one has reported this yet regarding the listener.
Guess I'll let you all know if I figure it out.
-Lanthar |
|
isthar
Joined: 26 Oct 2005 Posts: 8
|
Posted: Sat Nov 26, 2005 12:54 Post subject: |
|
|
I have the same problem, with the same memory address error.
My module is up 24 hours a day (it is not a test module) and it crashes about once a day (especially with more than 10 players online).
I use CEP, PRC, HCR, and DMFI wands (but not the Talus listener), and NWNX to launch nwserver.
I cannot work out what causes the error, nor am I able to resolve the issue.
Have you managed to work out the cause of your problem?
Can anybody help me? |
|
Lanthar D'Alton
Joined: 10 Feb 2005 Posts: 100
|
Posted: Tue Dec 06, 2005 6:06 Post subject: Well, that's good and bad. |
|
|
1: it's not HCR (I don't use that)
2: it's not Talus Listener (You don't use that)
3: it's not PRC (I don't use that)
4: probably not DMFI. could be, but I doubt it.
Do you use any other systems? I do have my customized ATS Player Vendor thing too... I take it you use PRC for your subraces? I do my own by scripting... |
|
isthar
Joined: 26 Oct 2005 Posts: 8
|
Posted: Thu Dec 08, 2005 18:03 Post subject: |
|
|
I use CNR (for PC jobs) and I use Bioware vendors.
I use PRC for subraces.
In my PW (I am Italian) the graphics guy is the creator of the NIC horses, so we have (modified) NIC horses and other rideable animals (crabs, raptors, and others).
We have a lot of custom content, with many large haks.
The incredible thing is that when my module crashes, many PCs lose hours of play.
Every 5 minutes I call ExportSingleCharacter to save the players' .bic files. If I close the server manually, the .bic files are saved, but if it crashes with this error, the .bic files of many players are not saved.
It is very strange. |
|
isthar
Joined: 26 Oct 2005 Posts: 8
|
Posted: Fri Dec 30, 2005 16:25 Post subject: |
|
|
I tried changing persistence from nwnx2 to nwn-ff and the crashes continue, so it is not an nwnx2 problem. |
|
pdwalker
Joined: 09 Aug 2005 Posts: 22
|
Posted: Fri Dec 30, 2005 22:44 Post subject: |
|
|
Just a suggestion unrelated to nwnx2.
I've got a 140 MB mod that sometimes misbehaves when it somehow gets a "bad object" inside the mod file.
When that happens, I do the following
- open the mod
- delete all the *.ncs files in the temp0 directory
- delete all the *.ndb files in the temp0 directory (the compiled with debug nwscript files, I think that is the extension - not seen any in a while)
- do a full mod rebuild with all options checked (and then go away for a few hours)
- correct every error (except unused critters, scripts, placeables) that appears
- rebuild the mod and correct errors until a full rebuild shows no errors.
Also, in some circumstances, an object can get into a mod that is somehow invalid. When this occurs, strange things happen, like mod resets, nwserver crashes, and invisible objects in characters' inventories that take up space but cannot be seen or moved.
The best thing I can suggest in this case is to see if you can reduce it to the smallest case that reproduces the problem (e.g., access this chest and boom, or make this item using CNR and boom, or enter an area, or something).
Another suggestion is to reload your mod more than once a day. It doesn't solve your problem, but it might hide it long enough to go unnoticed.
- Paul |
|