View Issue Details

ID: 0011619
Project: Dwarf Fortress
Category: Technical -- General
View Status: public
Last Update: 2021-06-08 18:07
Reporter: Shirasik
Assigned To: lethosor
Priority: normal
Severity: crash
Reproducibility: random
Status: assigned
Resolution: open
Platform: PC desktop
OS: Windows
OS Version: 10 Pro x64 v2004
Product Version: 0.47.04
Summary: 0011619: Random crash due to unhandled exception
Description: The game crashes randomly and silently, without any error message. Just - whoop! - and you find yourself at your desktop.
Steps To Reproduce: Load the save, unpause, and wait for a few in-game days. It's a random crash, after all.
Additional Information: Files related to the report:
https://drive.google.com/drive/folders/1jumKHpE9z1dNVQoCzwYFlsmbNd9O7KWO?usp=sharing

Using PeridexisErrant's Starter Pack 0.47.04-r07.
Changes made to default settings of the pack:
- pop and invader caps raised (although they haven't been hit yet);
- traffic route cost altered to [0:1:5:15] (although I haven't designated traffic zones in the save yet);
- seed caps raised to 6000 (6k) per plant species and 3000000 (3M) per fortress;
- item stacking in container aggressiveness raised to 1000.
Everything is packed along with the save in a zip archive and uploaded to the Google Drive folder specified above.

The world was generated in this version of DF and of the Pack. Fresh install, fresh everything.

While trying to track down the issue and work out exact reproduction steps, I tried to narrow the time window by saving and reloading before the game crashed.
The first strange thing I noticed is that once I started saving and reloading, I got past the day of the crash, which I hit every time if I load the seasonal autosave and just play on.
The second strange thing is that at some point the game crashed while loading the save, but it loaded the same save without a problem after I relaunched the game.

After some further attempts to work out reproduction steps via in-game actions, I ragequit, loaded the save, attached the MS Visual Studio debugger to the game and unpaused.

Finally, MSVS caught an unhandled exception 0xC0000005 at address 0x00007FF675EC6780: an access violation, as the process attempted to read address 0x0.
It also said that the code is located in the ucrtbase.dll file.
As I have neither symbol files nor source code, I saved dumps in case they help: one with heap and one without. Both are in the Google Drive folder specified above.
Tags: No tags attached.

Activities

lethosor

2020-09-08 18:05

manager   ~0040718

Last edited: 2020-09-08 18:08

Could you please upload just the relevant save folder separately? It's in the data/save folder, e.g. Dwarf Fortress/data/save/region1.

Does this crash occur with all utilities disabled? In particular, try disabling DFHack (from the PyLNP launcher, if you're using it) and see if the crash happens again.

Also, is ucrtbase.dll the third-party library you're referring to? If so, it's not actually a third-party library; it's supporting code that many programs rely on, and is essentially part of the operating system.

Shirasik

2020-09-08 22:42

reporter   ~0040719

Save folder only:
https://drive.google.com/file/d/1fiv2ZgT7t4bN6cWhjPqUFE57tLY9NnNs/view?usp=sharing

As for the ucrtbase.dll file: Visual Studio gave only the file name, 'ucrtbase.dll', when it caught the exception, not the full path. I checked DF's folder and found a file with that name there. Now, after your comment, I also checked C:\Windows and found the file at 'C:\Windows\System32\ucrtbase.dll'. I checked the timestamps: the pack's DF folder contains an obsolete file (from 2015). I downloaded vanilla DF, and it ships its own copy of the same obsolete version.
I'm not sure which of the files DF is supposed to load at startup. I checked the VS log: it says that the system's copy is the one loaded when the debugger attaches to DF.

As for further testing: I disabled DFHack via the PyLNP launcher and tried again. Same exception, in a thread of the same file, at a slightly different address:
Exception thrown at 0x00007FF754E46780 in Dwarf Fortress.exe: 0xC0000005: Access violation reading location 0x0000000000000000.
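
The two fault addresses reported so far differ only in their upper bits, which would be consistent with the same instruction being hit under a different ASLR load address. A minimal Python sketch of that check, assuming Windows maps images at 64 KiB-aligned base addresses (the helper name `same_offset_in_image` is made up for illustration):

```python
# Assumption: Windows maps images at 64 KiB-aligned base addresses, so the same
# instruction seen in two runs differs by a multiple of 64 KiB.
ALLOCATION_GRANULARITY = 0x10000


def same_offset_in_image(addr_a: int, addr_b: int) -> bool:
    """Return True if both addresses could be the same instruction under ASLR."""
    return (addr_a - addr_b) % ALLOCATION_GRANULARITY == 0


first_crash = 0x00007FF675EC6780   # from the original report
second_crash = 0x00007FF754E46780  # from the run with DFHack disabled

print(same_offset_in_image(first_crash, second_crash))  # True
print(hex(first_crash % ALLOCATION_GRANULARITY))        # 0x6780
```

This does not prove that both crashes are the same instruction; it only shows that the addresses are compatible with that reading.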

Dump files have been uploaded here:
https://drive.google.com/drive/folders/1oE2oIots7BB4cKUMwhsrlYiot_1_8TUa?usp=sharing

lethosor

2020-09-09 17:22

manager   ~0040720

Oh, ok, I had a hard time figuring out exactly where ucrtbase.dll came from (as I'm not on Windows), but "crt" usually stands for "C runtime", so it's something that's provided by the compiler DF uses. In this case, DF distributes its own copy because any copies present on end-users' systems could be too old or too new to use with DF. DF is intended to use the one present in its own folder (so don't remove it), but it's something that DF needs at a low level, not something it was specifically written to use. I guess my point is that, barring any serious compiler bugs, ucrtbase.dll won't be the cause of any crashes, but DF bugs may end up triggering a crash in code within it.

Anyway, thanks for the save! I'll see if I can reproduce the crash. Sometimes these are system-specific, but having a save that crashes within a few days usually helps, so thanks for that.

Shirasik

2020-09-09 18:15

reporter   ~0040721

> too new to use with DF

Well, this, theoretically, might be the case. According to this
https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-search-order
if DF requests DLLs by file name only ('ucrtbase.dll' in this case), Windows will never pick DF's own copy, because the system's copy has been in use since startup: it is used by system services and is therefore always already loaded whenever DF is launched.
Plus, I remembered that crashes weren't this frequent until a certain date, i.e. until one of the Windows updates. Before that, DF could run a fortress for dozens of in-game years without any technical issues.

I'll look for a way to hijack the DLL search order to force DF to use its own copies, and see how that works out.

lethosor

2020-09-09 20:38

manager   ~0040722

Last edited: 2020-09-09 20:38

Unfortunately I wasn't able to reproduce the issue after 3 attempts, running for around an in-game month each time. It might be Windows-specific, though.

I'm no expert on how the Windows C runtime works, but from what I know about how DFHack works, I would expect an incompatibility like what you're describing to cause DF to crash nearly immediately after startup. If you're able to load any other saves successfully (particularly saves at least as old/large as the crashing one), I think that's probably not the issue. I would certainly be interested to see what happens if you force DF to use its own ucrtbase.dll, though! Thanks for the information about the Windows updates.

Shirasik

2020-09-10 12:36

reporter   ~0040724

Unfortunately, it's not possible to use a local copy of the Universal CRT on Windows 10, as stated here:
https://docs.microsoft.com/en-us/cpp/windows/universal-crt-deployment?view=vs-2019

"On Windows 10, the Universal CRT in the system directory is always used, even if an application includes an application-local copy of the Universal CRT. It's true even when the local copy is newer, because the Universal CRT is a core operating system component on Windows 10."

Neither hacking in a full path, nor writing a manifest, nor anything else works.

As for the exception: I opened the disassembly at the faulting instruction, and the code is in fact in Dwarf Fortress.exe. It doesn't check the returned value (the rax register), even though it is a pointer.
I also read that the upper half of rax is zeroed whenever an instruction writes to eax, but I'm not sure how that could be the cause unless Toady uses inline assembly. I'm not familiar enough with navigating disassembly to trace all possible call paths and check whether this is the case.

TV4Fun

2020-12-04 00:38

reporter   ~0040816

Seeing almost exactly the same thing on Windows 10. More than once I have just been dumped back to the desktop with no error message and nothing in the logs after hours of work, though it's getting more frequent. It seems to come up quite a bit when calling quicksave. Exact exception was "Exception thrown at 0x00007FF6C57A1DFA in Dwarf Fortress.exe: 0xC0000005: Access violation reading location 0x00000000CFF62E60." Full core dump from VC2019 at https://dffd.bay12games.com/file.php?id=15329, stack trace:
     Dwarf Fortress.exe!00007ff6c57a1dfa() Unknown
     Dwarf Fortress.exe!00007ff6c57e4eab() Unknown
     Dwarf Fortress.exe!00007ff6c57e674d() Unknown
     Dwarf Fortress.exe!00007ff6c5a32a0b() Unknown
     Dwarf Fortress.exe!00007ff6c5a56870() Unknown
     Dwarf Fortress.exe!00007ff6c5e06a6c() Unknown
     Dwarf Fortress.exe!00007ff6c5b5dfa5() Unknown
     twbt.plug.dll!00007fff0090c5d9() Unknown
     Dwarf Fortress.exe!00007ff6c5efcc5f() Unknown
     Dwarf Fortress.exe!00007ff6c5ba97e6() Unknown
     Dwarf Fortress.exe!00007ff6c5baa919() Unknown
     SDLreal.dll!00007fff61f6e471() Unknown
     SDLreal.dll!00007fff61f6e855() Unknown
     ucrtbase.dll!00007fff85bf14c2() Unknown
     kernel32.dll!00007fff87af7034() Unknown
     ntdll.dll!00007fff883fd0d1() Unknown


Is there a way to use df-structures or similar to generate a partial PDB file for Dwarf Fortress.exe? If we can map addresses to many of the variables and methods in memory, creating debug symbols from them would help a lot with debugging.

TV4Fun

2020-12-04 00:44

reporter   ~0040817

The particular instruction from the disassembly that this crashed on was `mov eax, dword ptr [rcx]`. The register values captured by VC were:
        EAX C6DDD901
        RBX 000001EC9B95B4C0
        RCX 00000000CFF62E60
        RDI 0000000000000010
        RSP 0000006C178FECA0
RCX contains the address that was being read, and its upper half is zeroed out. That does seem to suggest that this has something to do with implicit register zeroing, though I'm not sure how it could have come about.
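
The register dump above can be checked mechanically. A small Python sketch, assuming the intended pointer was a full 64-bit heap address like the one in RBX and that something kept only its low 32 bits, as a write to a 32-bit register does on x86-64 (the helper name `looks_truncated` is hypothetical):

```python
MASK32 = 0xFFFFFFFF


def looks_truncated(value: int) -> bool:
    """True if a 64-bit register holds a nonzero value that fits in 32 bits."""
    return value != 0 and value == (value & MASK32)


rbx = 0x000001EC9B95B4C0  # looks like a full 64-bit heap pointer (from the dump)
rcx = 0x00000000CFF62E60  # faulting address (from the dump)

print(looks_truncated(rbx))  # False
print(looks_truncated(rcx))  # True

# Effect of a 32-bit register write on a full pointer: the upper half is zeroed.
print(hex(rbx & MASK32))  # 0x9b95b4c0
```

A value that fits in 32 bits is not necessarily a truncated pointer, so this is only a hint. It is cheap to apply to every register in a dump, though.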

TV4Fun

2020-12-04 02:05

reporter   ~0040818

I set a breakpoint on this instruction, and this is definitely part of the save code, as it is called multiple times on every save. This truncation doesn't happen every time though, as I have seen runs where the value in RCX has its upper bytes set.

TV4Fun

2020-12-04 05:37

reporter   ~0040819

I'm going through the disassembly. The presence of TWBT in the stacktrace is making me think it might be a TWBT bug, but I can't be sure.

TV4Fun

2020-12-04 05:53

reporter   ~0040820

There is also quite a lot of pointer arithmetic going on here, so it's possible some value is getting stepped on.

lethosor

2020-12-05 08:25

manager   ~0040821

Shirasik: can you reproduce this with TWBT disabled?

TV4Fun

2020-12-06 01:50

reporter   ~0040822

Last edited: 2020-12-06 06:13

I disabled TWBT and did eventually get a crash while playing, though it took a lot longer than usual. I was able to play for several hours and save many times, while before, it was getting to the point where I couldn't play for more than a few minutes without a crash. I'm wondering if the issue is a memory leak, although Dwarf Fortress was consistently using only about 2GB of RAM, my system has 32GB, and I was nowhere close to being out.

The crash here was from a different code location than last time, though I have seen crashes from many locations. This crash looks like a call from a vtable, and the address it is attempting to call is invalid. Of note is that the memory address it was calling was not truncated as before, so that may or may not be the actual issue, though something may be causing memory corruption, as it always seems to happen with trying to dereference pointers recently loaded from memory. Dwarf Fortress does a lot of pointer arithmetic, and we could be seeing the result of a failed heap allocation that wasn't checked for. It also does a lot of looping over arrays that look like they could be susceptible to buffer overruns, so I'm not sure which is happening here.

In this case, the particular error was "Exception thrown at 0x00007FF6C60B4DAD in Dwarf Fortress.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF."

The full stack trace was:
> Dwarf Fortress.exe!00007ff6c60b4dad() Unknown
     Dwarf Fortress.exe!00007ff6c60b7d55() Unknown
     Dwarf Fortress.exe!00007ff6c606d5f8() Unknown
     Dwarf Fortress.exe!00007ff6c5b5e254() Unknown
     Dwarf Fortress.exe!00007ff6c5efcc5f() Unknown
     Dwarf Fortress.exe!00007ff6c5ba97e6() Unknown
     Dwarf Fortress.exe!00007ff6c5baa919() Unknown
     SDLreal.dll!00007fff61fbe471() Unknown
     SDLreal.dll!00007fff61fbe855() Unknown
     ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>() Unknown
     kernel32.dll!BaseThreadInitThunk() Unknown
     ntdll.dll!RtlUserThreadStart() Unknown

Values of some local registers:
        R14 00000262F5D979E0
        R15 00000000000001B8
        RAX 504952435345445B
        RCX 00000262F5D979E0

Disassembly of some of the relevant code:
00007FF6C60B4D8E  48 8B F1                  mov  rsi,rcx
00007FF6C60B4D91  44 8B EA                  mov  r13d,edx
00007FF6C60B4D94  0F 1F 40 00               nop  dword ptr [rax]
00007FF6C60B4D98  0F 1F 84 00 00 00 00 00   nop  dword ptr [rax+rax]
00007FF6C60B4DA0  48 8B 06                  mov  rax,qword ptr [rsi]
00007FF6C60B4DA3  4E 8B 34 F8               mov  r14,qword ptr [rax+r15*8]
00007FF6C60B4DA7  49 8B 06                  mov  rax,qword ptr [r14]
00007FF6C60B4DAA  49 8B CE                  mov  rcx,r14
00007FF6C60B4DAD  FF 90 E0 06 00 00         call qword ptr [rax+6E0h]
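
One observation on the values above: the RAX value used as the vtable pointer consists entirely of printable ASCII bytes. A short Python sketch that decodes it in little-endian order, which is how x86-64 lays the bytes out in memory. Reading this as text written over an object is an assumption, not something confirmed in this report, and `is_canonical` is a helper made up for illustration:

```python
rax = 0x504952435345445B  # value loaded from [r14] and used as the vtable pointer

# x86-64 is little-endian, so the lowest byte comes first in memory.
raw = rax.to_bytes(8, byteorder="little")
text = raw.decode("ascii")
print(text)  # [DESCRIP


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (48-bit canonical address form)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


# Address read by `call qword ptr [rax+6E0h]`.
call_target_slot = rax + 0x6E0
print(is_canonical(call_target_slot))     # False
print(is_canonical(0x00000262F5D979E0))   # True (the object pointer in R14/RCX)
```

The bytes read `[DESCRIP`, which resembles the start of a bracketed token such as those used in DF's raw files. If that reading is right, it would fit the memory-corruption theory, with text having been written over the start of the object. A non-canonical address would also be consistent with the debugger reporting the access as 0xFFFFFFFFFFFFFFFF rather than the real address. Both points are inferences from the dump, not confirmed causes.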

Full core dump at https://dffd.bay12games.com/file.php?id=15331

I'm using Visual Studio together with IDA and Hex-Rays to explore, disassemble, and decompile some of the code around these errors. Are there any good tools to generate debug symbols usable by either of these from what we know of the Dwarf Fortress memory map? I know codegen_c_hdr.pl in df_misc can generate a C header that is somewhat useful for the structures, though getting IDA to actually recognize the global variables and their types is still a slog. Do we have any tools that can improve this?

Shirasik

2021-05-27 05:16

reporter   ~0041064

lethosor: yes, it happens in my case w/o TWBT

For the record, I suspect this happens because the manager code and the stockpile records code count items differently. When a manager order expects enough items to be available and, for any reason, there aren't, the actual workshop job gets a null pointer for at least one of the chosen items.
The suspicion is based on the fact that if I make sure the order conditions always leave plenty of margin for blocked items (e.g. items tied up by hauling jobs that use containers), these sudden crashes no longer happen.

lethosor

2021-05-27 13:23

manager   ~0041065

Are you able to prevent a crash by cancelling manager orders before it occurs? I'm not familiar with how the manager works, but the null pointer theory feels unlikely to me (but not impossible).

Shirasik

2021-05-28 07:17

reporter   ~0041067

Last edited: 2021-05-30 10:46

I can't be sure whether I'm able to prevent something that may not happen at all (how would I check that anything was actually prevented?). But when I asked about *that* random crashing with no visible connection to anything, people in local DF communities said that random silent CTDs happen more often the more intensively the player uses the manager to automate production in the fortress. Some people advised not using the manager at all, but a few shared a trick: if the order conditions require several times more items than the order actually needs, the CTDs may not happen at all. So this is a clue pointing at something item-related.

PatrikLundell

2021-05-30 01:27

reporter   ~0041069

Please don't cram multiple issues into a single bug report, and definitely don't add unrelated ones into an existing one.
The reason for this is to ensure that when the bug relating to a bug report is fixed, the report can be closed properly and archived (until something indicates there are still issues relating to the bug, at least). Any "extra" stuff will be lost (and investigators won't look at bug reports to find unrelated things even if the report is open).

Cancellation spam caused by concurrent container access is a known, longstanding issue, with many people advocating very restricted use of containers. At least in the past there have been attempts to implement bagless seed stockpiles because of this issue, with varying degrees of success. Theoretically, increasing the number of crops used ought to spread the seed bag access over a larger number of bags, reducing the amount of cancellation spam caused by concurrent bag access (that's a workaround, not a fix, of course).

Shirasik

2021-05-30 10:49

reporter   ~0041070

PatrikLundell: edited the note according to your advice.

lethosor

2021-06-02 14:20

manager   ~0041071

I'm hesitant to classify issues based on random guesses of other people, especially if it's unknown whether their issues are related to this one. But if you are able to reproduce this crash (for instance, if you have a save where this crash tends to occur shortly after loading, or you are able to come up with a save that does so), it would be helpful for us to know if measures such as disabling manager orders are able to prevent the crash. If the crash isn't reproducible, it's difficult to know whether any preventative measures are helping.

In this case, I was unable to reproduce the crash on your save. However, if you are able to reproduce the crash consistently by just running the save from 0011619:0040719 for a couple in-game days, it would be very helpful to know whether e.g. removing all manager orders makes it run for significantly longer (multiple times) without crashing.

Shirasik

2021-06-08 18:07

reporter   ~0041080

That's the problem: the CTD seems completely random. It may take not days but a couple of weeks or even months, especially if I start saving and loading the fortress to get closer to the CTD and provide a meaningful save folder. Once I get a CTD, I load the last save and pass freely through the date at which the CTD happened a minute ago. Nevertheless, any of these works:
- removing all orders in the manager's interface;
- if all the orders are general, setting workshops to 'general work orders can't task this workshop' in the workshop profiles;
- suspending workshop tasks added by an order before they become active (i.e. before any dwarf picks them up).
For each of the above, I played the fortress for three years in one run, then decided that was enough, replaced the save with the one from backup and started again from the same point.

However, I can't provide concrete reproduction steps. Dwarves cancel jobs for some time (as they should, if they can't gather the items needed), but eventually the CTD happens. Saving and loading without quitting, or just quicksaving, allows for a longer run.

Issue History

Date Modified Username Field Change
2020-09-08 16:49 Shirasik New Issue
2020-09-08 18:05 lethosor Note Added: 0040718
2020-09-08 18:05 lethosor Assigned To => lethosor
2020-09-08 18:05 lethosor Status new => feedback
2020-09-08 18:07 lethosor Note Edited: 0040718
2020-09-08 18:08 lethosor Note Edited: 0040718
2020-09-08 22:42 Shirasik Note Added: 0040719
2020-09-08 22:42 Shirasik Status feedback => assigned
2020-09-09 17:22 lethosor Note Added: 0040720
2020-09-09 17:22 lethosor Summary Random crash due to unhandled exception in third-party library => Random crash due to unhandled exception
2020-09-09 18:15 Shirasik Note Added: 0040721
2020-09-09 20:38 lethosor Note Added: 0040722
2020-09-09 20:38 lethosor Assigned To lethosor =>
2020-09-09 20:38 lethosor Status assigned => new
2020-09-09 20:38 lethosor Note Edited: 0040722
2020-09-10 12:36 Shirasik Note Added: 0040724
2020-12-04 00:38 TV4Fun Note Added: 0040816
2020-12-04 00:44 TV4Fun Note Added: 0040817
2020-12-04 02:05 TV4Fun Note Added: 0040818
2020-12-04 05:37 TV4Fun Note Added: 0040819
2020-12-04 05:53 TV4Fun Note Added: 0040820
2020-12-05 08:25 lethosor Note Added: 0040821
2020-12-05 08:25 lethosor Assigned To => lethosor
2020-12-05 08:25 lethosor Status new => feedback
2020-12-06 01:50 TV4Fun Note Added: 0040822
2020-12-06 06:13 TV4Fun Note Edited: 0040822
2021-05-27 05:16 Shirasik Note Added: 0041064
2021-05-27 05:16 Shirasik Status feedback => assigned
2021-05-27 13:23 lethosor Note Added: 0041065
2021-05-27 13:23 lethosor Status assigned => feedback
2021-05-28 07:17 Shirasik Note Added: 0041067
2021-05-28 07:17 Shirasik Status feedback => assigned
2021-05-28 07:19 Shirasik Note Edited: 0041067
2021-05-30 01:27 PatrikLundell Note Added: 0041069
2021-05-30 10:46 Shirasik Note Edited: 0041067
2021-05-30 10:49 Shirasik Note Added: 0041070
2021-06-02 14:20 lethosor Note Added: 0041071
2021-06-02 14:21 lethosor Status assigned => feedback
2021-06-08 18:07 Shirasik Note Added: 0041080
2021-06-08 18:07 Shirasik Status feedback => assigned