Escaped and unescaped characters in .bic files

Fireboar · Joined: 17 Feb 2008 Posts: 323

Pretty much exactly what it says in the title. I was originally using the following regular expression to convert a (downcased) character name to its .bic filename: all characters removed except a-z and '.

s/[^a-z']//g

Unfortunately, characters like ö started appearing: clearly the above is not good enough. Does anyone have a comprehensive list of all escaped characters, or all unescaped characters?

Cheers.

peachykeen · Joined: 13 Feb 2010 Posts: 15 Location: MD, US

What regex library are you using?

What you're probably running into is locale support, where it considers accented characters as being alphabetical and lets them through. You could probably fix it just by typing each letter out, or (if you can) turn off internationalization or restrict it to a basic character set.

s/[^abcdefghijklmnopqrstuvwxyz']//g could work. It depends on whether the library considers o the same as ö. Check the docs for that.

Fireboar · Joined: 17 Feb 2008 Posts: 323

Nononono... the regular expression is fine: it's working as intended. That wasn't the question, I'm after what the expression SHOULD be. ö for example is filtered under the rule, but is nevertheless a valid character as part of the .bic file name.

Example:

I have a file
robért.bic

If I run "Robért" through the method, I end up with "robrt.bic", which is exactly what the regexp is meant to do. However, I would rather not strip the é character at all, since it appears in the .bic file names. So I would add é as another character not to filter, besides a-z and '.

But it would be a lot quicker if someone knew exactly which characters are or are not stripped by NWN when saving character files.
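For reference, here is a minimal Python sketch of the behaviour described in this post. The function name `bic_name_strict` is made up for illustration, and the accented character is written as an escape so the example does not depend on how the source file is encoded.

```python
import re

def bic_name_strict(character_name):
    """Hypothetical helper: apply s/[^a-z']//g to the downcased name."""
    lowered = character_name.lower()
    # Drop everything that is not a-z or an apostrophe.
    return re.sub(r"[^a-z']", "", lowered) + ".bic"

# "Rob\u00e9rt" is "Robért" with a precomposed é (U+00E9).
print(bic_name_strict("Rob\u00e9rt"))  # robrt.bic
```

The é is removed, so the result no longer matches a file saved as robért.bic.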

peachykeen · Joined: 13 Feb 2010 Posts: 15 Location: MD, US

Oh. I misunderstood and answered the reverse of your question. Oops.

I'm not sure as to the full list of escaped characters, but there are two solutions that may help. Either copy-paste the special characters you can find into your regex, or include the hex codes of those characters (check an extended character table such as Latin-1 for reference, since plain ASCII has no accented letters). I know most regex libraries will accept both of those, depending on build settings.
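A rough sketch of the hex-code approach in Python follows. The kept set here is only é (`\xe9`) and ö (`\xf6`), the two characters mentioned in this thread. It is an assumption for illustration, not the actual list NWN uses, and `bic_name` is a hypothetical name.

```python
import re

# Allow a-z, the apostrophe, and two extra characters given as hex escapes:
# \xe9 is é and \xf6 is ö. Extend this class as more kept characters turn up.
NOT_KEPT = re.compile(r"[^a-z'\xe9\xf6]")

def bic_name(character_name):
    """Hypothetical helper: strip everything outside the allow-list."""
    return NOT_KEPT.sub("", character_name.lower()) + ".bic"

print(bic_name("Rob\u00e9rt"))  # robért.bic
```

Any accented character that is not in the class (ë, for example) is still stripped, so the list has to be grown by hand until the real rule is known.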

Gryphyn · Joined: 20 Jan 2005 Posts: 431

It's based on ASCII, so make sure your encoding is UTF-8 (Unicode) and it will just work.

Fireboar · Joined: 17 Feb 2008 Posts: 323

Gryphyn · Joined: 20 Jan 2005 Posts: 431

^(.*/)?(?:$|(.+?)(?:(\.[^.]*$)|$))

Regex that matches path, filename and extension

Not a bad article; it'll point you toward trimming this to just the bits you want.

Windows: File naming errors
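As a sketch of how the path/filename/extension regex above might be used from Python: the three capture groups come back as directory, base name, and extension. The function name `split_path` and the example path are hypothetical, and the pattern only understands forward slashes.

```python
import re

# The pattern from the post: group 1 = path, group 2 = filename, group 3 = extension.
PATH_RE = re.compile(r"^(.*/)?(?:$|(.+?)(?:(\.[^.]*$)|$))")

def split_path(path):
    """Hypothetical helper: return (directory, stem, extension) or None."""
    match = PATH_RE.match(path)
    if match is None:
        # Can happen for input with embedded newlines, which "." does not match.
        return None
    return match.groups()

# Example path is made up for illustration.
print(split_path("servervault/player/rob\u00e9rt.bic"))
```

Groups that do not take part in the match come back as `None`, for example the directory when the input is a bare file name.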

The NWNX Community Forum