Author |
Message |
Fireboar
Joined: 17 Feb 2008 Posts: 323
|
Posted: Sat Jul 10, 2010 13:34 Post subject: Escaped and unescaped characters in .bic files |
|
|
Pretty much exactly what it says in the title. I was originally using the following regular expression to convert a (downcased) character name to its .bic filename: all characters removed except a-z and '.
s/[^a-z']//g
Unfortunately, characters like ö started appearing: clearly the above is not good enough. Does anyone have a comprehensive list of all escaped characters, or all unescaped characters?
Cheers. |
|
|
|
peachykeen
Joined: 13 Feb 2010 Posts: 15 Location: MD, US
|
Posted: Sat Jul 10, 2010 20:23 Post subject: |
|
|
What regex library are you using?
What you're probably running into is locale support, where it considers accented characters as being alphabetical and lets them through. You could probably fix it just by typing each letter out, or (if you can) turn off internationalization or restrict it to a basic character set.
s/[^abcdefghijklmnopqrstuvwxyz']//g could work. It depends on whether the library considers o the same as ö. Check the docs for that. |
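A quick way to check how a library treats ö is to run both patterns on a sample name. The sketch below assumes Python's `re` module (the thread never names the library), where `a-z` is a plain code-point range, so both spellings strip ö. The sample name and the `strip_with` helper are made up for illustration.

```python
import re

# Same filter as in the first post: drop everything except a-z and the apostrophe.
RANGE_FILTER = re.compile(r"[^a-z']")
# Spelled-out variant of the same character class.
EXPLICIT_FILTER = re.compile(r"[^abcdefghijklmnopqrstuvwxyz']")


def strip_with(pattern, name):
    """Lowercase the name, then delete every character the pattern matches."""
    return pattern.sub("", name.lower())


sample = "Björn O'Neil"  # hypothetical character name
print(strip_with(RANGE_FILTER, sample))
print(strip_with(EXPLICIT_FILTER, sample))
```

If the two lines differ in another library, that library is applying locale rules to the range.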
|
|
|
Fireboar
Joined: 17 Feb 2008 Posts: 323
|
Posted: Mon Jul 12, 2010 23:43 Post subject: |
|
|
Nononono... the regular expression is fine: it's working as intended. That wasn't the question, I'm after what the expression SHOULD be. ö for example is filtered under the rule, but is nevertheless a valid character as part of the .bic file name.
Example:
I have a file
robért.bic
If I run "Robért" through the method, I end up with "robrt.bic", which is exactly what the regexp is meant to do. However, I would rather not strip the é character at all, since it appears in the .bic file names. So I would add é as another character to keep, besides a-z and '.
But it would be a lot quicker if someone knew exactly which characters are or are not stripped by NWN when saving character files. |
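To make the example concrete, here is a minimal sketch in Python (an assumption, since the thread does not say which language the method is written in). `bic_filename` and its `extra_keep` parameter are hypothetical names; which characters NWN really keeps is exactly what remains unknown, so é is passed in by hand.

```python
import re


def bic_filename(name, extra_keep=""):
    """Lowercase the name and drop every character that is not a-z, an
    apostrophe, or listed in extra_keep (hypothetical parameter)."""
    # re.escape keeps characters such as ']' or '-' from breaking the class.
    keep = "a-z'" + re.escape(extra_keep)
    return re.sub("[^" + keep + "]", "", name.lower()) + ".bic"


print(bic_filename("Robért"))        # current rule: é is stripped
print(bic_filename("Robért", "é"))   # é added to the characters to keep
```

The first call reproduces the "robrt.bic" result described above; the second yields "robért.bic", matching the file on disk.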
|
|
|
peachykeen
Joined: 13 Feb 2010 Posts: 15 Location: MD, US
|
Posted: Tue Jul 13, 2010 3:47 Post subject: |
|
|
Oh. I misunderstood, answered the reverse of your question. Oops.
I'm not sure as to the full list of escaped characters, but there are two solutions that may help. Either copy-paste the special characters you can find (you may check an ASCII table for reference) into your regex or include the hex codes of those characters (again, check the table). I know most regex libraries will accept both of those, depending on build settings. |
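A small sketch of the two options, assuming Python's `re` module: the same character class written once with pasted literals and once with hex codes. Note that é and ö sit outside 7-bit ASCII; the codes used here (E9 and F6) are their Latin-1 values, which are also their Unicode code points. The choice of é and ö is only illustrative, not a confirmed list of what NWN keeps.

```python
import re

# Characters pasted directly into the class.
LITERAL = re.compile("[^a-z'éö]")
# The same class using hex codes: \xe9 is é, \xf6 is ö.
ESCAPED = re.compile(r"[^a-z'\xe9\xf6]")


def clean(pattern, name):
    """Lowercase the name and delete everything the pattern matches."""
    return pattern.sub("", name.lower())


print(clean(LITERAL, "Robért"))
print(clean(ESCAPED, "Robért"))
```

Both calls give the same result, so the choice between the two forms is mostly about how readable the source file stays and whether its encoding can be trusted.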
|
|
|
Gryphyn
Joined: 20 Jan 2005 Posts: 431
|
Posted: Tue Jul 13, 2010 9:50 Post subject: |
|
|
It's based on ASCII, so make sure your encoding is UTF-8 (unicode) and it will just work. |
|
|
|
Fireboar
Joined: 17 Feb 2008 Posts: 323
|
Posted: Tue Jul 13, 2010 20:20 Post subject: |
|
|
Gryphyn wrote: | It's based on ASCII, so make sure your encoding is UTF-8 (unicode) and it will just work. |
... what, exactly, will just work? My regex only keeps 27 characters no matter what the input is (see post above) as it is designed to. There's no "just working" there if I want to keep more characters.
peachykeen wrote: | Oh. I misunderstood, answered the reverse of your question. Oops.
I'm not sure as to the full list of escaped characters, but there are two solutions that may help. Either copy-paste the special characters you can find (you may check an ASCII table for reference) into your regex or include the hex codes of those characters (again, check the table). I know most regex libraries will accept both of those, depending on build settings. |
Yes, I suspected I might have to do this, though I was kinda hoping not to have to. *sigh* Thanks anyway. |
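One way to take some of the manual work out of this is to let existing file names report which extra characters occur. A minimal sketch in Python, assuming the names have already been read into a list; `extra_characters` is a hypothetical helper, and apart from robért.bic the sample names are invented. It only finds characters that happen to appear in the files at hand, so it is not a complete list of what NWN allows.

```python
def extra_characters(filenames, suffix=".bic"):
    """Return, sorted, every character in the file stems that is not a-z or '."""
    basic = set("abcdefghijklmnopqrstuvwxyz'")
    found = set()
    for filename in filenames:
        # Drop the extension so its '.' is not reported as an extra character.
        stem = filename[:-len(suffix)] if filename.endswith(suffix) else filename
        found.update(ch for ch in stem if ch not in basic)
    return sorted(found)


sample = ["robért.bic", "jörg.bic", "o'neil.bic"]  # illustrative file names
print(extra_characters(sample))
```

The characters it returns can then be added to the keep list of the regex.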
|
|
|
Gryphyn
Joined: 20 Jan 2005 Posts: 431
|
|
|
|
|