The NWNX Community Forum

Escaped and unescaped characters in .bic files

 
Fireboar



Joined: 17 Feb 2008
Posts: 323

Posted: Sat Jul 10, 2010 13:34    Post subject: Escaped and unescaped characters in .bic files

Pretty much exactly what it says in the title. I was originally using the following regular expression to convert a (lowercased) character name to its .bic filename: it removes every character except a-z and '.

s/[^a-z']//g

Unfortunately, characters like ö started appearing: clearly the above is not good enough. Does anyone have a comprehensive list of all escaped characters, or all unescaped characters?

Cheers.
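For reference, a minimal sketch of the filter described above, assuming Python's `re` module (the thread does not say which regex library is in use). The function name `bic_stem` is hypothetical.

```python
import re

def bic_stem(name):
    """Lower-case the name, then drop every character except a-z and '."""
    # Equivalent of the sed-style s/[^a-z']//g applied to the lowercased name.
    return re.sub(r"[^a-z']", "", name.lower())

print(bic_stem("Sir O'Neil"))    # siro'neil
print(bic_stem("Bj\u00f6rn"))    # bjrn  (the ö is stripped)
```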
peachykeen



Joined: 13 Feb 2010
Posts: 15
Location: MD, US

Posted: Sat Jul 10, 2010 20:23

What regex library are you using?

What you're probably running into is locale support, where it considers accented characters as being alphabetical and lets them through. You could probably fix it just by typing each letter out, or (if you can) turn off internationalization or restrict it to a basic character set.

s/[^abcdefghijklmnopqrstuvwxyz']//g could work. It depends on whether the library considers o the same as ö. Check the docs for that.
Fireboar



Joined: 17 Feb 2008
Posts: 323

Posted: Mon Jul 12, 2010 23:43

Nononono... the regular expression is fine: it's working as intended. That wasn't the question, I'm after what the expression SHOULD be. ö for example is filtered under the rule, but is nevertheless a valid character as part of the .bic file name.

Example:

I have a file
robért.bic

If I run the method through "Robért", I end up with "robrt.bic". Which is exactly what the regexp is meant to do. However, I would rather not strip the é character at all, since it appears in the .bic file names. So I would add é as another character not to filter besides a-z and '.

But it would be a lot quicker if someone knew exactly which characters are or are not stripped by NWN when saving character files.
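The Robért example can be reproduced with a small sketch, again assuming Python's `re` module. The helper `bic_filename` and its `extra_keep` parameter are hypothetical, and which extra characters NWN actually keeps is exactly the open question here.

```python
import re

def bic_filename(name, extra_keep=""):
    """Build a .bic filename, keeping a-z, ' and any characters in extra_keep."""
    # re.escape guards against characters such as ] or - that are special
    # inside a character class.
    pattern = "[^a-z'" + re.escape(extra_keep) + "]"
    return re.sub(pattern, "", name.lower()) + ".bic"

print(bic_filename("Rob\u00e9rt"))             # robrt.bic
print(bic_filename("Rob\u00e9rt", "\u00e9"))   # robért.bic
```

Passing the kept characters as a parameter means the whitelist can grow as more of them turn up in real file names, without touching the pattern itself.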
peachykeen



Joined: 13 Feb 2010
Posts: 15
Location: MD, US

Posted: Tue Jul 13, 2010 3:47

Oh. I misunderstood, answered the reverse of your question. Oops. :P

I'm not sure as to the full list of escaped characters, but there are two solutions that may help. Either copy-paste the special characters you can find (you may check an ASCII table for reference) into your regex or include the hex codes of those characters (again, check the table). I know most regex libraries will accept both of those, depending on build settings.
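A sketch of the hex-code approach, assuming Python's `re` module. The range used here (0xE0 to 0xFF without 0xF7, i.e. the lowercase accented letters of Latin-1, which lie outside 7-bit ASCII) is only an assumption to illustrate the syntax; nothing in this thread confirms that NWN keeps exactly these characters.

```python
import re

# Keep a-z, the apostrophe, and (as an assumption) the Latin-1 lowercase
# accented letters 0xE0-0xF6 and 0xF8-0xFF. 0xF7 is the division sign,
# so it is left out of the kept range.
KEEP_LATIN1 = re.compile(r"[^a-z'\xe0-\xf6\xf8-\xff]")

def bic_stem_latin1(name):
    """Lower-case the name and strip everything outside the assumed keep-set."""
    return KEEP_LATIN1.sub("", name.lower())

print(bic_stem_latin1("Rob\u00e9rt"))        # robért
print(bic_stem_latin1("Bj\u00f6rn \u00f7"))  # björn
```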
Gryphyn



Joined: 20 Jan 2005
Posts: 431

Posted: Tue Jul 13, 2010 9:50

It's based on ASCII, so make sure your encoding is UTF-8 (unicode) and it will just work.
Fireboar



Joined: 17 Feb 2008
Posts: 323

Posted: Tue Jul 13, 2010 20:20

Gryphyn wrote:
It's based on ASCII, so make sure your encoding is UTF-8 (unicode) and it will just work.


... what, exactly, will just work? My regex only keeps 27 characters no matter what the input is (see post above) as it is designed to. There's no "just working" there if I want to keep more characters.

peachykeen wrote:
Oh. I misunderstood, answered the reverse of your question. Oops. :P

I'm not sure as to the full list of escaped characters, but there are two solutions that may help. Either copy-paste the special characters you can find (you may check an ASCII table for reference) into your regex or include the hex codes of those characters (again, check the table). I know most regex libraries will accept both of those, depending on build settings.


Yes, I suspected I might have to do this, though I was kinda hoping not to have to. *sigh* Thanks anyway.
Gryphyn



Joined: 20 Jan 2005
Posts: 431

Posted: Wed Jul 14, 2010 9:39

^(.*/)?(?:$|(.+?)(?:(\.[^.]*$)|$))

Regex that matches path, filename and extension

not a bad article, it'll point you to trimming this to just the bits you want.

Windows: File naming errors
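To see what the expression above captures, here is a minimal sketch assuming Python's `re` module. The helper name `split_path` and the sample path are hypothetical. Note that the pattern only treats `/` as a separator, so backslash-separated Windows paths would not be split into a directory part.

```python
import re

# The pattern from the post: group 1 is the directory part (with trailing /),
# group 2 the file name without extension, group 3 the extension with its dot.
PATH_RE = re.compile(r"^(.*/)?(?:$|(.+?)(?:(\.[^.]*$)|$))")

def split_path(path):
    """Return (directory, stem, extension); parts that are absent are None."""
    match = PATH_RE.match(path)
    return match.groups() if match else None

print(split_path("servervault/player/rob\u00e9rt.bic"))
# ('servervault/player/', 'robért', '.bic')
print(split_path("robert"))
# (None, 'robert', None)
```

This splits a path into its parts but does not by itself say which characters NWN keeps in the file name.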
All times are GMT + 2 Hours