The NWNX Community Forum

NWNX Based Voice Communication MOD

 
vvortex3



Joined: 20 Sep 2005
Posts: 4
Location: Sacramento, CA

Posted: Tue Sep 20, 2005 21:48    Post subject: NWNX Based Voice Communication MOD

I am trying to gather support for creating the following NWN (possibly NWN2) mod (see next post). This mod will allow voice communication between users on a given server (and potentially across servers). It will also allow streaming of audio files to users. Ideally, the audio files will be volume-scaled based on x,y,z distance, and the audio will be positional. Completing the mod would allow the following new features to be added to NWN:

Server controlled dynamic audio for sound effects, music, etc.
NPCs can talk with voice
Players can talk with voice based on scriptable criteria
Admins can snoop voice
Users can record voice to the server for later playback

I have already written a working version of this MOD using DirectX's DirectPlay Voice. I don't really have the coding expertise to take it much further, however (DirectPlay Voice is pretty abstracted), and I would like to use something other than DirectX for the final product. I could write this entire mod myself if I had a relatively easy-to-use voice compression SDK with a client/server example, and an example of inserting an audio file into one of the streams (hence my RentACoder project found here: http://www.rentacoder.com/RentACoder/misc/BidRequests/ShowBidRequest.asp?lngBidRequestId=330455 ).

Any comments, suggestions, etc. would be appreciated. Ultimately, this MOD relies on NWNX (specifically its MySQL support). The server uses polling to retrieve player locations and voice stream commands from a database. There is, however, a latency involved (the heartbeat time for the NWN server to insert data into the table, plus the time for the voice server to poll the table). This latency may be removable some day with a true server-side mod.
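
To make the polling half concrete, here is a rough Python sketch of what the voice server's database loop might look like. The DSN name, credentials, poll interval, and exact column names are placeholder assumptions, not part of anything I have actually written:

import time
import pyodbc   # any MySQL/ODBC binding would do; DSN and interval below are made-up examples

POLL_SECONDS = 0.5

conn = pyodbc.connect("DSN=nwnvoice;UID=voice;PWD=secret")

def poll_locations():
    cur = conn.cursor()
    cur.execute("SELECT username, x, y, z, room FROM user_locations")
    return {row.username: (row.x, row.y, row.z, row.room) for row in cur.fetchall()}

while True:
    locations = poll_locations()      # worst-case data age = NWN heartbeat + this poll interval
    # ...update routing/attenuation from `locations`...
    time.sleep(POLL_SECONDS)
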
vvortex3



Joined: 20 Sep 2005
Posts: 4
Location: Sacramento, CA

Posted: Tue Sep 20, 2005 21:49    Post subject: Project Detailed Description

Voice Communication Client Requirements
Connects to an IP address specified in client.conf; if the connection fails, silently log the error (see the sketch after this list)
Sends a user name specified in client.conf
Reads a microphone sensitivity value from client.conf (1-10)
Has no window, runs transparently
Outputs error messages to errorlog.txt
Plays positional audio based on x,y,z coordinates
Can play multiple concurrent audio streams (a dynamic number)
Does not lock audio hardware exclusively
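
A rough sketch of how the client startup could look, assuming Python, a plain key=value client.conf, and made-up key names (server_ip, port, username):

import socket

def read_conf(path="client.conf"):
    # assumes a plain key=value file; the key names used below are illustrative only
    conf = {}
    for line in open(path):
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            conf[key.strip()] = value.strip()
    return conf

def log_error(message):
    with open("errorlog.txt", "a") as log:
        log.write(message + "\n")

try:
    conf = read_conf()
    sock = socket.create_connection((conf["server_ip"], int(conf.get("port", "5000"))))
    # ...wait for the codec announcement, send the username from conf, start streaming mic audio...
except (OSError, KeyError, ValueError) as exc:
    log_error("startup failed: %r" % (exc,))   # fail silently to errorlog.txt, per the requirements
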

Voice Communication Server Requirements
Accepts VoIP connections from clients
controls which audio codec is used by the clients
Associates username sent from client with its respective connection
Writes connected usernames out to ODBC as seen below
table: connected_users
id  username  connecttime
1   dewayne   timestamp

removes row from table when the client disconnects or goes link dead
Polls room,x,y,z coordinates for each connected username from an arbitrary MySQL table every X seconds (using ODBC; table name specified in server.conf)

table: user_locations
id  username  x          y         z         room
1   dewayne   10.523234  6.123456  0.233245  HOUSE1

Can insert an mp3 file to play arbitrarily to an individual client at an arbitrary x,y,z position
Controls which clients can hear which other clients' streams (changeable on the fly) using
the following criteria:
are the two clients in the same room, AND within X pixels (using x,y,z)
(value of x pixels is specified in server.conf)

polls a table called "stream_control" that looks like this:

id  username  target          audiofile           timestamp
------------------------------------------------------------
1   dewayne   bob             NULL                ????
2   dewayne   steve           NULL                ????
3   SERVER    bob             hello.mp3           ????
4   bob       SERVERRECSTART  bob<timestamp>.mp3  ????
4   bob       SERVERRECSTOP   bob<timestamp>.mp3  ????
5   bob       SILENCE         NULL                ????

**UPDATE**
stream_control will also contain x,y,z values of sender


The server will modify who can talk to whom using the first set of rules, THEN it
will override those values using the new values from the "stream_control" table.
If the username is SERVER, then the server will send the client an mp3 over
an audio stream (of standard voice quality). This mp3 or audio file should take
no more bandwidth than a one-way voice communication. If the target is SERVERRECSTART, the server will start recording from the client until it receives SERVERRECSTOP, or the server's maxrecord limit is reached (defined in server.conf). If the target is SILENCE, squelch the user until UNSILENCE is received. The server deletes entries from this table after processing them.
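
To make that override pass concrete, here is a rough Python sketch of how the server might walk the stream_control rows in timestamp order; the action names are placeholders for whatever the routing code actually does:

def stream_control_actions(rows):
    # rows: dicts with id, username, target, audiofile, timestamp (as in the table above)
    actions, processed_ids = [], []
    for row in sorted(rows, key=lambda r: r["timestamp"]):     # apply oldest first
        user, target, audio = row["username"], row["target"], row["audiofile"]
        if user == "SERVER":
            actions.append(("play_file", target, audio))        # stream an mp3 to target
        elif target == "SERVERRECSTART":
            actions.append(("start_recording", user, audio))    # until RECSTOP or maxrecord
        elif target == "SERVERRECSTOP":
            actions.append(("stop_recording", user, audio))
        elif target == "SILENCE":
            actions.append(("silence", user, None))
        elif target == "UNSILENCE":
            actions.append(("unsilence", user, None))
        else:
            actions.append(("allow_hear", user, target))        # user's voice audible to target
        processed_ids.append(row["id"])
    return actions, processed_ids   # caller applies the actions, then deletes the processed ids
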


Server Flow
Start Server (runs as a Windows service)
Read Settings From server.conf
Get ODBC name, username, password
Get table name
Get directory for insertable audio clips
Get Audio Codec
Get value for x,y,z distance for users to be able to hear each other
Get frequency in milliseconds to poll tables
Get maximum record time (before it stops recording automatically)
Connect to ODBC Database
Start Listening For Voice Connections, but also poll tables every X milliseconds (if at least 1 client is in "ready" state; a sketch of this loop follows)
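
A loose sketch of that listen-and-poll loop, using Python's standard selectors module purely for illustration (the port number, poll interval, and the commented handler hooks are assumptions):

import selectors, socket, time

POLL_SECONDS = 0.5        # "frequency in milliseconds to poll tables", converted; example value
ready_clients = 0         # incremented once a client has sent its name (state 2)

sel = selectors.DefaultSelector()
listener = socket.socket()
listener.bind(("0.0.0.0", 5000))      # port is an example value, not from the spec
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ)

last_poll = time.monotonic()
while True:
    for key, _events in sel.select(timeout=POLL_SECONDS):
        if key.fileobj is listener:
            client, _addr = listener.accept()     # OnConnectAccept() would run here
            client.setblocking(False)
            sel.register(client, selectors.EVENT_READ)
        else:
            data = key.fileobj.recv(4096)         # name/voice packets arrive here
            if not data:
                sel.unregister(key.fileobj)       # OnDisconnect_Or_LinkDead() would run here
    if ready_clients > 0 and time.monotonic() - last_poll >= POLL_SECONDS:
        last_poll = time.monotonic()              # OnPollTables() would run here
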
THE SERVER HAS AN INTERNAL STRUCTURE TO ROUTE
VOICE TRAFFIC (x,y,z is the coordinates of the sender)
struct voice_traffic_routing {
internalconnectionid  username  tofile  target     x,y,z  fromfile
12423                 dewayne   Y       hello.mp3         null
12424                 bob       N       dewayne           null
12425                 SERVER    N       bob               bye.mp3
}
THE SERVER ALSO HAS AN INTERNAL STRUCTURE FOR OUTGOING
FILES THAT ARE STILL RECORDING
struct sound_recording {
internalconnectionid  username  filename   starttime    data
12423                 dewayne   hello.mp3  <timestamp>  binary
}
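
The same two structures expressed as Python dataclasses, purely for illustration (the field types are my guesses; x,y,z is folded into one field):

from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class VoiceTrafficRouting:
    internalconnectionid: int
    username: str
    tofile: bool                                        # True -> append this sender's voice to a recording
    target: str                                         # another username, or a filename when tofile is True
    xyz: Tuple[float, float, float] = (0.0, 0.0, 0.0)   # sender position
    fromfile: Optional[str] = None                      # mp3 the server streams toward the target, if any

@dataclass
class SoundRecording:
    internalconnectionid: int
    username: str
    filename: str
    starttime: float                                    # compared against the maxrecord limit
    data: bytearray = field(default_factory=bytearray)
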

OnConnectAccept() {
Request Name From Client, tell client which voice codec
Create new client object
Set Client object connection state = 1 (waiting for name)
}

OnReceiveName() {
Associate name with correct client object
Set client object connection state = 2 (has name, ready to talk)
Write to connected_users table (connection name, timestamp)
Increment Total number of "ready" clients
}
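
These two handlers boil down to a small state machine; here is a Python sketch with an in-memory stand-in for the connected_users table (the codec name is just an example):

import time

clients = {}           # connection id -> Client object
connected_users = []   # in-memory stand-in for the connected_users ODBC table
ready_clients = 0

class Client:
    def __init__(self, conn_id, codec):
        self.conn_id = conn_id
        self.codec = codec          # codec chosen by the server, announced on connect
        self.username = None
        self.state = 1              # 1 = waiting for name

def on_connect_accept(conn_id, codec="speex"):   # "speex" is just an example codec name
    clients[conn_id] = Client(conn_id, codec)
    # the real server would send the codec choice and a name request to the client here

def on_receive_name(conn_id, username):
    global ready_clients
    client = clients[conn_id]
    client.username = username
    client.state = 2                # 2 = has name, ready to talk
    connected_users.append({"username": username, "connecttime": time.time()})
    ready_clients += 1
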

OnPollTables() {
Get info from user_locations and set the x,y,z,room values for each user object that is found in that table.

Get info from stream_control (username, target, audio_file) ordered by timestamp in descending order. Read the latest timestamp into a global variable.
Delete from stream_control where timestamp is less than or equal to the
latest timestamp read.

If two users are within X distance of each other and in the same room,
mark them to hear each other's voice. If two users have moved beyond X distance from each other, mark them to not hear each other's voice. (See the sketch after this handler.)

Mark any users to hear each other as found in stream_control
Mark any users to be silenced as found in stream_control
Mark any users to be unsilenced as found in stream_control
Mark any users to begin recording as found in stream_control and set
the filename in that user's object.
Mark any users to stop recording as found in stream_control
**write this user's recorded voice to file**, delete
sound_recording entry
*nonblocking function*
Mark any users to hear a sound clip as found in stream_control
if username=SERVER and fromfile is a valid .mp3 file
**immediately send sound clip & RELATIVE x,y,z to that user**
*nonblocking function*

}
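
Here is the sketch of the distance/room pass mentioned above (Python; hear_distance stands in for the X value from server.conf):

import itertools, math

def proximity_pairs(locations, hear_distance):
    # locations: username -> (x, y, z, room), as read from user_locations
    can_hear = set()
    for a, b in itertools.combinations(locations, 2):
        ax, ay, az, room_a = locations[a]
        bx, by, bz, room_b = locations[b]
        if room_a == room_b and math.dist((ax, ay, az), (bx, by, bz)) <= hear_distance:
            can_hear.add((a, b))
            can_hear.add((b, a))
    return can_hear   # pairs not in this set get marked "cannot hear" before the overrides
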


OnReceiveVoicePacket() {
(I am assuming that the voice will operate as a stream of sorts)

read voice_traffic_routing struct

for each entry {
is this entry supposed to go tofile?
Yes {
Append voice data to internal sound_recording struct
check timestamp in sound_recording, see if it has
been recording longer than max value in server.conf
if so, stop recording, delete sound_recording entry
and write to file.
}
No {
route voice traffic to target, send SENDER's RELATIVE x,y,z as part of voice data packet (note: there can be multiple targets), ignore targets that are link dead (take no action yet). Relative X,Y,Z is calculated as follows:
X = RECIPIENTX - SENDERX
Y = RECIPIENTY- SENDERY
Z = RECIPIENTZ - SENDERZ
}
}
}
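
A rough Python sketch of that routing branch, using the RECIPIENT - SENDER formula above (the dict layouts, send_packet callback, and max-record value are placeholders):

import time

MAX_RECORD_SECONDS = 300.0   # example value; the real limit comes from server.conf

def on_receive_voice_packet(sender, packet, entries, recordings, locations, send_packet):
    # entries: routing rows for this sender, e.g. {"tofile": False, "target": "bob"}
    # locations: username -> (x, y, z, room); send_packet: callable(target, packet, rel_xyz)
    for entry in entries:
        if entry["tofile"]:
            rec = recordings[sender]                 # {"filename", "starttime", "data": bytearray}
            rec["data"].extend(packet)
            if time.time() - rec["starttime"] > MAX_RECORD_SECONDS:
                with open(rec["filename"], "wb") as f:
                    f.write(rec["data"])             # stop recording: flush to file
                del recordings[sender]
        else:
            sx, sy, sz, _ = locations[sender]
            rx, ry, rz, _ = locations[entry["target"]]
            rel = (rx - sx, ry - sy, rz - sz)        # RECIPIENT - SENDER, per the formula above
            send_packet(entry["target"], packet, rel)
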

OnDisconnect_Or_LinkDead() {
delete entry from connected_users
decrement total number of "ready" users
}


Client Flow
Client Is Executed (does not display a window)
Reads client.conf
Get "this" Client's username
Get IP Address of server
Get Microphone sensitivity level
Read master process name from client.conf (e.g. "program.exe")
Connects to Server IP
Waits to Receive Audio Codec Info
Sends username
Begins sending any sound from mic to server
Monitor System Process list, if process from client.conf is gone, end program.
OnReceiveVoicePackets() {
Determine audio position from relative X,Y,Z
volume of audio is scaled by distance
Play audio packet from that position
Allows a dynamic number of concurrently playing streams
}
Any errors log to errorlog.txt
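
A sketch of the volume scaling step, assuming a simple linear falloff out to the hearing distance (the falloff curve itself is a guess, not part of the spec):

import math

def playback_gain(rel_xyz, max_distance):
    # rel_xyz is the relative x,y,z sent with the voice packet; gain is 0.0..1.0
    distance = math.dist((0.0, 0.0, 0.0), rel_xyz)
    if distance >= max_distance:
        return 0.0
    return 1.0 - (distance / max_distance)   # 1.0 at the speaker's position, fading to 0.0

# e.g. playback_gain((3.0, 4.0, 0.0), 20.0) -> 0.75
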


Client Basic Idea:
Client runs program, and begins talking as soon as the server allows
uses system default microphone
Client hears dynamically loaded sound streamed from the server, volume adjusted by distance
and positional (3d sound).
Client exits when master process exits

Server Basic Idea:
Server starts as windows service, waits for clients.
Sends codec information to clients
keeps track of who is logged in by writing to connected_users table
Reads positional data from user_locations
Overrides positional data using stream_control, sometimes writing certain audio to file
sometimes reading certain audio FROM file
Routes audio information to whomever is supposed to receive it (1 or more recipients)
vvortex3



Joined: 20 Sep 2005
Posts: 4
Location: Sacramento, CA

Posted: Tue Sep 20, 2005 21:58    Post subject: UPDATE

I forgot to note: a player's "USERNAME" is defined as their public NWN key.

(There has to be a hard link between a client's machine, their character on the server, and their connection to the voice server. The public NWN key was able to match all of these criteria. The public NWN key is defined as every other character from nwncdkey.ini, for the first 8 characters, starting at character #2.)
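
If I am reading that rule right, the extraction would look something like this (a Python sketch of one possible reading; cdkey is the raw key string from nwncdkey.ini):

def public_key(cdkey):
    # one reading of the rule: start at character #2, take every other
    # character, and keep the first 8 of those
    return cdkey[1::2][:8]

# e.g. public_key("ABCDEFGHIJKLMNOP") -> "BDFHJLNP"
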
vvortex3



Joined: 20 Sep 2005
Posts: 4
Location: Sacramento, CA

Posted: Tue Sep 20, 2005 22:08    Post subject: Why am I still talking to myself?

By the way, my motivation for this project is just to gain the aforementioned functionality for entertainment purposes. I don't care if you steal my idea outright, as long as you give me a copy of the source. I am even willing to pay for development help.
Irritatus



Joined: 05 Sep 2005
Posts: 7

Posted: Sat Jul 08, 2006 8:45    Post subject: Why not...

I was looking at something similar using voice + the speech module, + a script pack I saw a while back that extends the speech program to contain meta-channels and supports the DMFI wand languages.

I thought it might be interesting to see if I could do voice capture and send the "text" to the speech system on the server for that person, using whatever settings they had for toggling languages etc. (I really wish I remembered the script pack name at the moment). So, I could say "hello" and the talus system would print hello, and if I had toggled on dwarven, it would print whatever the DMFI languages translated the text to.

Could you not do the reverse for your speech system? Instead of trying to inject voice into the NWN framework, just export player location information (and status, like dead, silenced, etc.) to your speech server. So, you could update each player's current area on transitions; that would be a top-level filter. You don't need to request information about players not in the same area as the talker, or do anything if the talker is dead, sleeping, or silenced, etc.

Then you inject a command to the nwnserver like "player a talked", and calculate distances to those players, which you export back out to the voice system with some information like "I am 10 meters from speaker" etc., and the voice system scales the volume of the voice "packet" sent to those individuals (it could be set to 0 if the target is deafened, dead, etc. as well).

Of course, one thing you have to do is pass a token with each of these steps identifying the particular "voice request" or something, to keep track of which voice message all this stuff needs to relate to.

Another option might just be to have player location data updated at a reasonable interval, and at transitions, and export this directly to a hash table or something in your voice "server"; then most of your processing is done externally at the expense of a "heartbeat"-like action. You might also get a bit of flakiness if the update interval is not sufficient: like 30 seconds ago I was alive, but now I am dead, and we didn't update the table yet.
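
Something like this is what I am picturing for the hash table side (a Python sketch just for illustration; the 30 second staleness cutoff is an arbitrary example):

import time

STALE_AFTER = 30.0   # seconds; arbitrary example value

location_cache = {}  # username -> {"x":..., "y":..., "z":..., "area":..., "updated":...}

def on_heartbeat_update(username, x, y, z, area):
    location_cache[username] = {"x": x, "y": y, "z": z, "area": area,
                                "updated": time.monotonic()}

def fresh_locations():
    # ignore players whose data has not been refreshed recently enough to trust
    now = time.monotonic()
    return {u: v for u, v in location_cache.items()
            if now - v["updated"] <= STALE_AFTER}
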

Well, these are all very loose concepts, but I think something like XML-RPC could be very handy if you look at the VoiceXML grammar.

Irritatus.
_________________
There are 10 kinds of people, those that understand binary, and those who don't.
FunkySwerve



Joined: 02 Jun 2005
Posts: 377

Posted: Mon Jul 10, 2006 16:23

The system you are referring to is SIMTools.
Funky