Sunday, August 12, 2007

What is this all about

Last Thursday I spent a few hours writing sample messaging middleware in Java (for a presentation on high-rate messaging in Java). I called the pieces "DIY messaging server", "DIY messaging client" and "DIY messaging test". The features are:
  1. All messages are Java Strings
  2. A client can do 3 things to the server:
    1. "REC" (followed by a regular expression) - express interest in receiving all messages that contain a match for the given regular expression
    2. "STOP" - stop receiving any messages
    3. "M" (followed by a string) - send a message for delivery
  3. Only one "subscription" per connected client is possible - any "REC" command cancels the effect of the previous one.
  4. There is no acknowledgment and no flow control - i.e. senders get no feedback from the server
The implementation uses old-fashioned blocking I/O with two threads for every connected client: one thread reads commands and accepts sent messages, the other writes to the client's socket to deliver the messages it subscribed to.
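
To make the command set above a bit more concrete, here is a minimal sketch of how the reading thread might parse a command line. It is not the actual DIY server code. The names `CommandType`, `Command` and `CommandParser` are made up for illustration, and the wire format (one command per line, keyword and argument separated by a single space) is an assumption, since the post does not spell it out.

```java
import java.util.regex.Pattern;

// Hypothetical types, for illustration only.
enum CommandType { REC, STOP, M, UNKNOWN }

final class Command {
    final CommandType type;
    final String argument; // regex for REC, payload for M, empty for STOP

    Command(CommandType type, String argument) {
        this.type = type;
        this.argument = argument;
    }
}

final class CommandParser {
    // Assumed wire format: one command per line, keyword then a single space then the argument.
    static Command parse(String line) {
        if (line.equals("STOP")) {
            return new Command(CommandType.STOP, "");
        }
        if (line.startsWith("REC ")) {
            return new Command(CommandType.REC, line.substring(4));
        }
        if (line.startsWith("M ")) {
            return new Command(CommandType.M, line.substring(2));
        }
        // Anything else is handed back unchanged so the caller can decide what to do.
        return new Command(CommandType.UNKNOWN, line);
    }

    // "Containing" a regular expression is read here as Matcher.find(),
    // i.e. a match anywhere in the message, not a match of the whole string.
    static boolean matches(Pattern subscription, String message) {
        return subscription != null && subscription.matcher(message).find();
    }
}
```

With this sketch, `CommandParser.parse("M hello world")` yields a command of type `M` with the argument `hello world`, and a client that never sent "REC" (a `null` pattern) matches nothing.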

Here is the architecture. Clients can be senders, receivers, or both at the same time. Every client has an open socket to the server (the two arrows between client and server). Within the server, every client connection is serviced by two threads (shown as stars). When a client sends messages, the corresponding thread reads them off the socket's input stream and places them into the distribution queue. The distribution queue is shared by a number of distributor threads (the 4 stars in the middle of the server). Using their routing rules, which are updated by "REC" and "STOP" commands, the distributor threads place every message into zero, one or many output queues (there is one output queue per connected client that is currently interested in messages). Each output queue is drained by the thread attached to the output stream of that client's socket.
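
The routing step described above can be sketched roughly like this. Again, this is an illustration under assumptions and not the real server code: the class `Router`, its method names and the string client ids are all hypothetical, and the queues are unbounded `LinkedBlockingQueue`s because there is no flow control yet.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.regex.Pattern;

// Hypothetical class, for illustration only.
final class Router {
    // One subscription per client: its current pattern plus its output queue.
    private static final class Subscription {
        final Pattern pattern;
        final BlockingQueue<String> output;

        Subscription(Pattern pattern, BlockingQueue<String> output) {
            this.pattern = pattern;
            this.output = output;
        }
    }

    // Reader threads put incoming "M" payloads here.
    final BlockingQueue<String> distributionQueue = new LinkedBlockingQueue<String>();

    // Routing rules, keyed by an assumed client id.
    private final Map<String, Subscription> routes = new ConcurrentHashMap<String, Subscription>();

    // "REC": replaces the previous pattern of this client, keeping its output queue.
    BlockingQueue<String> subscribe(String clientId, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return routes.compute(clientId, (id, old) -> new Subscription(
                pattern,
                old == null ? new LinkedBlockingQueue<String>() : old.output)).output;
    }

    // "STOP": the client is no longer interested in any messages.
    void unsubscribe(String clientId) {
        routes.remove(clientId);
    }

    // Routes one message into zero, one or many output queues; returns how many.
    int route(String message) {
        int delivered = 0;
        for (Subscription s : routes.values()) {
            if (s.pattern.matcher(message).find()) {
                s.output.add(message); // unbounded queue, so this never blocks
                delivered++;
            }
        }
        return delivered;
    }

    // Body of one distributor thread: take from the distribution queue and route.
    Runnable distributor() {
        return () -> {
            try {
                while (true) {
                    route(distributionQueue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly when interrupted
            }
        };
    }
}
```

To get the four distributors from the picture, one would start four threads with `new Thread(router.distributor()).start()`. Each writer thread then blocks on `take()` on the queue returned by `subscribe` and writes whatever it gets to its socket. Note that with several distributors the relative order of messages in an output queue is not guaranteed in this sketch.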

My next steps:
  1. Upload the current code to SourceForge and get Subversion access to it, so I can specify a revision number for every blog entry
  2. Simple monitoring of the distribution queue and output queues. Monitoring will include the size of the queues and also the contribution of every sender
  3. Flow control post + simple flow control implementation. Flow control decisions will be based on the monitored values and some parameters known as "watermark values"
  4. Post about unfairness of flow control
  5. Rewriting the code to use non-blocking I/O, primarily to compare performance and scalability (with flow control on)
  6. Post about TTL (Time-To-Live) as an alternative to flow control, which is more fair, but could lead to data loss. Concerns about memory usage from holding messages for the duration of their TTL.
  7. Flush messages to disk to avoid excessive memory consumption when using a long TTL. Rotate the files in order to avoid indexing, use of embedded databases etc.
  8. Some other cool stuff to come...
