Too many objects are created during the pattern-matching process. You can see this by analyzing monitor usage: the code is blocked on "Reference$Lock" objects, which is a sign of heavy GC use. It is more efficient to re-use Matcher objects and call "reset" on them than to create a new one from the Pattern object every time.
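Because Matcher objects are stateful and not safe for concurrent use (unlike Pattern objects, which are immutable), a re-used Matcher cannot be shared between threads, so we keep a separate copy of the Matcher object per dispatcher thread.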
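A minimal sketch of this approach, assuming a single shared Pattern and a ThreadLocal to hold each thread's Matcher. The class name PerThreadMatcher, the TOPIC pattern, and the matches helper are hypothetical and only illustrate the idea; the actual dispatcher code may keep its per-thread copies differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not taken from the original code base.
class PerThreadMatcher {
    // Pattern is immutable and thread-safe, so one instance is shared by all threads.
    // The expression itself is an assumed example.
    private static final Pattern TOPIC = Pattern.compile("orders\\.[^.]+\\.created");

    // Matcher is stateful, so each thread lazily creates and keeps its own instance.
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> TOPIC.matcher(""));

    // reset(CharSequence) points the existing Matcher at new input
    // instead of allocating a new Matcher through Pattern.matcher().
    static boolean matches(CharSequence topic) {
        return MATCHER.get().reset(topic).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("orders.eu.created"));
        System.out.println(matches("orders.eu.deleted"));
    }
}
```

Each thread that calls matches allocates one Matcher on first use and re-uses it afterwards, so the per-message allocation goes away while no Matcher is ever shared across threads.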
After applying the optimization, the same tests give the following results:
Publish rate: 34447.12366517396
Average delay: 276 ms
Max delay: 1213 ms
Received: 100000 messages
Publish rate: 26917.900403768504
Average delay: 1370 ms
Max delay: 3419 ms
Received: 100000 messages
Publish rate: 55243.09392265193
Average delay: 2196 ms
Max delay: 4347 ms
Received: 99990 messages
Publish rate: 20951.183741881418
Average delay: 933 ms
Max delay: 4134 ms
Received: 100000 messages
Publish rate: 22311.468094600627
Average delay: 943 ms
Max delay: 4614 ms
Received: 100000 messages
Publish rate: 17201.858544140425
Average delay: 2498 ms
Max delay: 5261 ms
Received: 99960 messages
Publish rate: 15284.40366972477
Average delay: 1157 ms
Max delay: 6499 ms
Received: 99960 messages
Publish rate: 14194.464158977999
Average delay: 1782 ms
Max delay: 6815 ms
Received: 100000 messages
Publish rate: 13719.813391877058
Average delay: 658 ms
Max delay: 7166 ms
Received: 99990 messages
Publish rate: 17689.72227136034
Average delay: 1910 ms
Max delay: 6620 ms
Received: 100000 messages