Writing Ejabberd Modules: Presence Storms

August 28, 2008

One of the biggest benefits to using ejabberd is the ease with which it is possible to extend the server's functionality. After only a week of production operation, I've already written and deployed four new modules at Chesspark. Let's get our hands dirty, fire up an editor, and learn how to write a new ejabberd module. Unlike most tutorials, the module created here is a real piece of code that we have running in production at Chesspark.

Getting Started

The code presented here is in Erlang, which is what ejabberd is written in. If you are unfamiliar with Erlang, you will probably still be able to follow along just fine. If you have a question about Erlang syntax or an Erlang function, the Erlang Web site has tutorials and reference documentation for your enlightenment.

Anders Conbere has written a series of excellent tutorials on writing ejabberd modules at his blog: building ejabberd, writing a generic module, writing an HTTP module, and writing an XMPP bot module. I will assume that you have read at least his tutorial on generic modules.

Finally, there is some incomplete documentation about developing ejabberd modules in the ejabberd wiki.

Stopping Storms With A Little mod_sunshine

During our ejabberd migration, we ran into a bug in our desktop client that caused it to send the same presence stanza over and over in an infinite loop. While providing an updated build of the client solved the problem for many of our users, some users refused to upgrade. Not only did this cause a lot of undue stress on our XMPP server, it also affected every online contact of each user running the broken client. These presence storms were silently degrading the service quality for nearly everyone.

We decided that the best solution would be to detect presence storms and disconnect the offending clients. This required writing a fairly simple ejabberd module that I named mod_sunshine. We'll recreate the same code here.

mod_sunshine's Humble Beginnings

Every ejabberd module implements the gen_mod behavior. An Erlang behavior is just a set of functions that a module implementing the behavior is required to support. The gen_mod behavior requires two functions: start/2 and stop/1. These functions are called when ejabberd starts and stops the module.

The skeleton of an ejabberd module appears below.

-module(mod_sunshine).

-behavior(gen_mod).

-export([start/2, stop/1]).

start(_Host, _Opts) ->
    ok.

stop(_Host) ->
    ok.

For those of you wondering, the leading underscores on the function arguments signal to the Erlang compiler that these arguments are unused.

This code should be placed in a file called mod_sunshine.erl. Assuming you have Erlang and ejabberd somewhere on your system, you can compile mod_sunshine.erl with this command: erlc -I /path/to/src_or_include -pz /path/to/src_or_ebin mod_sunshine.erl.

This module can be added to the server by adding a line like the one below to the server configuration. You'll also need to place the module's .beam file in the ejabberd ebin directory alongside the rest of ejabberd's .beam files.

{mod_sunshine, []}

Hooks, Hooks, and More Hooks

ejabberd modules extend functionality of the server by connecting themselves to hooks. A hook is just a place in the code where ejabberd offers a connection point. There are hooks for when a user sends a packet or receives a packet, for when sessions are created or removed, and for many other things. A list of hooks can be found in the ejabberd events and hooks documentation.

Each hook in ejabberd is associated with a chain of functions to execute. ejabberd modules will add functions to these chains, and when the hook is triggered, the chain of functions is executed. Each function has a priority in the chain, and ejabberd will execute them in order. Additionally, it is possible for a function to terminate the hook processing so that later functions in the chain do not get executed.
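To make the chain idea concrete, here is a toy model of a hook chain in plain Erlang. It is not ejabberd's implementation; the module name hook_chain_demo and its functions are made up for illustration, and it assumes that a smaller number means "runs earlier" and that a function ends the chain by returning stop.

```erlang
-module(hook_chain_demo).
-export([run/2, demo/0]).

%% Toy model only: a chain is a list of {Priority, Fun} pairs.
%% Assumption: lower numbers run first, and a fun that returns
%% the atom stop ends processing for the rest of the chain.
run(Chain, Args) ->
    run_sorted(lists:keysort(1, Chain), Args).

run_sorted([], _Args) ->
    ok;
run_sorted([{_Priority, Fun} | Rest], Args) ->
    case apply(Fun, Args) of
        stop -> stopped;
        _ -> run_sorted(Rest, Args)
    end.

demo() ->
    Chain = [{100, fun(Arg) -> io:format("last: ~p~n", [Arg]) end},
             {50, fun(Arg) -> io:format("first: ~p~n", [Arg]), stop end}],
    run(Chain, ["hello"]).
```

Calling hook_chain_demo:demo() runs only the function registered at 50, because it returns stop before the one at 100 gets a turn.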

ejabberd modules typically add some functions to one or more of these hook chains and then wait around for them to be executed. This is very similar to GUI code where one responds to input events, only instead of mouse clicks and keyboard presses, modules will respond to packets being sent or users coming online.

Just to make sure you fully understand the hook processing and the function chains, let's go through an example.

Passing Around Offline Messages

When a message comes into an ejabberd server for a user that is currently offline, ejabberd executes the chain of functions for the hook offline_message_hook.

The ejabberd session manager adds a default function to the offline_message_hook chain that bounces the incoming message with a service unavailable error. This function is added at the lowest priority so that it executes after anything else in the chain.

The mod_offline module, which comes with ejabberd, uses this same hook to add support for storing offline messages in a database. These messages are then sent to a user when they next come online. It does this by adding a function to the chain at a higher priority than the session manager's. When a message gets sent to an offline user, this function executes first and stores the message in the database. Since it would be silly to return an error now that the message has been stored, mod_offline's function signals to the hook processor that no more functions should be run in this chain. The session manager's default function will never be executed.

At Chesspark we generate a lot of messages that are meaningless if they are not delivered to someone while they are online. We built a module that filters offline messages and discards those that are inappropriate for database storage. This works by adding a function to the offline_message_hook chain at a very high priority. This means that our function is the first function to receive the message sent to an offline user. If the module determines the message is inappropriate for storage, we signal to the hook processor that it should not continue executing chain functions, silently dropping the message. Otherwise, the module lets the rest of the chain execute normally, which means that mod_offline, which is next in the chain, will receive the message and store it.
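A rough sketch of such a filter might look like the following. This is not our production module: the name mod_offline_filter, the priority value 10, and the rule that headline messages are not worth storing are all assumptions made for illustration. It also assumes that functions in the offline_message_hook chain receive the sender, the recipient, and the packet, that returning stop ends the chain, and that packets are {xmlelement, Name, Attrs, Children} tuples with string names.

```erlang
-module(mod_offline_filter).   % hypothetical module name

-behavior(gen_mod).

-export([start/2, stop/1, filter_offline/3, storable/1]).

%% Assumption: 10 sorts ahead of mod_offline's function in the chain.
-define(PRIORITY, 10).

start(Host, _Opts) ->
    ejabberd_hooks:add(offline_message_hook, Host,
                       ?MODULE, filter_offline, ?PRIORITY),
    ok.

stop(Host) ->
    ejabberd_hooks:delete(offline_message_hook, Host,
                          ?MODULE, filter_offline, ?PRIORITY),
    ok.

%% Returning stop ends the chain, so the message is neither
%% stored by mod_offline nor bounced by the session manager.
filter_offline(_From, _To, Packet) ->
    case storable(Packet) of
        true -> ok;
        false -> stop
    end.

%% Hypothetical rule: headline messages are dropped, all else is kept.
storable({xmlelement, "message", Attrs, _Children}) ->
    proplists:get_value("type", Attrs) =/= "headline";
storable(_Other) ->
    true.
```

Keeping the decision in storable/1 means the rule can be tried out in an Erlang shell without a running server.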

Hooking Presence For mod_sunshine

Now that we know a lot about hooks and hook processing, it's time to put it to some use in mod_sunshine. First, we have to know which hook to use.

Unfortunately, none of the documentation is very clear about which hooks are executed at what times. I have found that experimentation is the only way to figure this out. Use the name of the hooks to make a guess, and then use some logging statements to figure out if that hook gets run when you expect. You can also search for the hook's name in the ejabberd source code to find out when it is executed and what is expected of the functions in the chain.

For mod_sunshine, the hook we are interested in is set_presence_hook. This hook is processed whenever a connected user sends a presence stanza to the server. It's common to add a function to the chain in start/2 and remove the same function in stop/1, and that's what we've done below. Additionally, we must export the function we wish to add.

-module(mod_sunshine).

-behavior(gen_mod).

-export([start/2, stop/1, on_presence/4]).

start(Host, _Opts) ->
    ejabberd_hooks:add(set_presence_hook, Host, ?MODULE, on_presence, 50),
    ok.

stop(Host) ->
    ejabberd_hooks:delete(set_presence_hook, Host, ?MODULE, on_presence, 50),
    ok.

on_presence(_User, _Server, _Resource, _Packet) ->
    none.

A function in the set_presence_hook chain takes four parameters: the user, server, and resource of the person that sent the presence stanza as well as the actual presence stanza. It is required to return none.

A Module That Does Nothing Is Hardly A Module At All

So far we have a module that does almost nothing. It adds a function to the set_presence_hook chain when it is loaded, removes the function from the chain when it is unloaded, and does absolutely nothing when the chain is executed. It's time to get to the meat of the module.

First, mod_sunshine needs to know what a presence storm is. We can define this as a user sending the same presence stanza count times in interval seconds. Since we might not know what the optimal values are for count and interval, it is best to leave those as options to be defined in the ejabberd configuration file.

Next, the module needs to store its state somewhere so it can check if a user is sending a presence storm. Since ejabberd passes no state to on_presence/4, we must keep track of this ourselves. We'll do this with Mnesia, one of Erlang's built-in databases.

Finally, the module needs to disconnect any user who causes a presence storm. Unfortunately, ejabberd does not seem to have an API or message for this out of the box, so we will have to add this ourselves to the internal ejabberd c2s module.

Intermission And Logging

It's time for a small break before we dive into the real code, so let's digress for a second and talk about logging.

It will probably happen in your Erlang programming career that something goes wrong and you can't figure out what that something is. A time honored tradition exists for exactly these situations - printing some information to a log file.

Should you need to do this, ejabberd has several macros for writing to its log file at different logging levels. They are defined in ejabberd.hrl, which you will have to include in the code to use them. I usually prefer to use ?INFO_MSG. These macros take a format string followed by a list of arguments. Be sure to remember that you must pass an empty list if there are no arguments.

Here's mod_sunshine.erl from before with modifications to announce its startup and shutdown.

-module(mod_sunshine).

-behavior(gen_mod).

-include("ejabberd.hrl").

-export([start/2, stop/1, on_presence/4]).

start(Host, _Opts) ->
    ?INFO_MSG("mod_sunshine starting", []),
    ejabberd_hooks:add(set_presence_hook, Host, ?MODULE, on_presence, 50),
    ok.

stop(Host) ->
    ?INFO_MSG("mod_sunshine stopping", []),
    ejabberd_hooks:delete(set_presence_hook, Host, ?MODULE, on_presence, 50),
    ok.

on_presence(_User, _Server, _Resource, _Packet) ->
    none.
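The calls above pass an empty argument list, but the list is where values for the format string go. As far as I can tell, the logging macros use the same directives as io:format, so a standalone sketch with io_lib:format shows how the string and the list combine. The module log_format_demo is just a scratch module for trying this out, not part of ejabberd.

```erlang
-module(log_format_demo).
-export([line/3]).

%% Builds the text that a format string and argument list produce.
%% Each ~s directive consumes one element of the argument list.
line(User, Server, Resource) ->
    lists:flatten(
      io_lib:format("presence from ~s@~s/~s", [User, Server, Resource])).
```

Assuming the macros behave this way, the equivalent log call inside the module would be ?INFO_MSG("presence from ~s@~s/~s", [User, Server, Resource]).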

Ok, break time is over; let's get back to the real stuff.

Dealing With Options

To pass options to mod_sunshine we have to change the ejabberd configuration file. The line we used above should be modified to the one below.

{mod_sunshine, [{count, 10}, {interval, 60}]}

This will tell mod_sunshine that a presence storm is anyone sending 10 copies of the same presence stanza within 60 seconds. Now we just need to get these options into the code.

Alert readers will notice that while start/2 is passed the options in the Opts variable, the on_presence/4 function does not receive these. How are we to get the options?

Thankfully, gen_mod has an API function to fetch the module options from the configuration - gen_mod:get_module_opt(Host, Module, Opt, Default). This function needs the virtual host the options are defined for. This just happens to be the contents of the Server variable in on_presence/4. gen_mod:get_module_opt/4 also lets you define the default option values.

Adding options to mod_sunshine is now easy.

-module(mod_sunshine).

-behavior(gen_mod).

-include("ejabberd.hrl").

-export([start/2, stop/1, on_presence/4]).

start(Host, _Opts) ->
    ?INFO_MSG("mod_sunshine starting", []),
    ejabberd_hooks:add(set_presence_hook, Host, ?MODULE, on_presence, 50),
    ok.

stop(Host) ->
    ?INFO_MSG("mod_sunshine stopping", []),
    ejabberd_hooks:delete(set_presence_hook, Host, ?MODULE, on_presence, 50),
    ok.

on_presence(_User, Server, _Resource, _Packet) ->
    %% get options
    StormCount = gen_mod:get_module_opt(Server, ?MODULE, count, 10),
    TimeInterval = gen_mod:get_module_opt(Server, ?MODULE, interval, 60),
    none.

Persisting State With Mnesia

Mnesia is easiest to use when paired with Erlang records. For the Erlang beginners, Erlang records are like C structs. They are normal Erlang tuples with named fields. We'll need to create a record for mod_sunshine to store the user, packet, the time the packet was originally sent, and the number of times the packet has been sent. Here's what that looks like.

-record(sunshine, {usr, packet, start, count}).

The name of the record is sunshine. Note that the first of the four fields is not a typo of 'user'. It is an acronym for 'user, server, resource' and will store the full JID of the user sending presence.
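If records are new to you, this small standalone module shows the basics: building a record, reading a field, updating a field, and the fact that a record is really a tuple tagged with the record name. The module name and the sample values are made up for the example.

```erlang
-module(sunshine_record_demo).
-export([demo/0]).

-record(sunshine, {usr, packet, start, count}).

demo() ->
    R = #sunshine{usr = {"alice", "example.com", "home"},
                  packet = some_packet, start = 0, count = 1},
    %% Under the hood a record is a tuple whose first element is its name.
    {sunshine, _Usr, _Packet, _Start, 1} = R,
    %% "Updating" a field returns a new record; R itself is unchanged.
    R2 = R#sunshine{count = R#sunshine.count + 1},
    {R#sunshine.count, R2#sunshine.count, record_info(fields, sunshine)}.
```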

Creating Mnesia Tables

Now that we have a record, we must create an Mnesia table to store records of this type. Each row of the Mnesia table will be in this format, and the key is the first item of the record, usr. mnesia:create_table/2 does the job for us. Mnesia can create various kinds of tables - disk tables, memory tables, and tables that allow duplicates. By default it creates an in-memory, unsorted table that does not allow duplicate keys, and this happens to be exactly what we want. The Mnesia man page has information on other table types if you are interested.

We'll create and clear this table in start/2.

start(Host, _Opts) ->
    ?INFO_MSG("mod_sunshine starting", []),
    mnesia:create_table(sunshine, 
            [{attributes, record_info(fields, sunshine)}]),
    mnesia:clear_table(sunshine),
    ejabberd_hooks:add(set_presence_hook, Host, ?MODULE, on_presence, 50),
    ok.

The first argument to mnesia:create_table/2 is the name of the table, which by default must be the same as the record name. This is followed by a list of options, of which we only use attributes. record_info(fields, sunshine) is a built-in construct that the Erlang compiler expands into a list of the fields in the record sunshine. Mnesia won't destroy an existing table if you call create_table/2, so to make sure there isn't any old data in there we use mnesia:clear_table/1.
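You can watch this behavior in a plain Erlang shell without ejabberd. The sketch below is a scratch module of my own naming; it assumes Mnesia is started on a node with no schema on disk, in which case it runs with a RAM-only schema, and it creates the table twice to show what the second call returns.

```erlang
-module(sunshine_table_demo).
-export([demo/0]).

-record(sunshine, {usr, packet, start, count}).

demo() ->
    %% Assumption: no schema on disk, so Mnesia keeps everything in RAM.
    ok = mnesia:start(),
    First = mnesia:create_table(sunshine,
                [{attributes, record_info(fields, sunshine)}]),
    %% A second call leaves the existing table alone and reports why.
    Second = mnesia:create_table(sunshine,
                [{attributes, record_info(fields, sunshine)}]),
    %% Clearing removes any rows but keeps the table definition.
    Cleared = mnesia:clear_table(sunshine),
    {First, Second, Cleared}.
```

On a fresh node the first call should succeed, and the second should come back as an aborted result naming the table as already existing, which is why start/2 can call create_table/2 unconditionally.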

Reading And Writing To The Database

Mnesia is a transactional database. To read and write to it, one normally uses mnesia:transaction/1 which takes a function to execute inside the transaction. This can be slow, so often mnesia:dirty_read/2 is used to skip transactions for reads, and we will use that here.
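Here is a minimal standalone round trip, again using a scratch module and made-up sample values rather than anything from ejabberd. It writes one row inside a transaction and reads it back with a dirty read, assuming Mnesia is running with its default in-memory tables.

```erlang
-module(sunshine_rw_demo).
-export([demo/0]).

-record(sunshine, {usr, packet, start, count}).

demo() ->
    ok = mnesia:start(),
    %% Ignore the result: the table may already exist.
    _ = mnesia:create_table(sunshine,
            [{attributes, record_info(fields, sunshine)}]),
    Key = {"alice", "example.com", "home"},
    Row = #sunshine{usr = Key, packet = some_packet, start = 0, count = 1},
    %% The write runs inside a transaction...
    {atomic, ok} = mnesia:transaction(fun() -> mnesia:write(Row) end),
    %% ...while the dirty read skips the transaction machinery.
    %% Row is already bound, so this match also checks what came back.
    [Row] = mnesia:dirty_read(sunshine, Key),
    ok.
```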

In order to keep the user, server, and resource key consistent, we must stringprep the username and server. Among other things, this ensures that mixed case usernames get lower cased. This is easy in ejabberd since it provides a library of functions for JID manipulation called jlib. To use it, we include jlib.hrl in our module.

Once we have a consistent key, we look it up in the sunshine table. There are three possibilities: no record is found, a record is found for the current packet, or a record is found for a different packet.

The skeleton for this is below.

on_presence(User, Server, Resource, Packet) ->
    %% get options
    StormCount = gen_mod:get_module_opt(Server, ?MODULE, count, 10),
    TimeInterval = gen_mod:get_module_opt(Server, ?MODULE, interval, 60),

    LUser = jlib:nodeprep(User),
    %% server names use the nameprep profile rather than nodeprep
    LServer = jlib:nameprep(Server),

    case catch mnesia:dirty_read(sunshine, {LUser, LServer, Resource}) of
        [] ->
            %% no record for this key
            ok;
        [#sunshine{usr={LUser, LServer, Resource}, 
                   packet=Packet, start=_TimeStart, count=_Count}] ->
            %% record for this key and packet exists
            ok;
        [#sunshine{usr={LUser, LServer, Resource},
                   packet=_OtherPacket, count=_OtherCount}] ->
            %% a record for this key was found, but for another packet
            ok
    end,
    none.

Those new to Erlang are probably a little weirded out right now. Erlang uses pattern matching extensively. First we read from the database, and attempt to match the pattern of the result against each clause of the case expression. If a variable in the pattern already contains a value, it must match the value in the result. If a variable in the pattern does not have a value, it gets the value of the result in that spot.

If we get an empty list, there is no record matching the key. Note that we get a list as a result because some Mnesia table types support duplicate keys. In our table, the result will always be an empty list or a list with one item.

The next two patterns match a row. #sunshine{...} is a record reference for the sunshine record. In the first of the two patterns, all the variables have values except for _TimeStart and _Count. This means the result must be a record that matches the record for this user and this packet. The second pattern matches a record for this user, but with any packet, as _OtherPacket is without a value.

Pattern matching is nice and powerful. Not only did we single out exactly the results we needed without any if statements, we also already put the interesting fields into their own variables!
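If you want to play with this in isolation, the same three-way match can be tried without a database. The module match_demo and its return atoms are invented for this example; Rows stands in for whatever mnesia:dirty_read/2 returned.

```erlang
-module(match_demo).
-export([classify/2]).

-record(sunshine, {usr, packet, start, count}).

%% Packet is already bound when the case runs, so the second clause
%% only matches a row whose packet field is equal to it. Count is
%% unbound, so it picks up whatever value the row holds.
classify(Rows, Packet) ->
    case Rows of
        [] ->
            none_found;
        [#sunshine{packet = Packet, count = Count}] ->
            {same_packet, Count};
        [#sunshine{packet = _Other}] ->
            different_packet
    end.
```

The order of the clauses matters: the last pattern would also match a row for the same packet, so the more specific clause has to come first.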

Now we just need to code the actions for each of these three cases.

In the first case, we create a new entry for that user and packet.

In the second case, we need to determine whether we're within TimeInterval seconds of TimeStart. If not, we need to reset the count and start time for this user and packet. Otherwise, if the count has reached StormCount, the user is sending a presence storm and needs to be disconnected, and if the count is still below StormCount we just increment the count.

In the final case, we've received a different presence packet, so we need to overwrite the user's record in Mnesia with a new one for this packet.

Remember that Mnesia writes happen inside transactions as you look at the code below. The writes are put into anonymous functions which are then passed to mnesia:transaction/1 for execution inside a transaction.

on_presence(User, Server, Resource, Packet) ->
    %% get options
    StormCount = gen_mod:get_module_opt(Server, ?MODULE, count, 10),
    TimeInterval = gen_mod:get_module_opt(Server, ?MODULE, interval, 60),

    LUser = jlib:nodeprep(User),
    %% server names use the nameprep profile rather than nodeprep
    LServer = jlib:nameprep(Server),

    {MegaSecs, Secs, _MicroSecs} = now(),
    TimeStamp = MegaSecs * 1000000 + Secs,

    case catch mnesia:dirty_read(sunshine, {LUser, LServer, Resource}) of
        [] ->
            %% no record for this key, so make a new one
            F = fun() ->
                mnesia:write(#sunshine{usr={LUser, LServer, Resource},
                                       packet=Packet,
                                       start=TimeStamp,
                                       count=1})
            end,
            mnesia:transaction(F);
        [#sunshine{usr={LUser, LServer, Resource}, 
                   packet=Packet, start=TimeStart, count=Count}] ->
            %% record for this key and packet exists, check if we're
            %% within TimeInterval seconds, and whether the StormCount is
            %% high enough.  or else just increment the count.
            if
                TimeStamp - TimeStart > TimeInterval ->
                    F = fun() ->
                        mnesia:write(#sunshine{usr={LUser,
                                                    LServer,
                                                    Resource},
                                               packet=Packet,
                                               start=TimeStamp,
                                               count=1})
                    end,
                    mnesia:transaction(F);
                Count =:= StormCount ->
                    %% TODO: disconnect user

                    F = fun() ->
                        mnesia:delete({sunshine, {LUser, LServer, 
                                                  Resource}})
                    end,
                    mnesia:transaction(F);
                true ->
                    %% keep the original start time so the interval is
                    %% measured from the first copy of this packet
                    F = fun() ->
                        mnesia:write(#sunshine{usr={LUser, LServer,
                                                    Resource},
                                               packet=Packet,
                                               start=TimeStart,
                                               count=Count + 1})
                    end,
                    mnesia:transaction(F)
            end;
        [#sunshine{usr={LUser, LServer, Resource},
                   packet=_OtherPacket, count=_OtherCount}] ->
            %% a record for this key was found, but for another packet,
            %% so we replace it with a new record.
            F = fun() ->
                mnesia:write(#sunshine{usr={LUser, LServer, Resource},
                                       packet=Packet,
                                       start=TimeStamp,
                                       count=1})
            end,
            mnesia:transaction(F)
    end,
    none.
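One way to check the branching without a server or a database is to pull the decision into a pure function. The sketch below is a hypothetical helper, not part of mod_sunshine as deployed: decide/5 takes the rows read from Mnesia, the key, the packet, the current time in seconds, and the two options, and answers either {write, Record} or disconnect. It keeps the original start time when incrementing, so the interval is measured from the first copy of the packet.

```erlang
-module(sunshine_logic).
-export([decide/5]).

-record(sunshine, {usr, packet, start, count}).

%% Hypothetical helper. Existing is [] or [Record], in the shape
%% returned by mnesia:dirty_read/2. Now is a timestamp in seconds.
decide(Existing, Key, Packet, Now, {StormCount, Interval}) ->
    Fresh = #sunshine{usr = Key, packet = Packet, start = Now, count = 1},
    case Existing of
        [#sunshine{usr = Key, packet = Packet,
                   start = Start, count = Count}] ->
            if
                %% the window has expired, start counting again
                Now - Start > Interval ->
                    {write, Fresh};
                %% enough copies inside the window: a storm
                Count =:= StormCount ->
                    disconnect;
                %% same packet inside the window, count it
                true ->
                    {write, Fresh#sunshine{start = Start,
                                           count = Count + 1}}
            end;
        _ ->
            %% no row yet, or a row for a different packet
            {write, Fresh}
    end.
```

With this split, on_presence/4 would only have to perform the write or the disconnect that decide/5 asks for.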

Disconnecting Users

The only thing left to do is disconnect the user when a presence storm is detected. I wish this was as easy as ejabberd_sm:disconnect(User, Server, Resource), but it seems that the ejabberd developers have not yet added something along these lines. To solve this, we will use Erlang's message passing to notify the user's c2s process that it should disconnect the user.

After some exploring, I discovered you can get the c2s process identifier for a given user by calling ejabberd_sm:get_session_pid/3 which takes the user, server, and resource. Once we know the process identifier, we can send it a message with Erlang's ! operator.
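For anyone who has not used message passing before, here is the idea in miniature. The process spawned here is only a stand-in for a c2s process, and the module name and the closed reply are invented for the example.

```erlang
-module(bang_demo).
-export([demo/0]).

%% Stand-in for a c2s process: waits for the disconnect message,
%% tells the parent it is closing, and then exits.
session(Parent) ->
    receive
        disconnect -> Parent ! {self(), closed}
    end.

demo() ->
    Parent = self(),
    Pid = spawn(fun() -> session(Parent) end),
    %% The ! operator sends a message to a pid and returns immediately.
    Pid ! disconnect,
    receive
        {Pid, closed} -> closed
    after 1000 ->
        timeout
    end.
```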

First let's finish out on_presence/4 by replacing the placeholder comment with the disconnect message to the c2s process for the user, shown below.

%% disconnect the user, if a session process still exists
case ejabberd_sm:get_session_pid(LUser, LServer, Resource) of
    Pid when is_pid(Pid) -> Pid ! disconnect;
    _ -> ok
end,

Our module is now finished, except that the c2s process will not understand our message. We'll have to provide a new message handler for the disconnect message in ejabberd_c2s.erl. Add the following clause to handle_info/3 in ejabberd_c2s.erl.

handle_info(disconnect, _StateName, StateData) ->
    send_text(StateData, ?STREAM_TRAILER),
    {stop, normal, StateData};

You can insert this clause before the first handle_info/3 clause. Just search ejabberd_c2s.erl for handle_info, and when you find the function definition, just paste this code immediately before it, but after the function's comment.

Now you must recompile ejabberd and restart the server. If you know what you are doing, you can load this patch into the server while it is running. We did this at Chesspark to avoid taking the server down. This is another beautiful feature of ejabberd.

mod_sunshine In Action

We're all finished, and mod_sunshine can be deployed to stop the evil presence storms. To test that it works, just send the same presence stanza a bunch of times, really fast. You will find yourself quickly disconnected. You will probably need to write a quick test client for this as you might not be able to trigger duplicate presence with your normal XMPP client. I highly recommend Strophe.js for this task.

I hope that this tutorial has been helpful to you, and that you use this knowledge only for good. Go have fun implementing your wild and crazy ideas for server modules! If you have any suggestions or questions, please let me know in the comments.

The full source of mod_sunshine.erl is available here.
