Long time no see! It’s been a while since my last post, but I’m definitely alive and okay.
You may or may not have heard of Mastodon: it’s a federated and distributed social network, similar to Twitter in some ways but also completely different.
- The network is distributed and is controlled by users, not by a single company
- The network is made of instances (nodes), where users can sign up, pretty much like email providers
- Users can communicate with users from other instances
- Anybody can spin up an instance, there is no central authority
- Message length is 500 characters
- Lots of great people here!
I launched my personal instance around March 2017, and I’m really happy I did, as I’ve met a lot of interesting people.
When I joined the network, it was booming. If I remember correctly, there were fewer than 100k users at that time. Today, there are more than one million.
The interesting thing is: there is no central authority to track those stats. So how can you know how many users are in the network at a given time? Or which countries they are in? Or how active they are?
MNM started for that reason. Basically, it recorded available stats for each instance it had access to and provided dashboards to visualize this data and, hopefully, get some insight.
Fortunately, other projects existed that made the whole process easier. In particular, instances.social was already gathering a lot of data, and I just had to plug my own project into it.
Here are two pictures that were generated using the recorded data:
This one shows the user growth from April to December 2017. Before that, it’s blank, not because there was nobody in the network, but because MNM was not around yet.
As you can see, it’s pretty steady. Now, about the actual meaning of those stats:
- User means a user account opened on a publicly accessible instance.
- It does not account for any kind of user activity. It shows how many user accounts were opened on the network, not how active these accounts are.
- One person can open multiple accounts on multiple instances, and I personally know a few Mastodon members who have 3 or 4 accounts. So there are probably not one million people using Mastodon.
The second one shows the location of Mastodon user accounts based on their instance IP.
As you can see, the main pool of Mastodon users is in Japan, followed by Europe and the United States. For detailed graphs, you can browse the country dashboard, but basically, Japanese accounts represent roughly 2/3 of all accounts, while French ones make up ~20% and US ones a little less than 10%.
Now for the limits of those stats:
We deduce account countries from their instance’s domain name if it uses a country TLD, and fall back on its IP address otherwise. This means an instance registered with a .fr domain name is marked as French, and an instance using a generic TLD such as .com and hosted behind a German IP address is marked as German.
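To make the rule above more concrete, here is a minimal sketch of that kind of lookup. It is not MNM’s actual code: the function name, the tiny TLD table and the `ip_country_lookup` callable (standing in for whatever IP geolocation service is used) are all assumptions made for illustration.

```python
from typing import Callable, Optional

# Illustrative subset of country-code TLDs (an assumption for this sketch);
# a real table would cover every ISO 3166-1 country code.
COUNTRY_TLDS = {"fr": "FR", "de": "DE", "jp": "JP", "us": "US", "me": "ME"}


def guess_instance_country(
    domain: str,
    ip_country_lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Guess an instance's country from its TLD, falling back on its IP."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in COUNTRY_TLDS:
        return COUNTRY_TLDS[tld]
    # Generic TLD (.com, .social, ...): defer to the geolocation lookup.
    # `ip_country_lookup` is a hypothetical callable supplied by the caller.
    return ip_country_lookup(domain)


def fake_lookup(domain: str) -> Optional[str]:
    """Stand-in for a real IP geolocation service: always answers Germany."""
    return "DE"


print(guess_instance_country("example.fr", fake_lookup))   # country TLD wins
print(guess_instance_country("example.com", fake_lookup))  # falls back on the IP
print(guess_instance_country("talkto.me", fake_lookup))    # pun domain, counted as Montenegro
```

The last call shows how a pun domain ends up attributed to whatever country owns the TLD, regardless of where its users actually are.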
This is far from perfect, for a few reasons:
- A French user creating an account on a German instance would be recorded as a German account
- Relying on TLDs and IPs to guess an instance’s country is not especially reliable:
  - People sometimes use domain names to make puns, like talkto.me, or simply don’t rent a domain name matching their country
  - IP addresses are not stable over time, and usually reveal the hosting provider’s location
Keeping that in mind, I think those stats are still pretty helpful at showing trends.
A note about data collection
For a lot of data points, such as the number of user accounts or published statuses, instances are queried directly. A queried instance could lie about its number of accounts, and we would not have any easy way to detect that.
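As an illustration of what “queried directly” can look like, here is a small sketch based on Mastodon’s public `GET /api/v1/instance` endpoint, whose JSON response carries a `stats` object with `user_count`, `status_count` and `domain_count`. Whether MNM uses this exact endpoint is an assumption on my part, and the function names and output keys are made up for the example.

```python
import json
from urllib.request import urlopen


def parse_instance_stats(payload: dict) -> dict:
    """Extract the self-reported counters from an /api/v1/instance payload."""
    stats = payload.get("stats", {})
    return {
        "users": int(stats.get("user_count", 0)),
        "statuses": int(stats.get("status_count", 0)),
        "connections": int(stats.get("domain_count", 0)),
    }


def fetch_instance_stats(domain: str, timeout: float = 10.0) -> dict:
    """Ask an instance for its own stats. Nothing here verifies the numbers."""
    with urlopen(f"https://{domain}/api/v1/instance", timeout=timeout) as response:
        return parse_instance_stats(json.load(response))
```

Note that every value comes straight from the instance itself: the collector has no independent way to check them.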
This is not hypothetical: at some point, we saw a sudden and huge bump in accounts on the network, all originating from the same instance. The instance was advertising more than 100k user accounts, while it only had a few dozen.
Detecting and handling those cases is an interesting challenge. I have no proper solution for that at the moment, and I honestly think it’s a really hard problem to solve in a distributed architecture, as there is no single source of truth.
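To be clear, the following is not a solution and not something MNM does, just a naive sketch of one possible heuristic: flag any instance whose self-reported user count jumps implausibly between two consecutive samples. The function name, the thresholds and the sample data are all invented for illustration, and an instance lying slowly and consistently would slip right through.

```python
def find_suspicious_jumps(samples, max_ratio=10.0, min_increase=1000):
    """Return the (timestamp, previous, current) jumps that look implausible.

    `samples` is a chronological list of (timestamp, user_count) pairs for
    one instance. Both thresholds are arbitrary assumptions for this sketch.
    """
    suspicious = []
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        increase = current - previous
        # Require both a large absolute increase and a large relative one,
        # so that a tiny instance going from 2 to 30 users is not flagged.
        if increase >= min_increase and current > previous * max_ratio:
            suspicious.append((timestamp, previous, current))
    return suspicious


# Made-up history resembling the incident described above.
history = [("2017-10-01", 40), ("2017-10-02", 42), ("2017-10-03", 100000)]
print(find_suspicious_jumps(history))
```

Flagged points could then be held back from the dashboards until someone has looked at them, which still leaves the hard part (deciding what the real number is) to a human.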
Remember that when you see strange datapoints, either on MNM or other projects, because most of the time, the data comes directly from each instance and is not checked or verified beforehand.
Over the last 8 months, I’d say it took me around a week of work to:
- Develop and spin up the initial version of the project
- Add new features and dashboards
- Fix the bugs, downtime and errors that occurred
Overall, it is a pretty low-maintenance project and most of the work was done in the first weeks. 99% of the time, it’s running in the background and I completely forget about it until someone mentions me on Mastodon about some missing datapoints or inconsistencies.
On the software/server/hardware side, it’s running on a dedicated server hosted at SoYouStart, which is shared with other personal projects. The load on the server itself is usually small and barely noticeable.
The data itself is taking around 4GB on disk, and the project requires a few services to run:
- Grafana, for managing the dashboards
- Django for the homepage
- Celery for the data collection and recurring tasks
- PostgreSQL for storing the latest data points about instances
- InfluxDB, for storing the timeseries data at given intervals (hourly, daily) and making it queryable for the dashboards
Those are a lot of bricks, but the whole thing is dockerized, so management is quite easy.
What could be improved?
I’ve got so many ideas, but not so much time to actually implement them.
I’d really like another MNM mirror to be hosted and managed by someone else. Considering the data gathered, it would be sad to lose it. Even if I have backups, I would feel more confident not to be alone on this aspect.
If you’re even remotely interested in helping with this, we can get in touch: I can provide you with instructions and a full backup of the current database. You don’t even have to run Grafana or the homepage, as they’re not needed if you’re only interested in the data.
Providing public and static backups of the data on a regular basis would also be a good thing.
And gathering user activity stats, too.
And data about the whole fediverse, and not just Mastodon.
And make more dashboards. More analysis. More stuff!
Some of it may happen in 2018, so stay tuned!
Thank you for reading, and don’t forget to mention me on Mastodon if you have any question, idea, or suggestion regarding MNM. I’m @email@example.com :)