LinkDing as a poster child for self-hosting

I recently installed LinkDing on my Nomad cluster, and I am exceedingly pleased with it.

LinkDing is a self-hosted bookmark manager. My setup uses the official Docker container, with a dynamic host volume to ensure the DB doesn’t go poof when I inevitably need to restart the task.

The core features that I care about are:

  • Official container, meaning I don’t need to build my own or rely on someone else to package up the app and build something that might not work properly
  • Documented API, allowing me to fold, spindle, and mutilate my data however I want, with whatever workflows I want now or in the future
  • Tags
  • SQLite database, which simplifies backups considerably
  • PWA for mobile, meaning I don’t need a separate app on my phone
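
To make the SQLite point concrete: because the whole database is one file, a backup can be a single online copy. The following is a minimal sketch using Python's standard `sqlite3` module, not a description of how LinkDing itself handles backups. The function name `backup_sqlite` and any paths passed to it are hypothetical.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(src: Path, dest: Path) -> None:
    """Copy a live SQLite database to dest using SQLite's online backup API."""
    # sqlite3.connect() silently creates a missing file, so check first.
    if not src.is_file():
        raise FileNotFoundError(src)
    source = sqlite3.connect(src)
    target = sqlite3.connect(dest)
    try:
        # Connection.backup copies the database page by page and can run
        # while another process still has the source database open.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

A call would look like `backup_sqlite(Path("data/db.sqlite3"), Path("backups/linkding.sqlite3"))`, where both paths are assumptions that depend on where your host volume is mounted.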

Spinning up the container was extremely simple, and required almost no effort. It’s built in Django, which means if I really need to get my hands dirty, I have the relevant experience to debug and make PRs for it. The Django admin is also nice for fixing low-level data issues, such as “I really should have made these tags properly cased, I need to modify the tag name itself”.

And while it’s not anywhere near the top of my priorities list, the UI is very clean and enjoyable. That’s always a plus when it ticks all the other boxes.

It took a bit of effort (I’m a bit of a monster when it comes to keeping tabs open), but I was able to clean up my phone and desktop so that I only have a few dozen active tabs open on either device now. This is a far superior situation to be in compared to the hundreds of tabs I was keeping around because I didn’t want to just throw them into the void that is my browser bookmarks manager.

But Why?

You may be asking, “But Joe, why wouldn’t you just use your browser’s bookmark manager? It even has cloud sync so you can keep your bookmarks together across multiple devices.” That’s a legitimate question, even acknowledging my penchant for wanting to self-host All The Things.

The core of it boils down to three things:

  • I use different browsers across different devices
  • I don’t want to be dependent on a third-party service
  • Privacy

Devices, or Why I Can’t Use Firefox On My Phone

The biggest issue with using a native browser bookmark manager is that it requires you to use their browser on all your devices.

My preferred browser is Firefox, for ideological reasons that don’t matter to this discussion. On my desktop, I use the Adblock Plus extension to kill the majority of ads across the web. But on mobile, you can’t install extensions (or at least, there was no way to do so last time I checked). So on mobile I use the official Adblock Browser that ABP puts out. However, that’s based on Chromium, not Firefox. With just this one (reasonable, not all that weird) constraint, I’m forced to use two different browsers, which immediately kills the idea of unified bookmarks via browser-backed sync services.

Third-Party Services Considered Harmful

I’m aware of the existence of third-party hosted bookmark management/sync services. However, this is not a problem that is worth spending money on (as evidenced by the fact that I’ve lived with this issue for the past decade and just now finally dealt with it). And honestly, I dislike integrating with third-party hosted services as a general rule. Sometimes it’s the best option, but you’re taking on various risks when doing so:

  • What happens if their service goes down temporarily?
  • What happens if their service goes down permanently?
  • What if they don’t support future workflows that I might want?
  • Who has access to my data?

The first two are concerns you always have to consider when looking at a service of any kind. With self-hosted solutions, avoiding downtime becomes my problem, and I happen to know the solutions.

The third is a bit of a wash, but having an open API to manipulate my data is always going to be a plus. Native integrations aren’t always enough.

The final risk leads right into my last talking point, which is…

Privacy Starts With You

I’m a fairly open person. Ask me a question, and you’re likely to get an answer, even if it’s personal. But there’s a world of difference between me deciding to share information and my information just being available for general access.

The difference is “consent”. Just like someone touching your body, you can make the choice to allow someone to touch you, but if someone forces themselves on you, that’s assault. Privacy is the same way. Controlling what information you put out there is your choice. Lots of folks make bad choices with their information, not realizing how much they’re exposing themselves. I don’t have that excuse.

I grew up when the internet was first taking off. I spent my days online in chatrooms and forums, having discussions and roleplaying with folks from across the planet. I’m now a professional IT worker. I know all the different ways that your information can get used, abused, and spread around. Once it’s on the internet, it’s no longer private. This is especially true with the constant barrage of data breaches we see. If it’s only available on my home network, it’s a lot harder to be exposed publicly (not impossible, mind, just no longer the lowest hanging fruit for cybercriminals).

With that in mind, using a third-party sync service, whether provided by a browser or a vendor, simply isn’t something I want to do. My bookmarks, while not at the same level of sensitivity as my PII or HIPAA data, are still reasonably sensitive, and do have some potential overlap. It would be easy to see what things I’m interested in, what things I’ve researched, my preferences, and potentially what products I’m evaluating. And that’s not even getting into answering questions like, “What kinds of adult content do you watch?” or “Are any of the links related to diseases or medical conditions?”

Even something as innocuous as a bookmark provides information about the person who added that bookmark, and taken together they can paint a remarkably deep picture about a person.

A Practical Digression

Stepping away from the privacy portion of this, there’s an additional benefit to self-hosting tools that let you keep your data open and accessible (read: have an explicit API): You can define your own workflows that aren’t dependent on someone else building integrations. This may seem like something that only a developer would like, but I’m firmly of the belief that anyone can learn to code. It doesn’t take all that much effort or skill, and it opens up an entire world of automating away your tasks.

For instance: I collect academic papers, generally relating to Computer Science and algorithms for solving problems I’m exploring. I’ve started using Paperless-ngx to store those in a centralized place for reference and data storage convenience. I also just went on a “pull a copy of every paper this particular researcher has published within the past decade” binge (it was 30+ papers). Now, I could absolutely manually download all of those files and then manually upload them into Paperless. That’s what the normal workflow would be.

However, I’m a coder. My goal is literally to automate myself out of a job. So I built a simple script to pull a list of links from a file, download those PDFs, and stream them into Paperless. Not difficult, but getting all of those links into a file would have been a pain. Instead, I tossed all of them into LinkDing with an “automated-pull” tag, and then I was able to pull them all in one go via the API.
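
For a sense of what "pull them all in one go" can look like, here is a minimal sketch in Python (standard library only). It is not the Rust script described in this post. It assumes the LinkDing REST API as I understand it from its documentation: a `GET /api/bookmarks/` endpoint that takes a `q` search parameter (with `#tag` matching a tag), `Authorization: Token …` authentication, and a paginated response with `results`, `next`, and per-bookmark `url` and `tag_names` fields. Check those against your installed version. The function names and the `fetch` parameter are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional


def http_get_json(url: str, token: str) -> dict:
    """GET a URL with token authentication and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def tagged_urls(
    base_url: str,
    token: str,
    tag: str,
    fetch: Callable[[str, str], dict] = http_get_json,
) -> Iterator[str]:
    """Yield the URL of every bookmark carrying `tag`, following pagination."""
    # Assumed query shape: q="#<tag>" searches by tag, limit sets the page size.
    query = urllib.parse.urlencode({"q": f"#{tag}", "limit": 100})
    page: Optional[str] = f"{base_url.rstrip('/')}/api/bookmarks/?{query}"
    while page:
        data = fetch(page, token)
        for bookmark in data["results"]:
            # Re-check the tag list so a loose search match is not included.
            if tag in bookmark.get("tag_names", []):
                yield bookmark["url"]
        # "next" is assumed to hold the next page's URL, or null on the last page.
        page = data.get("next")
```

Usage would be something like `for url in tagged_urls("https://links.example", token, "automated-pull"): ...`, where the host name is a placeholder. The `fetch` parameter exists only so the pagination logic can be exercised without a live server.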

Long term, I intend to have a cronjob script that pulls entries with that tag regularly and does the same download-upload process, but that’s just an optimization. The core of it is that this didn’t take a particularly large effort to do. The script itself is ~50 lines of Rust, and the APIs for LinkDing and Paperless are already there and require minimal effort to integrate with. The benefit is that I avoided somewhere around an hour of drudgery, and now have a simple workflow that I can use in the future to automagically ingest papers when I encounter them.
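
The download-upload step is similarly small. The sketch below is Python with only the standard library, again not the Rust script from this post. It assumes Paperless-ngx accepts a multipart upload at `POST /api/documents/post_document/` with the file in a form field named `document` and `Authorization: Token …` authentication, which is my reading of its API documentation and should be verified against your version. All function names are hypothetical.

```python
import urllib.parse
import urllib.request
import uuid


def encode_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Build a single-file multipart/form-data body and its Content-Type."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def filename_from_url(url: str) -> str:
    """Use the last path segment as the file name, with a fallback."""
    path = urllib.parse.urlsplit(url).path
    return path.rstrip("/").rsplit("/", 1)[-1] or "paper.pdf"


def ingest(pdf_url: str, paperless_url: str, token: str) -> None:
    """Download one PDF and hand it to Paperless for consumption."""
    with urllib.request.urlopen(pdf_url) as resp:
        pdf = resp.read()
    # Assumed endpoint and form field name; see the note above.
    body, content_type = encode_multipart("document", filename_from_url(pdf_url), pdf)
    req = urllib.request.Request(
        f"{paperless_url.rstrip('/')}/api/documents/post_document/",
        data=body,
        headers={"Authorization": f"Token {token}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
```

A scheduled job would call `ingest` once per tagged link. This sketch does no retrying, no deduplication, and no escaping of unusual file names, so treat it as a starting point only.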

You just don’t have that level of flexibility with commercial tooling. Rare is the case where you’ll have unlimited access to your data to do whatever you want with, and even if you do, they might cut you off because you’re harming their infrastructure if you use it in a certain way. Open tooling is critical if you value flexibility.

Conclusion

I originally intended for this to be a discussion about LinkDing, but it turned into more of an article about privacy and reasons you’d self-host even something as simple as a bookmark manager. Hopefully you came away with a new tool that you can try in your workflows, and some new concepts to think about when evaluating who you give your information to, and what information you’re willing to give out to the public.

Cheers!