The Robservatory

Robservations on everything…

 

Remove tracking data from copied URLs

A while back, my friend James and I were discussing the amount of tracking cruft in many URLs. In my case, I subscribe to a ton of email newsletters, and I noticed that the links in them are just laden with tracking information. Most also go through a URL processor, so you don't really see those tracking details until you've clicked the link, at which point it's too late to avoid any tracking.

I wanted a way to clean up these URLs so that the least possible tracking information is sent to a server and, in particular, so that no browser cookies are created. In addition, when I share a link with friends, I don't want to send them a crufty tracker-laden link; I want a nice clean shareable URL.

Note: I wrote all of this before I knew about Jeff Johnson's Link Unshortener, which does all of this (and more) in a "real" app. If you'd like the easy solution, Jeff's app is the way to go. Mine is definitely a do-it-yourself concoction that's not for the faint of heart.

tl;dr version: Install this macro group (v7) in Keyboard Maestro to remove tracking details from copied URLs in a set of defined apps. Keep reading if you want some more info on how it works…

Nov 29 2021 Update: There was a bit of a logic bomb in the handling of already-decrufted URLs (or completely clean URLs) that has been resolved. The macro now behaves properly if you copy a clean URL for a domain that would normally be decrufted, or if you copy a crufty URL that has a cleaned version available in your history.

Nov 17 2021 Update: I completely rewrote the history routines—now both the original crufty URL and clean URL are saved. If you copy either the crufty URL or clean URL again, the macro won't re-run curl, it will just look up the clean version and open that. I also added code for flyingmag.com, which now uses an HTML redirect (ugh!), requiring another step to clean.

Nov 1 2021 Update: Added t.co links, as seen on twitter.com (and probably other Twitter clients) to the decrufter.

As an example, here's a URL from a recent Monoprice email newsletter. (Note: I changed a few of the characters so these URLs will not work; they're for demonstration purposes only.) Any tracking data is completely disguised in the source URL; all you see is a jumble of letters and numbers at the end of the URL:

http://enews.emails.monoprice.com/q/LLMxJBmP1xo0XEpHXEE5PmLbqHbNxpVsuheZcOJcm9iZ0Bnc3ltZnN3ZWIuY29tw4gZyWuu1KtfpSL58gax3zGue2fhqQ

When loaded in the browser, all of that resolves to a completely different URL:

https://www.monoprice.com/product?p_id=11297&trk_msg=8OLJJ72Q87O4PCIC9HC71NL7VS&trk_contact=T3KPNYKTNJTD6NU53HNJCDD7T4&trk_sid=FGHVB9JP6L3MVLERLES7DLEFEG&trk_link=9HBDB7KBCAEKT949P8P8JQG5J4&cl=res&utm_source=email&utm_medium=email&utm_term=View+product+recommended+for+you&utm_campaign=210902_thursday

Fairly obviously, everything after p_id=11297 is simply tracking information, so the actual URL is just:

https://www.monoprice.com/product?p_id=11297
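To make that stripping step concrete, here is a minimal Python sketch. It is not taken from the Keyboard Maestro macro, which uses its own filters; the function name `keep_params` and the keep-list approach are assumptions made for illustration. It keeps only the query parameters you name and drops everything else:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def keep_params(url, keep):
    """Return `url` with only the query parameters named in `keep`."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in keep
    ]
    # Rebuild the URL with the reduced query string and no fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# A shortened form of the resolved Monoprice URL shown above.
crufty = (
    "https://www.monoprice.com/product?p_id=11297"
    "&trk_msg=8OLJJ72Q87O4PCIC9HC71NL7VS&cl=res"
    "&utm_source=email&utm_medium=email&utm_campaign=210902_thursday"
)

print(keep_params(crufty, {"p_id"}))
# https://www.monoprice.com/product?p_id=11297
```

A keep-list is the cautious choice here: any new tracking parameter a site invents is dropped by default, at the cost of having to know which parameters the page actually needs.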

Ideally, that's the URL I'd like to load and/or share with friends, not the tracking-laden version. One weekend, I decided to do something about it, and sat down with Keyboard Maestro to work on a solution, thinking it wouldn't be overly complicated...

Months later, I think it's finally reached a point where it's shareable, and the 24 macros in my URL Decrufter group show that it was, indeed, overly complicated. But it's done now, and working quite well. [1]

[1] To extract destination URLs from disguised source URLs, the source URL has to be sent to the server, which means tracking information is sent. However, my macro uses curl to do that, so it's not happening within a browser where cookies could then be set.

Here's how it looks in action…

The features and limitations of my macro are as follows…

  • Obviously, you'll have to have Keyboard Maestro to use the macro.
  • The macro only works on copied links—clicked links aren't modified. Why not? Because capturing mouse clicks and analyzing the clicked item would require "real" programming, instead of a relatively straightforward macro that acts on the clipboard contents. I actually prefer it this way, because I can use a simple click if I don't mind delivering the tracking information, or a right-click and Copy Link when I don't want to share.
  • The macro only works in apps that I define. If the app isn't part of the defined set, then no processing occurs. This is easily changed by modifying the listed apps in the URL Decrufter macro group.
  • The macro only works on a defined list of hosts—it doesn't attempt to capture and interpret every link I copy. This was a design decision, as I didn't want the overhead of processing every copied link, and I have a limited number of known domains that send me tracking-laden URLs.
  • It can take anywhere from a few tenths of a second to a couple of seconds to process a copied URL. Nearly all of this time is spent in the curl step, which resolves the disguised source URL into the actual final URL, which is then stripped of its tracking details.
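For readers curious about what the curl step might look like outside Keyboard Maestro, here is a rough Python sketch. The exact curl options the macro uses are not listed in this post, so the flags below are an assumption: they are standard curl options that follow redirects, discard the page body, and print the final URL. The helper names `build_curl_command` and `resolve` are hypothetical.

```python
import subprocess


def build_curl_command(url, timeout_s=10):
    """Build a curl command that prints the final URL after redirects."""
    return [
        "curl",
        "--silent",                         # no progress meter
        "--location",                       # follow HTTP redirects
        "--max-time", str(timeout_s),       # give up after this many seconds
        "--output", "/dev/null",            # discard the response body
        "--write-out", "%{url_effective}",  # print the last URL fetched
        url,
    ]


def resolve(url, timeout_s=10):
    """Run curl and return the destination URL (makes a network request)."""
    result = subprocess.run(
        build_curl_command(url, timeout_s),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Show the command without running it.
print(" ".join(build_curl_command("http://enews.emails.monoprice.com/q/...")))
```

Since no cookie options (`--cookie` or `--cookie-jar`) are passed, curl does not store or send cookies here, which matches the reason given above for using curl rather than a browser. Note that `--location` only follows HTTP redirects; a redirect done inside the HTML page itself would need an extra step.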

First things first, here's the macro group (v7). It's saved in a disabled state, so it won't be active on load. I suggest you start by opening the _The Decrufter macro and reading the comment at the top of the macro. It explains basic usage, how to add domains, and how to (generally speaking) create custom filters for domains that don't match the most common form (where all tracking info follows a "?" at the end of the real URL).
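As an illustration of the difference between the generic filter and a custom one, here is a small Python sketch using regular expressions. These are not the macro's actual filters; the function names, the `CUSTOM_FILTERS` table, and the example.com URL are all made up for this example.

```python
import re
from urllib.parse import urlsplit


def generic_filter(url):
    """Most common form: drop the "?" and everything after it."""
    return re.sub(r"\?.*$", "", url)


def monoprice_filter(url):
    """Custom form: the product ID lives in the query, so keep ?p_id=NNN."""
    # If p_id is not the first parameter, the URL is returned unchanged.
    return re.sub(r"(\?p_id=\d+).*$", r"\1", url)


# Hypothetical table mapping a domain to its custom filter.
CUSTOM_FILTERS = {"monoprice.com": monoprice_filter}


def pick_filter(url):
    """Return the custom filter for the URL's domain, else the generic one."""
    host = urlsplit(url).hostname or ""
    for domain, custom in CUSTOM_FILTERS.items():
        if host == domain or host.endswith("." + domain):
            return custom
    return generic_filter


for url in (
    "https://www.example.com/article?utm_source=email&utm_medium=email",
    "https://www.monoprice.com/product?p_id=11297&trk_msg=8OLJJ72Q87O4PCIC9HC71NL7VS",
):
    print(pick_filter(url)(url))
# https://www.example.com/article
# https://www.monoprice.com/product?p_id=11297
```

The generic filter would turn the Monoprice link into a bare /product page, which is why any site that carries real information after the "?" needs its own filter.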

Because of the complexity of the macro, I'm not going to go through it here, as I usually do with simpler macros. But here's the basic process flow:

  1. When you copy something to the clipboard in a monitored app, the macro activates.
  2. If the item on the clipboard is an image, non-URL text, or a URL that's already been filtered, the macro quits.
  3. The URL on the clipboard is compared to a list of hosts (the part of the hostname between "www." and ".com", such as monoprice). If there's a match, the macro will run. Otherwise, it quits.
  4. A curl command is used to resolve the destination URL from the copied URL, except for a subset of hosts that don't require curl.
  5. The destination URL is run through a filter (the generic one, or one specific to that domain) to decruft it.
  6. The cleaned URL is placed back on the clipboard, and opened in the browser in the background.
  7. Both the crufty and cleaned URLs are added to the history file, variables are erased, and the macro quits.
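The flow above can be sketched in a few lines of Python. This is only an outline of the logic, not a port of the macro: `process_clipboard`, `is_url`, and the `resolve` and `clean` callables are hypothetical names, the history is a plain dictionary rather than a file, and the clipboard and browser steps are left out. Following the Nov 17 update, a URL already in the history returns its stored clean version without calling the resolver.

```python
from urllib.parse import urlsplit


def is_url(text):
    """True if `text` looks like an http(s) URL."""
    try:
        parts = urlsplit(text.strip())
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def process_clipboard(text, hosts, history, resolve, clean):
    """Return the clean URL, or None where the macro would quit."""
    # Step 2: ignore images (modeled here as non-strings) and non-URL text.
    if not isinstance(text, str) or not is_url(text):
        return None
    url = text.strip()
    # Seen before, as either the crufty or the clean form: skip the slow step.
    if url in history:
        return history[url]
    # Step 3: only handle hosts on the defined list.
    labels = (urlsplit(url).hostname or "").split(".")
    if not any(name in labels for name in hosts):
        return None
    # Steps 4 and 5: resolve the destination, then filter it.
    cleaned = clean(resolve(url))
    # Step 7: remember both forms so either one is recognized next time.
    history[url] = cleaned
    history[cleaned] = cleaned
    return cleaned


# Usage with stand-in resolve/clean functions (no network access).
history = {}
result = process_clipboard(
    "http://enews.emails.monoprice.com/q/abc",
    {"monoprice"},
    history,
    resolve=lambda u: "https://www.monoprice.com/product?p_id=11297&utm_source=email",
    clean=lambda u: u.split("&")[0],
)
print(result)
# https://www.monoprice.com/product?p_id=11297
```

Passing `resolve` and `clean` in as parameters keeps the slow network step and the per-domain filters separate from the decision logic, which also makes the flow easy to test.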

In developing this macro, I learned a lot about Keyboard Maestro's powers (always more than I think they are) and regular expressions (so much power and so much complexity). It's probably of interest to very few people, but now that it's working, I really like the functionality and take it for granted that I can right-click and copy a link to have it open a clean version in my browser.

Note: This macro will run fine as is, but if you want to customize it, you should be comfortable working with Keyboard Maestro's macros. (If you're going to write your own URL filters for the macro, you'll need a good understanding of regular expressions, too.) If you have questions, or want some hosts added to the macro, feel free to ask me here and I'll see what I can do.

7 Comments

  1. Any way to avoid the popup for YouTube, which has a ? mark as part of the basic URL? Not sure what a URL with a referral looks like. Or maybe it's there and I'm missing it.


The Robservatory © 2021 • Privacy Policy Built from the Frontier theme