The most boring sport, Formula 1, is using YouTube to get better

I am not going to lie—when I watched Formula 1 in the 1990s, it was mostly because my fellow countryman Jos Verstappen was enjoying a moderate amount of success in the sport.

And when I started watching it again in the 2010s, it was because Jos’ son Max was entering the same sport, heralded as a great talent.

Formula 1, the fastest sport on earth, has a reputation for excess. Fast cars, beautiful women (not as drivers, unfortunately), cosmopolitan cities, and money and champagne flowing freely. Regardless of how deserved this reputation is, the sport itself, when you have stopped looking at everything that surrounds it and sit down to watch a race, is … often a complete snooze fest.

A Formula 1 race starts with the driver who proved himself fastest during the qualifying session a day earlier at the front of the grid, followed by the second fastest, and so on.

The result is that the line-up on the starting grid is a pretty good predictor of not just who is going to win, but in which order the drivers will finish. Formula 1 races are often little more than glorified processions.

It is true that the starting grid does not always predict the results. During the race, drivers will meet with accidents and mechanical problems that may throw them back a few places or even remove them from the race; teams that are good at qualifying, which requires being very fast for just a few laps, don’t always manage to bring that same performance for an entire race (‘race pace’); cars are required to pit at least once, which allows for undercuts and overcuts; and there are a thousand other small ways a race can be won or lost–and the fans know what these are.

That makes Formula 1 a (somewhat) enjoyable sport for the initiated: if you know what you are watching, if you can recognise all the tell-tale signs that something special is going on, if you know the ramifications of details as they unfold in front of you. But that also means that in order to get to like Formula 1, you must already be heavily invested in it. And most people start the other way around; they learn about a thing because they like the thing.

Formula 1 has taken to YouTube to remedy this as well as they can. In good essaying tradition, almost, they will extensively show you before a race what is going to happen, they will show you the race as it is happening, and then afterwards they will explain to you what you have seen.

Over the course of the two weeks between races, you can expect to see the following:

  • Five Shocking Moments – looking back at this race in previous years.
  • Circuit Guide – one of the current crop of drivers explains how they approach the track.
  • Drivers Press Conference – 5 drivers answer questions from the press.
  • Highlights from the 3 practice sessions and from the qualifying session, one video each.
  • Paddock Pass – Will Buxton explaining the challenges for each team and interviewing a shed load of drivers.
  • F1 Live: the half hour run up to the race broadcast live.

After the race, Formula 1 will publish a video of race highlights and then the recurring features return:

  • Paddock Pass – another episode, this one post-race: reactions from the drivers.
  • Top 10 Onboards – the 10 most interesting radio messages between drivers and their teams.
  • Jolyon Palmer’s Analysis – a former Formula 1 driver dives deep on some of the things that made the race interesting, reviewing video footage.

And then there are videos that aren’t tied to any specific race, but that do work well in explaining how the sport works. In the past month or so we had:

  • 2019 Drivers’ First F1 Wins – what was the first win of the current crop of drivers?
  • Esteban Ocon’s Journey to F1 and Back – Ocon is a former F1 driver who will return next season.
  • How do F1 Drivers Explain F1?
  • Top 10 Cheeky F1 Innovations – innovations that were eventually banned.
  • Grill the Grid – two drivers of the same team quizzed about F1’s past.
  • 2021 F1 Car First Look – the regulations are ever changing and the car designs follow.

(I cannot embed these videos here, so I have linked to some of them above.)

All these features mean you can get initiated into the sport at your own pace, which makes it easier to enjoy it even if some of the races are, on the surface at least, boring.

Freelance.nl is almost exclusively for intermediaries

I am a freelance web developer. That is to say, I work on websites for a living as a one-man business rather than as an employee.

Most of my clients find me on their own or through my network. I do, however, also have an account on freelance.nl, the largest marketplace for freelancers in the Netherlands (at least, it was in 2015, when I last measured that).

In the late 2000s I measured how many jobs on freelance.nl had been posted by intermediaries/recruiters and how many by actual clients. I repeated that measurement in 2015 and just now once more (so in 2019).

The client-to-recruiter ratio was:

c. 2008 – 3:2

2015 – 4:5

2019 – 1:20

Here is today’s measurement of jobs posted by clients:

[screenshot: match no intermediaries – 531 matches]

and of jobs posted through recruiters (the total includes jobs for clients):

[screenshot: match intermediaries – 23 matches]

Measurements and comparisons like these come with a truckload of caveats.

Freelance.nl is not only one of the largest, but also one of the oldest surviving online marketplaces for freelancers in the Netherlands. The site is continuously working to improve itself, but one result of that is that it is hard to compare measurements from 2009 with measurements from 2019.

For example, at the time of my 2015 measurement the site still had a web development category; nowadays that is ICT, which potentially casts a much wider net.

Moreover, it may well be that the client-to-recruiter ratio looks much healthier for flower arrangers.

And there are many more reasons why these measurements are hard to compare. I am not a scientist, however, but a business owner, and sometimes that means you work with the numbers you have, not the numbers you ought to have.

For me personally this ratio matters. I have never worked through intermediaries – it would take too long to explain why, but in short it comes down to perverse incentives creating an enormous amount of noise on the line with jobs placed through intermediaries; in fact, you often cannot even be sure there is a job at all – and so it makes quite a difference whether a site consists of 70% real jobs or of 95% jobs that may or may not turn out to be anything.

There might still have been a mitigating circumstance if the number of web development jobs had remained the same in absolute terms, but that does not appear to be the case. Measured over roughly the same period (late summer), the number of jobs in 2019 is a quarter of what it was in 2015.

It may be that I need to revise my opinion of intermediaries, but it is more likely that freelance.nl will become a less prominent blip on my radar.

Update 18 September 2019

When I wanted to respond to one of those rare jobs-for-actual-clients, the placeholder text of the response field caught my eye:

“Dear recruiter, I am the best candidate, because…”

That really is intermediary-speak. Real clients and real contractors do not address each other that way. So regardless of the actual situation (which, as noted, is hard to measure and compare), freelance.nl is apparently a site that, on the job-posting side, sees itself primarily as a site for intermediaries.

In short: a popular website that I used to try and find work as a freelancer has recently seen a large shift from mostly posting work by actual clients to largely posting work by recruiters. Since, in my experience, postings by recruiters rarely represent actual work, this makes the aforementioned website less useful to me.

Possibly crooked judge gets taken off case about definitely bad doctor

The court of The Hague is perhaps not known as the most even-handed in the world. This is the court where large, foreign media conglomerates shop for copyright jurisprudence. This is also the court that committed a crime in 2014 when it advertised for fresh judges, saying that women need not apply. That was a clear case of discrimination based on gender, although I doubt anyone served even a day’s worth of gaol time for this.

So when this court dismisses a judge for being biased, that probably means something.

In an appeal in a case between Google and a doctor who had mistreated a patient, a judge was dismissed by the court over a possible conflict of interest, Emerce reported today. The plastic surgeon in question had been included on a blacklist, Zwarte Lijst Artsen, which bases its information on another, more opaque blacklist called the BIG Register.

The people who run Zwarte Lijst Artsen run a companion blacklist on judges called Zwarte Lijst Rechters, which mainly focusses on judges who have helped absolve doctors from malpractice cases. As it happens, the judge from the initial court case, which was won by the doctor, was on this blacklist, so naturally Google appealed.

When it turned out that a judge in the appeal case also was on that blacklist, the court was unimpressed and unamused, and dismissed her.

At the time of the initial case, legal blogger Arnoud Engelfriet opined that the verdict was as expected and unremarkable: “Considering these facts, the verdict does not surprise me. I also would not call it trail-blazing.”

Engelfriet’s reasoning (referred to above by ‘these facts’) is a little bit hard to follow, so I won’t go into that here. Suffice it to say that if the BIG Register is so hard for average patients to find and peruse that judges see no reason to shut it down, and entries on another blacklist that is apparently transparent and usable are made hard to find, the court is basically saying that blacklists are de facto only allowed if they are unusable. And in my view that is not a fair weighing between the privacy rights of doctors and the rights of patients, and a neglect of one’s judicial duty.

The judge in the appeal case argued that she was not influenced by her own presence on a blacklist because the blacklist for judges was not as impactful as the one for doctors. The court felt that argument was irrelevant: “[This is not about] the possibility of subjective partiality, but about the objectively justified fear of partiality”.

In other words, the court wasn’t so much worried that the judge might have a conflict of interest as it was that one of the parties would have the feeling that they were not being treated fairly.

The court will now have to appoint a new judge and then the saga of the plastic surgeon and her pals, the possibly crooked judges, can continue.

Test: scaling images up

I was playing around with scaling up images in The GIMP and stumbled upon a method (scale to larger than you need, then scale down to the desired result) that seemed to get exceptionally good results.

I wanted to find out if this was a fluke, so I ran some tests.

My conclusion appears to be either that playing around to find the right method is exactly what you need, or that more tests are needed.

Scaling images up means that if you have an image of a certain size (w × h pixels), you produce a version of that image that is larger (e.g. 2w × 2h pixels).

Unlike what Hollywood shows like to pretend, this does not produce an image of equal quality. Upscaling an image generally leads to ugliness, so it is your task to find the method that works best. If you have access to a larger original of the image you are about to scale up, it is almost always better to work from that original image.

Upscaling works by inventing new pixels. The algorithm must take guesses as to what such a new pixel would look like. Typically this works by using neighbouring pixels as hints at least somewhere in the process.


Illustration: how do you scale a 2 pixel wide image to a 3 pixel wide one? You could choose to only copy pixels, meaning that the ratio between the 2 halves of the image will become skewed, or you could choose to mix pixels, meaning there will be colours in the image that weren’t there before.
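
To make that concrete, here is a toy sketch in PHP, with made-up black-and-white values of my own, of what those two choices do to a one-row, two-pixel image stretched to three pixels wide:

<?php
// A toy illustration of the two choices when stretching a one-row,
// two-pixel image to three pixels wide (made-up values, not taken from
// the test images below).
$left = 0;    // a black pixel
$right = 255; // a white pixel

// 'Copy only' (what The GIMP calls 'none'): one of the two source pixels
// gets duplicated (which one depends on rounding), so the original 50/50
// split becomes a skewed 2/3 versus 1/3 split.
$copied = [$left, $left, $right];

// 'Mix' (linear/cubic): the middle output pixel becomes a blend, a grey
// that was not present in the original image.
$mixed = [$left, intdiv($left + $right, 2), $right];

echo implode(', ', $copied); // 0, 0, 255
echo implode(', ', $mixed);  // 0, 127, 255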

In the following, your browser may itself scale images up or down to make them fit the available space. I chose widths to scale to that should work fine with the current settings of my blog, but you may have to view the images separately to get a real impression of what they look like.

I started this test with two images:

– The source image, 300 pixels wide.

– The comparison image, 600 pixels wide.

Both images were produced by scaling down (method: cubic) from an approximately 1600 pixel-wide original.

The 300 pixel version would be the source of all the upscale tests, the 600 pixel version would serve as the control—as the ideal target.

All tests were performed with The GIMP.

The GIMP has traditionally had three scaling settings: none, linear and cubic.

‘None’ will try and fit pixels into new pixels, duplicating and discarding pixels where necessary. The result will look blocky regardless of whether you are scaling up or down. In my experience, the best use case for ‘none’ is when you are scaling up or down to exact halves, quarters, eighths or doubles, quadruples, octuples et cetera.

‘Linear’ and ‘cubic’ are siblings: they mix pixels where necessary, with cubic mixing the most aggressively. Cubic is brilliant for scaling down.

I used two target widths: 400 pixels and 600 pixels.

(There is no 400 pixel control image, but I trust the 600 pixel image will suffice here.)

I applied the following tests:

none: scale up to the target width using scaling algorithm ‘none’.

lin: scale up to the target width using scaling algorithm ‘linear’.

cub: scale up to the target width using scaling algorithm ‘cubic’.

none + cub: scale up to more than the target width using scaling algorithm ‘none’, then scale down to the target width using scaling algorithm ‘cubic’.
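
For what it is worth, here is a minimal sketch of these four pipelines in PHP, using the GD extension rather than The GIMP. This is purely an illustration under my own assumptions: the actual tests were done in The GIMP, the file name source-300.jpg is made up, and GD’s IMG_NEAREST_NEIGHBOUR, IMG_BILINEAR_FIXED and IMG_BICUBIC modes stand in for ‘none’, ‘linear’ and ‘cubic’.

<?php
// Sketch of the four test pipelines using PHP's GD extension.
// Hypothetical input: the 300-pixel-wide source image from this post.
$src = imagecreatefromjpeg('source-300.jpg');
$target_w = 400;

// imagescale() keeps the aspect ratio when the height is given as -1.
$none = imagescale($src, $target_w, -1, IMG_NEAREST_NEIGHBOUR);
$lin  = imagescale($src, $target_w, -1, IMG_BILINEAR_FIXED);
$cub  = imagescale($src, $target_w, -1, IMG_BICUBIC);

// 'none + cub': overshoot past the target using 'none', then come back
// down to the target using 'cubic'.
$over     = imagescale($src, 600, -1, IMG_NEAREST_NEIGHBOUR);
$none_cub = imagescale($over, $target_w, -1, IMG_BICUBIC);

// Save everything at JPEG quality 82, as elsewhere in this post.
$results = ['none' => $none, 'lin' => $lin, 'cub' => $cub, 'none_cub' => $none_cub];
foreach ($results as $name => $img) {
  imagejpeg($img, "scaled-400-$name.jpg", 82);
}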

Scaled to 400 pixels wide (factor 1.3)

Scaled to 400 pixels wide using ‘none’:

Scaled to 400 pixels wide using ‘linear’:

Scaled to 400 pixels wide using ‘cubic’:

Scaled to 400 pixels wide by scaling up to 600 pixels wide using ‘none’, then scaling down to 400 pixels wide using ‘cubic’:

Scaled to 600 pixels wide (factor 2)

Scaled to 600 pixels wide using ‘none’:

Scaled to 600 pixels wide using ‘linear’:

Scaled to 600 pixels wide using ‘cubic’:

Scaled to 600 pixels wide by scaling up to 900 pixels wide using ‘none’, then scaling down to 600 pixels wide using ‘cubic’:

My hope had been that the last method would provide the best upscaled images, but to be honest, I do not see much difference between scaling up with the linear setting and the method where you first scale up and over using none, then scale down using cubic. In fact, having done some pixel peeping, I think that I prefer—for this test at least—the images scaled up using the linear algorithm.

(Show here the difference between a linearly upscaled image and an image scaled up using the scale-over-then-down method.)
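
A quick way to produce such a comparison image is to compute a per-pixel difference. The sketch below is again GD rather than The GIMP, and again an assumption of mine rather than how the images in this post were compared; the file names refer to the sketch earlier in this post. Identical pixels come out black, and the more two pixels differ, the brighter the result.

<?php
// Per-pixel difference between the 'lin' result and the 'none + cub'
// result from the earlier sketch (hypothetical file names).
$a = imagecreatefromjpeg('scaled-400-lin.jpg');
$b = imagecreatefromjpeg('scaled-400-none_cub.jpg');

$w = imagesx($a);
$h = imagesy($a);
$diff = imagecreatetruecolor($w, $h);

for ($y = 0; $y < $h; $y++) {
  for ($x = 0; $x < $w; $x++) {
    $pa = imagecolorat($a, $x, $y);
    $pb = imagecolorat($b, $x, $y);
    // Take the absolute difference per colour channel.
    $r  = abs((($pa >> 16) & 0xFF) - (($pb >> 16) & 0xFF));
    $g  = abs((($pa >> 8) & 0xFF) - (($pb >> 8) & 0xFF));
    $bl = abs(($pa & 0xFF) - ($pb & 0xFF));
    imagesetpixel($diff, $x, $y, imagecolorallocate($diff, $r, $g, $bl));
  }
}

imagepng($diff, 'difference-lin-vs-none_cub.png');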

All images were saved at JPEG quality level 82, for no other reason than that is my default setting.

The difference between a cheapo ‘netbook’ and a high-end laptop is…

… about 450 gigabytes in storage.

[two screenshots]

I was looking for a cheap, small form-factor laptop on a comparison site that lists thousands of them and I found plenty of cheap ones.

When I made the two screenshots above, I had only selected a screen size, and I had sorted the results by price. The left side of the illustration shows Chromebooks and such, with storage between 16 and 64 gigabytes and prices around 150 euros. The thing I changed to get the results on the right (prices around 1,000 euros) was to set the minimum storage to 500 GB.

When I indicated I needed more than just a handful of bytes of storage, the prices sky-rocketed.

Now I know there are more differences than just storage between these two categories, but I don’t need a better screen or a faster processor to watch some videos, write e-mails and read blog posts. Storage would be nice though.

I guess if you want a cheap, small laptop with a decent amount of storage, you will have to swap out the SSD yourself.

Drupal 7 module integration testing

The information below is available in other places, but I figured I would bring it together as a sort of note-to-self, because this had me stumped for a day.

Suppose you have a Drupal contrib module A that you are writing tests for.

Contrib module B is optionally extending the functionality of module A, and you want to test the integration of these two modules.

Drupal has tons of modules that either are built specifically to extend the functionality of another module, or that are built as some sort of low-level API module that lets you do cooler stuff with the functionality you already have. For example, the Feeds module is optionally extended by Feeds Tamper, Feeds Import Preview, Feeds Entity Processor and so on.

So let us say that somewhere in your code for module A you have the following:

if (function_exists('module_b_api_function')) {
  // Do cool stuff with module_b_api_function().
}

Now in order for your test runner to call this function, you need to tell it about the existence of the dependent module. It needs to be able to activate the module or to fail gracefully if it cannot (for instance if the second module does not exist).

The following assumes you are acquainted with the Drupal 7 built-in testing system (Simpletest), specifically with the DrupalWebTestCase.

There are three places where you define the integrated module you want to load: your module’s .info file, and the setUp() and getInfo() methods of the .test file for your integration test.

1. In your module’s .info file, add:

test_dependencies[] = module_b

(where ‘module_b’ should be replaced by the name of the module that provides the extended functionality).

This alerts the drupal.org test runner that it needs to add module B to its code base.

1.b. If you are patching an existing contrib module, you may wish to create a separate patch of just this change first, because the drupal.org test runner needs to know about your test_dependency before it starts running the actual integration test patch.

Update 20 August 2019: this preliminary patch must be committed to the module’s development branch by the module’s maintainer. This is something that at first completely flew by me, because all of the documentation on this particular test runner quirk was written by module maintainers, and when they write ‘commit’ it means something different from what I mean when I read ‘commit’—in my mind, committing refers to my local repository. Yeah, I know…

2. Your setUp() method might look like this:

public function setUp(array $modules = array()) {
  $modules[] = 'module_b';

  parent::setUp($modules);
}

This enables the modules during the test run.

3. Your getInfo() might look like this:

public static function getInfo() {
  return array(
    'name' => 'Module B integration tests',
    'description' => 'Test integration with the B module.',
    'group' => 'module_a_group',
    'dependencies' => array('module_b'),
  );
}

Simpletest uses this return value, among other things, to check (in simpletest_test_get_all()) that the modules the test depends on are discoverable. If they are not, your module integration test is skipped.

If you leave out this third step, the testing system will halt with a fairly useless error message upon discovering it cannot load the module. That is OK for you, now that you know what is happening, but not for others who might never have seen your tests and are just seeing their own test runs fail. Having your test skipped in case of a missing module is nicer.

Google’s featured snippets algorithm is quite smart, at times

As I was trawling through the statistics for this site, I noticed how popular my post about creating rounded corners with The Gimp is.

I mean, I knew it was popular, it is the post with by far the most comments on this blog, but I did not know it outperformed other posts by an order of magnitude.

And one reason, it turns out, is that Google heaps mountains of love on that post for search phrases like “gimp rounded corners”. In fact, my post is presented as a featured snippet for that phrase (without the quotation marks), meaning that not only is it returned as the first search result, but it also receives a special presentation that makes it really stand out (see illustration).

[screenshot of Google search results]

But what I find most remarkable is how Google’s algorithms managed to extract a numbered list from the way I did it. Not so much that they translated “Step 1” to just “1”, but that they realised that labelling things “step” is one of a myriad of ways one can present a list.

[screenshot of my blogpost]

Also note how the algorithm automatically skipped my intro. (That is not just smart, that is wise.)

A customer once asked me if I could make a tool that would take a person’s resumé, regardless of how they had formatted it, and extract all relevant information from it in a structured manner. I said yes, I can do this, it is pretty much what I was trained to do at university, and an optimistic estimate for my time would be about 18 months.

This upset the customer no end, because they thought a week and a half, tops, was what was needed.

I hope you see the similarities between the two projects, and assuming I know my stuff, a lot of work must have gone into the Featured Snippets project just to make it feel like it works most of the time. Things can still go wrong with search snippets (says a Google blog post from January this year). People will inject their tutorials and explanations with their own personality and opinions, which on the whole must be a good thing, except that sometimes these personalities and opinions are of a rather unsavoury type. “How to torture puppies” is not a tutorial healthy people would like to see boosted by Google.

Google must have seen their Featured Snippets feature as just one way of organising search results in a manner that is most helpful to its users, but a ‘naive’ visitor may see Google giving certain results a pedestal, and that is a valid interpretation. (Certainly the SEO world appears to be eyeing search snippets with a greed that belies the mere interpretation of Featured Snippets as ‘superior presentation’.)

State of the CMS in 2018

After predicting in 2004 (without naming names) that tools like Wordpress, Drupal and Joomla might become popular CMSes … they did! In 2010 and 2014 I followed up with articles exploring which of these tools had become popular and how they described themselves over time respectively.

Re-reading these articles makes me realise how quaint the premise must seem to a modern audience. It is as if I predicted horses would be called horses in the future. What is so special about predicting the obvious? But even though in 2018 these tools appear to be the very definition of CMS-es, in 2006 they weren’t. If you Googled for CMS-es 12 years ago, you would get completely different names (none of which I remember as they sank into obscurity over time).

In those days you had, apart from ‘real’ CMS-es also forum software, blogging software, so-called ‘nukes’ (community software) and so on. Wordpress and Drupal were blogging tools back then, and Joomla a nuke.

So let us see what has changed.

Wordpress
2010: Semantic personal publishing platform
2014: Web software you can use to create a beautiful website or blog
2018: Open source software you can use to create a beautiful website, blog, or app

Drupal
2010: Open source content management system
2014: Open source content management platform
2018: Open source content management system

Joomla
2010: Dynamic portal engine and content management system
2014: Content management system
2018: Content management system

As you can see, barely anything has changed in the last four years: the owners and developers see their tools as conceptually the same (even though the web has changed a lot in that time).

Something else that hasn’t changed much is popularity. Wordpress, Drupal and Joomla are still the outright leaders, and like four years ago, Wordpress still dominates with a market share of about 60 %. What has changed is that both Drupal and Joomla have shrunk, they were the largest of the small CMS-es in 2014 and are even smaller now.

Developments that I find alarming, but pretty much the entire industry seems excited about, are the introductions of headless (or decoupled) CMS-es and of services. In this future, a website is a container that collects and presents data from several sources through standardised APIs, with the CMS being one of those sources. The web itself becomes an app delivery platform in this scenario, and the choice for a CMS becomes less about “what do I want this site to look like?” and more about “how do I want content to be managed?” The word ‘app’ did not appear by accident in Wordpress’ 2018 tagline.

I have not been able to find any evidence of significantly popular commercial CMS-es in 2018. Which probably means that the ones that exist only serve the high-end market.

Update 5 January 2019: Wordpress has released version 5 of its system, which includes the so-called Gutenberg editor, one of the largest changes in the CMS’ history. Why risk alienating your entire user-base by introducing a costly disruption nobody asked for and nobody needs? Because at the bottom of the market, hosted CMSes are busy nibbling at Wordpress’ feet. And even though these CMSes still only make up a tiny part of the market (too small for me to mention half a year ago), this development has got Mullenweg and pals scared. So yeah, speaking of developments, hosted CMSes like Wix and Squarespace might pop up on my next report, 3.5 years from now.

American websites improved due to European privacy laws

An interesting side-effect to the introduction of the GDPR, the latest EU privacy law, was that (for Europeans at least) several American websites improved.

Instead of a dazzling and confusing cornucopia of banners and clickables, the sites of USA Today and NPR refocused on their stated goal, i.e. journalism.

See here for two examples:

[screenshot: the European version of the USA Today site]

and

[screenshot: the European version of the NPR site]

Would you not much rather read the European versions of these sites than the American ones?

The one site that seems confused is Google:

[screenshot: a Google search result pointing to an LA Times article]

This seems like a link to an article in the LA Times about that same publication suing the city of Los Angeles, but if I click that link, I get a message saying “our website is currently unavailable in most European countries.”

The LA Times has decided that, rather than making a version of its website that does not heavily infringe upon the privacy of its visitors, it will simply show nothing to Europeans.

This is the same Google that for some bizarre reason wants to fine-tune every aspect of my ‘search experience’, to the point that my search results are never the same as anybody else’s results for the same search phrase. Yet they are unable to filter out websites that refuse to show me relevant content.

Privacy audits and GDPR observations

The introduction of the European privacy act known as GDPR seems to have caused a flurry of work in the web development business, but oddly and unfortunately enough I seem to have been immune to this development.

So I decided that I would go through the process of improving one of my own websites, just for practice, and see what I could learn from that. Here is what I found.

So the GDPR is a law from 2016 that builds on earlier attempts by the European Union to anchor privacy as a basic human right for all its citizens. It is an extension, in a way, of the EU’s attempts to turn itself into a vast, wasteful, undemocratic political entity that enormously exceeds its initial scope. Initially the EU was to be an economic union that dealt with things like standardising on electric outlets and shoe sizes.

What the GDPR added to earlier legislation was a bite. From now on, offenders could be hit with significant fines.

Proponents of the GDPR like to claim that the law is based on the principle of privacy-by-design, meaning you need to structure your systems and services in such a way that people’s private lives remain private, and that if you want more from them, you need to get explicit and freely given permission. Let us see how that pans out, shall we?

In the past few months, unless you have been living under a rock, you have been flooded with privacy related messages. These tended to take one of two forms:

  1. The weak: “Please, please, please, please, please let us keep spamming you. We are begging you.”
  2. The strong: “Here is what will happen. You will give us permission to sell all your personal data to the highest bidder, or we will stop our relationship here.”

If the service needs you more than you need it, you would have gotten the former request. But if you need the service more than they need you, let us say the Googles and Facebooks of this world, they get to dictate the terms under which they use your personal data. That doesn’t sound like privacy-by-design to me, that’s just plain old neo-liberalism and greed at work.

So that is what the GDPR is, but for the proprietors of websites it is much more important to know how to comply. The catch-all case for GDPR compliance is, as you have seen, express and explicit consent. A website owner needs to identify all his uses of personal data, explain to a visitor what those uses mean, and then ask permission for those uses.

Luckily there are a number of exceptions where the rights of the proprietors would be unnecessarily burdened if they had to ask for permission. One such exception is a technical necessity: a website would not work if your user had the option of saying no. For example, in order for a web shop to work, you have to be able to ask the visitor for billing and shipping information.

Another exception is freedom of speech. If you are writing an article about someone, you don’t have to ask them for permission before you publish the article.

Keeping data around for legal obligations is a third exception.

The above nicely lays out how you perform a privacy audit. You make three lists (a rough example follows below):

  1. Which personal data do you process?
  2. For each of these data, which use do you make of them?
  3. For each of these uses, what are your grounds for having them?
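
To make that a bit more tangible, here is a rough sketch of what such audit notes could look like for a simple blog, written as a PHP array. The data items, uses and grounds listed are hypothetical examples of mine, not an exhaustive or authoritative audit:

<?php
// A hypothetical, partial example of the three audit lists for a simple
// blog: each personal data item, the uses made of it, and the grounds for
// those uses.
$audit = [
  'commenter name and e-mail address' => [
    'uses' => ['display the comment', 'notify me of replies'],
    'grounds' => 'consent, asked for in the comment form',
  ],
  'visitor IP addresses in the web server log' => [
    'uses' => ['debugging', 'blocking abusive visitors'],
    'grounds' => 'technical necessity',
  ],
  'billing addresses of shop customers' => [
    'uses' => ['invoicing'],
    'grounds' => 'legal obligation to keep financial records',
  ],
];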

Apart from this audit, there are other things you need to do that are beyond the scope of this posting. For example, you also need to determine if you export personal data to foreign countries. (For example, if you are in the Netherlands, do you have Facebook buttons on your website? These buttons collect personal data and Facebook is an American company.) And you also need to determine for each item how long you are going to keep it, and so on.

The meanings of several terms seem obvious at first sight, but once you start performing your audit they become vague and confusing.

Personal data are data that can be used to identify a natural person. The logical conclusion might be that nothing then is personal data, because on the internet nobody knows you are a dog. That would make the law toothless and so judges have been using a much roomier definition in which anything that comes close to identifying you can be personal data: names, e-mail addresses, IP addresses and so on. Look out especially for combinations of data. You might argue successfully that an IP address by itself is not personal data, but IP addresses are rarely processed in isolation.

There is a special class of data that gets extra protection, things like health, religious beliefs, sexual orientation and so on.

Processing refers to anytime you touch personal data. Collecting contact information is processing personal data. Storing contact information is processing personal data. Sending this information to your e-mail address is processing personal data.

In other words, both ‘personal data’ and ‘process’ are pretty broadly defined.

The website I have been auditing, and for which I have subsequently written a privacy statement, is a Wordpress-based website. Not everything that goes for Wordpress will apply to your website, but I believe several of the lessons I learned could be relevant to any website.

I have identified six elements of a Wordpress website that come into play. If I missed any, please note them in the comments.

  • Wordpress core
  • Plug-ins
  • Themes
  • Widgets
  • Embedded content
  • Hosting

Wordpress core is the base package that you get when you download and install Wordpress on a webserver. Even if all you used Wordpress for were publishing pages and blog posts containing nothing but plain text, you would still be processing personal data, if only because Wordpress stores user accounts and, by default, the name, e-mail address and IP address of everyone who leaves a comment.

Plug-ins are pieces of additional functionality created to plug into the Wordpress API (programming interface).

Themes determine the look rather than the functionality of your website.

Widgets are small, very specific pieces of additional functionality that run on top of Wordpress rather than hooking into it.

Embedded content is content hosted somewhere else, but mixed up with your own content. Lots of website owners will for example use the Twitter.com widget to quote tweets in their articles.

A web host is something your Wordpress site runs on top of, and web hosts can collect personal data too. For example, many classic web servers are set up to log every visit by storing the IP address of the visitor, the page they requested and the time of the visit.

There is a strong overlap between plug-ins, themes, widgets and embedded content, to the point where there really is not even that much difference under the hood between a plug-in and a template. The differences are mainly conceptual. For an audit, however, it is useful to treat these as different parts of your website, because your admin interface will typically present these four elements differently.

I spent about 23 hours auditing a fairly simple Wordpress website. In that time I also wrote my privacy policy. That is a pretty insanely large amount of time, if you ask me.

Now for me this is business and those are 23 hours well spent, time that will pay itself back in future projects. But what if you wanted a place on the web for your digital soap box, a place for your rantings and ravings? What if I told you that before you set all that up, you were legally required to spend three whole days figuring out in how many (often inadvertent) ways you were going to violate your visitors’ privacy?

What is more, you are exposed to the same multi-million dollar fines as large, wealthy organisations are. So far I don’t know of a country ogrish enough to impose million dollar fines on private bloggers, but hey ho, these are strange times.

Would you still go ahead with that website?

So the GDPR is a huge impediment to free speech, and not only that, but it limits the speech of smaller, weaker parties such as private bloggers far more than it does the speech of large corporations. The GDPR is certainly annoying to the latter, but ultimately acceptable.

But there are caveats to that conclusion.

Breaches of privacy are in themselves also huge impediments to free speech. If you are afraid to speak because you are afraid someone will come after you, you may be scared into staying silent.

(The thing is though, will the GDPR make much of a difference here? I do not expect the GDPR to make any meaningful difference to the practice of doxing for example. Twitter is as a processor under no obligation to halt the practice, and the doxers themselves can claim a free speech exemption.)

Also, this is a new law and things need some time to settle in. Wordpress has just released a version of its software that comes with a built-in privacy statement and for which the privacy audit of Wordpress core has already been performed for you. If you install no other themes, plugins or widgets, you are almost good to go. (You still need to add some info about how you are going to secure your site, how long you are going to keep certain data and so on.)

So there is some hope there.