The search engine that lets you remove the most popular web sites

This is a nifty thing: Million Short is a search engine that will let you remove links to the million most popular web sites from its search results.

Sometimes good (and more importantly, relevant) sites get crowded out by popular sites in the results of search engines like Google. This is a natural result of how Google works — it prefers to present the websites that everybody links to; it sees the linking as a stamp that denotes quality.

Sites that are already popular get a head start as a result. They get shown more in the results, so they get visited more often, so they get linked to more often, so Google likes them better.

And sometimes industries get good at gaming the search engines. For example, that one time I wanted to figure out the architectural style of the American Hotel in Amsterdam*, I got flooded with search results linking to booking aggregators. Now generally I can see how two or three of those would be the most useful results for a simple search like “American Hotel Amsterdam”. And typically you would just refine your search by adding words like “architecture”.

So it is going to be interesting to see how this develops.

*) It probably wasn’t the American Hotel in Amsterdam, but this was some time ago and I can only remember the shock at seeing so many booking sites in the search results. In fact, when I google “American Hotel Amsterdam” now, the second result is a link to the Wikipedia article, which at the time of writing this blog post claims that the hotel was built in the Berlage style.

Drupal legends are legendary

I got this graph from It made me laugh out loud. It shows you which versions of Drupal, one of the more popular off-the-shelf content management systems, are used the most.

For some reason the webmasters of decided to split the then-current major version, 8, into all its minor versions. If you just looked at the legend, you might be forgiven for believing that Drupal 8 is very popular. (It is not.)

Below is the graph contrasting the popularity of major Drupal versions against the space they receive in the legend.

Note that I copied the top graph in March of this year. In the meantime, Drupal 9 has become the official current version.

Facebook: watch out for third-party page edit requests

Two years ago, I received an ominous e-mail from Facebook:

“People who recently visited your page recommended changes to the information on your page. Please verify the information below for accuracy. [List of changes.] If we don’t hear from you before [11 days from now], the information in question will be automatically updated.”

Users can tell Facebook to change your page.

This change will happen automatically unless you stop it; the change request is not a suggestion to you, but an instruction to Facebook.

Facebook will give you a short time to review and reject this instruction, namely 11 days. If somebody who wants to harm you knows, for instance, that you are on holiday, they have plenty of time to change your page.

No notification of this appears on Facebook itself. Instead you receive an e-mail from Facebook. This is problematic for at least two reasons I can think of. One is that you can unsubscribe from this type of e-mail, which you may have done for a variety of reasons. The other is that the e-mail comes from, a domain that was plagued by spam a couple of years ago, so a lot of people have blocked mail from this domain as a matter of course.

Basically, and this is not the first time I have noticed this, Facebook outsources as much manual labour as they can get away with.

Also interesting is that Facebook sees Pages as a largely commercial product. Anybody can set up as many free pages as they like, but from then on they will be flooded with requests to buy page views. The message is clear: if you want people to view your page, you will have to pay for it.

I have no solution for this, other than to not bet the farm on Facebook. If information is important enough for you to present on the internet, make sure all of it can be found outside of Facebook.

[screenshot of the dialog through which you can request changes to name, category, phone, website and e-mail address]

[screenshot of the e-mail message the page 'owner' receives]

[The Facebook dialog for managing change requests.]

You can find anything on Google these days

It is Google’s aim to make all the knowledge of the world findable, but is it also Google’s aim to own all the information of the world?

One day I wanted to find out about upcoming events in Amsterdam pop concert venue Paradiso and because I assumed the URL might not be, I googled the venue’s name.

I ended up never going to the venue’s website, because Google puts together a sidebar (shown above) that contains the following information:

  • Photos of the venue
  • A map of the location of the venue
  • The name of the venue
  • A link to the website of the venue
  • Directions to the venue
  • Reviews of the venue
  • A description of the venue (taken from Wikipedia)
  • The address
  • Opening hours
  • Seating capacity
  • Phone number
  • A list of upcoming concerts (date, time, band name)
  • A FAQ
  • More reviews, this time from elsewhere on the web
  • Links to the venue’s social media
  • Links to nearby venues

That is a very complete description. That is pretty much everything you would expect to find on the website itself. In many cases, you do not even need to go to the website anymore.

Is this good or bad?

This smells of the old days of portals, when a portal owner like Altavista or Yahoo pretended to be a safe, curated gateway to the internet, but in reality never really wanted its visitors to leave its site.

All of Paradiso’s side hustles remain invisible this way (currently, Paradiso has none). Instead, visitors spend more time with Google, which may be time spent looking at and perhaps even clicking on the ads Google displays.

The organisation whose website gets cannibalised for the juicy bits by Google may even prefer it this way – all the boilerplate in a handy, readable format on Google and all the details on your own website for those who are really interested.

Except would you not really rather have that sidebar displayed with search phrases like “fun night out amsterdam” instead of only with searches for your name?

(I accidentally searched that phrase. Brrr, the listicles! Read that in the same tone of voice as “oh, the horror!” please.)

is almost exclusively for intermediaries (originally in Dutch)

I am a freelance web developer. That is to say, I work on websites professionally as a one-person business rather than as an employee.

Most of my clients find me on their own or through my network. However, I also have an account on, the largest marketplace for freelancers in the Netherlands (at least, it was in 2015, when I last measured that).

In the late 2000s I measured how many assignments on had been posted by intermediaries/recruiters, and how many by actual clients. I repeated that measurement in 2015 and again just now (so in 2019).

The client-to-recruiter ratio was:

ca. 2008 – 3:2

2015 – 4:5

2019 – 1:20
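Expressed as the share of postings that come from actual clients, these ratios paint an even starker picture. A quick back-of-the-envelope sketch (only the three measured ratios above go in; everything else is just arithmetic):

```php
<?php
// Convert the measured client:recruiter ratios into the percentage
// of postings that come from actual clients.
$ratios = array(
  '2008' => array(3, 2),   // 3:2
  '2015' => array(4, 5),   // 4:5
  '2019' => array(1, 20),  // 1:20
);
foreach ($ratios as $year => $ratio) {
  $share = 100 * $ratio[0] / ($ratio[0] + $ratio[1]);
  printf("%s: %.0f%% of postings from clients\n", $year, $share);
}
```

This prints roughly 60% for 2008, 44% for 2015 and 5% for 2019: the client share collapsed over the decade.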

Here is today’s measurement of assignments posted by clients:

[screenshot: match no intermediaries – 531 matches]

and of assignments posted through recruiters (the total includes assignments for clients):

[screenshot: intermediaries do match – 23 matches]

Measurements and comparisons like these come with a truckload of caveats. is not only one of the largest, but also one of the oldest surviving online marketplaces for freelancers in the Netherlands. The site continuously works on improving itself, but one result of this is that it is hard to compare measurements from 2009 with measurements from 2019.

For example, at the time of my 2015 measurement the site still had a web development category; nowadays that is ICT, which potentially casts a much wider net.

Besides, it may well be that the client-to-recruiter ratio for flower arrangers looks a lot healthier.

And there are many more reasons why these measurements are hard to compare. I am not a scientist, however, but an entrepreneur, and sometimes you work with the numbers you have, not with the numbers you should have.

For me personally, this ratio is relevant. I have never worked through intermediaries – it would go too far to explain why, but in short it comes down to perverse incentives putting an enormous amount of noise on the line for assignments posted through intermediaries; in fact, you often cannot even be sure there is an actual assignment – so it makes quite a difference whether a site consists for 70% of real jobs or for 95% of jobs that may or may not turn out to be anything.

There might still have been a mitigating circumstance if the number of web development assignments had stayed the same in absolute terms, but that does not appear to be the case. Measured over roughly the same period (late summer), the number of assignments in 2019 is a quarter of what it was in 2015.

It may be that I need to revise my opinion of intermediaries, but it is more likely that will become a less prominent blip on my radar.

Update 18 September 2019

When I wanted to respond to one of those rare assignments-for-clients, the placeholder text of the reply field caught my eye:

“Dear recruiter, I am the best candidate, because…”

That really is intermediary-speak. Real clients and real contractors do not address each other that way. So regardless of the actual situation (which, as noted, is hard to measure and compare), is apparently a site that, on the posting side, sees itself primarily as a site for intermediaries.

In English, in short: a popular website that I used to try and find work as a freelancer has recently seen a large shift from mostly posting work by actual clients to largely posting work by recruiters. Since, in my experience, postings by recruiters rarely represent actual work, this makes the aforementioned website less useful to me.

Drupal 7 module integration testing

The information below is available in other places, but I figured I would bring it together as a sort of note-to-self, because this had me stumped for a day.

Suppose you have a Drupal contrib module A that you are writing tests for.

Contrib module B is optionally extending the functionality of module A, and you want to test the integration of these two modules.

Drupal has tons of modules that either are built specifically to extend the functionality of another module, or that are built as some sort of low-level API module that lets you do cooler stuff with the functionality you already have. For example, the Feeds module is optionally extended by Feeds Tamper, Feeds Import Preview, Feeds Entity Processor and so on.

So let us say that somewhere in your code for module A you have the following:

if (function_exists('module_b_api_function')) {
  // Do cool stuff with module_b_api_function().
}

Now in order for your test runner to call this function, you need to tell it about the existence of the dependent module. It needs to be able to activate the module or to fail gracefully if it cannot (for instance if the second module does not exist).

The following assumes you are acquainted with the Drupal 7 built-in testing system (Simpletest), specifically with the DrupalWebTestCase.

There are three places where you define the integrated module you want to load: your module’s .info file, and the setUp() and getInfo() methods of the .test file for your integration test.

1. In your module’s .info file, add:

test_dependencies[] = module_b

(where ‘module_b’ should be replaced by the name of the module that provides the extended functionality).

This alerts the test runner that it needs to add module B to its code base.

1.b. If you are patching an existing contrib module, you may wish to create a separate patch of just this change first, because the test runner needs to know about your test_dependency before it starts running the actual integration test patch.

Update 20 August 2019: this preliminary patch must be committed to the module’s development branch by the module’s maintainer. This is something that at first completely flew by me, because all of the documentation on this particular test runner quirk was written by module maintainers, and when they write ‘commit’, it means something else than when I read ‘commit’—in my mind, committing refers to my local repository. Yeah, I know…

2. Your setUp() method might look like this:

public function setUp(array $modules = array()) {
  $modules[] = 'module_b';
  parent::setUp($modules);
}


This enables the modules during the test run.

3. Your getInfo() might look like this:

public static function getInfo() {
  return array(
    'name' => 'Module B integration tests',
    'description' => 'Test integration with the B module.',
    'group' => 'module_a_group',
    'dependencies' => array('module_b'),
  );
}

Simpletest uses this return value, among other things, to check (in simpletest_test_get_all()) that the modules the test depends on are discoverable. If they are not, your module integration test is skipped.

If you leave out this third step, the testing system will halt with a fairly useless error message upon discovering it cannot load the module. That is OK for you, now that you know what is happening, but not for others who might never have seen your tests and are just seeing their own test runs fail. Having your test skipped in case of a missing module is nicer.
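Putting the three steps together, a minimal .test file might look something like the sketch below. The class, module and function names (ModuleAModuleBIntegrationTestCase, module_a, module_b_api_function) are placeholders carried over from the examples above, and this only runs inside Drupal 7’s Simpletest environment, not standalone:

```php
<?php

/**
 * Sketch of an integration test case combining steps 2 and 3.
 */
class ModuleAModuleBIntegrationTestCase extends DrupalWebTestCase {

  public static function getInfo() {
    return array(
      'name' => 'Module B integration tests',
      'description' => 'Test integration with the B module.',
      'group' => 'module_a_group',
      // Lets simpletest_test_get_all() skip this test gracefully
      // when module B is not present in the code base.
      'dependencies' => array('module_b'),
    );
  }

  public function setUp(array $modules = array()) {
    // Enable both modules in the sandboxed test environment.
    $modules[] = 'module_a';
    $modules[] = 'module_b';
    parent::setUp($modules);
  }

  public function testIntegration() {
    // With module B enabled, its API function should be available.
    $this->assertTrue(function_exists('module_b_api_function'),
      'module_b_api_function() exists.');
  }

}
```

Remember that step 1 (the test_dependencies[] line in the .info file) still has to be in place, or the test runner will never fetch module B in the first place.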

Google’s featured snippets algorithm is quite smart, at times

As I was trawling through the statistics for this site, I noticed how popular my post about creating rounded corners with The Gimp is.

I mean, I knew it was popular, it is the post with by far the most comments on this blog, but I did not know it outperformed other posts by an order of magnitude.

And one reason, it turns out, is that Google heaps mountains of love on that post for search phrases like “gimp rounded corners”. In fact, my post is presented as a featured snippet for that phrase (without the quotation marks), meaning that not only is it returned as the first search result, but it also receives a special presentation that makes it really stand out (see illustration).

[screenshot of Google search results]

But what I find most remarkable is how Google’s algorithms managed to extract a numbered list from the way I did it. Not so much that they translated “Step 1” to just “1”, but that they realised that labelling things “step” is one of a myriad of ways one can present a list.

[screenshot of my blogpost]

Also note how the algorithm automatically skipped my intro. (That is not just smart, that is wise.)

A customer once asked me if I could make a tool that would take a person’s resumé, regardless of how they had formatted it, and extract all relevant information from it in a structured manner. I said yes, I can do this, it is pretty much what I trained to do at university, and an optimistic estimate for my time would be about 18 months.

This upset the customer no end, because they thought a week and a half, tops, was what was needed.

I hope you see the similarities between the two projects. Assuming I know my stuff, a lot of work must have gone into the Featured Snippets project just to make it feel like it works most of the time. Things can still go wrong with search snippets (says a Google blog post from January this year). People will inject their tutorials and explanations with their own personality and opinions, which on the whole must be a good thing, except that sometimes these personalities and opinions are of a rather unsavoury type. “How to torture puppies” is not a tutorial healthy people would like to see boosted by Google.

Google must have seen their Featured Snippets feature as just one way of organising search results in a manner that is most helpful to its users, but a ‘naive’ visitor may see Google giving certain results a pedestal, and that is a valid interpretation. (Certainly the SEO world appears to be eyeing search snippets with a greed that belies the mere interpretation of Featured Snippets as ‘superior presentation’.)

State of the CMS in 2018

After predicting in 2004 (without naming names) that tools like Wordpress, Drupal and Joomla might become popular CMSes … they did! In 2010 and 2014 I followed up with articles exploring, respectively, which of these tools had become popular and how they described themselves over time.

Re-reading these articles makes me realise how quaint the premise must seem to a modern audience. It is as if I predicted horses would be called horses in the future. What is so special about predicting the obvious? But even though in 2018 these tools appear to be the very definition of CMS-es, in 2006 they weren’t. If you Googled for CMS-es 12 years ago, you would get completely different names (none of which I remember as they sank into obscurity over time).

In those days you had, apart from ‘real’ CMS-es, also forum software, blogging software, so-called ‘nukes’ (community software) and so on. Wordpress and Drupal were blogging tools back then, and Joomla a nuke.

So let us see what has changed.

Wordpress:
2010: Semantic personal publishing platform
2014: Web software you can use to create a beautiful website or blog
2018: Open source software you can use to create a beautiful website, blog, or app

Drupal:
2010: Open source content management system
2014: Open source content management platform
2018: Open source content management system

Joomla:
2010: Dynamic portal engine and content management system
2014: Content management system
2018: Content management system

As you can see, barely anything has changed in the last four years: the owners and developers see their tools as conceptually the same (even though the web has changed a lot in that time).

Something else that hasn’t changed much is popularity. Wordpress, Drupal and Joomla are still the outright leaders, and like four years ago, Wordpress still dominates with a market share of about 60 %. What has changed is that both Drupal and Joomla have shrunk: they were the largest of the small CMS-es in 2014 and are even smaller now.

Developments that I find alarming, but pretty much the entire industry seems excited about, are the introductions of headless (or decoupled) CMS-es and of services. In this future, a website is a container that collects and presents data from several sources through standardised APIs, with the CMS being one of those sources. The web itself becomes an app delivery platform in this scenario, and the choice for a CMS becomes less about “what do I want this site to look like?” and more about “how do I want content to be managed?” The word ‘app’ did not appear by accident in Wordpress’ 2018 tagline.

I have not been able to find any evidence of significantly popular commercial CMS-es in 2018. Which probably means that the ones that exist only serve the high-end market.

Update 5 January 2019: Wordpress has released version 5 of its system, which includes the so-called Gutenberg editor, one of the largest changes in the CMS’ history. Why risk alienating your entire user base by introducing a costly disruption nobody asked for and nobody needs? Because at the bottom of the market, hosted CMSes are busy nibbling at Wordpress’ feet. And even though these CMSes still only make up a tiny part of the market (too small for me to mention half a year ago), this development has got Mullenweg and pals scared. So yeah, speaking of developments, hosted CMSes like Wix and Squarespace might pop up in my next report, 3.5 years from now.

An anecdotal look at Facebook page reach

Here is one for the books.

This is a graph of the so-called ‘reach’ that my roller derby photography page has on Facebook. Reach means: how many people have been exposed to my photos. (In an earlier blog post I explained why I have a Facebook page in the first place.)

Every dot represents the reach of a post in which I introduce a new photo album to this page.

There are two things that stand out from this graph, both of which are remarkable for reasons I will explain below.

The first thing you will notice is the up-and-down nature of the graph. One time I reach a lot of people and the next barely any at all, but — and here comes the second anomaly — before the autumn of 2017, “barely any at all” still meant more than 2,000 people reached. Since late 2017, reach has dropped into the hundreds.

This is strange because of the way I work. I visit roller derby games on the weekend, prepare an album containing photos of a game the day after, and then post the album, containing a few dozen photos, to my Facebook page. Usually the players and sometimes their friends and family will look at the fresh album and that is that. After a week, nobody except a few fans of very popular players engages with the album any more.

In other words, my status updates are limited to the same type of thing over and over again, and although the specific audience changes per game, the expected size of the audience is always the same — namely friends and family of two teams of skaters.* This should be reflected in my reach, but it isn’t.

If anything, my average reach should increase slightly because more and more people ‘like’ my page.

When my reach was still in the thousands, I wasn’t overly concerned about the up-and-down nature of it, because I was still reaching most of the people who would be interested in my photos. When it dropped into the hundreds however, I started to worry a little.

“Over the past few months, I’ve read articles and answered questions from many people who are concerned about declines in organic reach for their Facebook Pages”, wrote one Facebook employee in 2014 in an article titled Organic Reach on Facebook: Your Questions Answered. Let’s see what he had to say.

There are basically three reasons why reach would drop over the course of time. The first is that more and more updates are being shared each day, the second is that people ‘like’ more pages than they did before and the third is that Facebook won’t show everything.

In other words, whenever I share a photo album, Facebook shows it to fewer and fewer people over the course of time, and the more people like my page, the fewer people get to see its fruits.

Facebook does this, it claims, so that it can keep people engaged. If people have too many things of little value to look at, they will get bored. So Facebook prefers to present people with content of high value.

And then he bowls that entire edifice over by saying that companies can buy views for their pages. So much for putting engaging, valuable content first.

I had an interesting experience last month. The online presence of roller derby in the Netherlands is largely concentrated on Facebook. Games are announced using Facebook event pages. After a game, I share links to my photo albums there, because not every skater is a fan of my page, but they may still be interested in photos of that particular event.

On this occasion, somebody posted a comment to my post on the event page. Usually this takes the form of “thank you” or “nice photos” but I like to check, in case somebody wants a photo removed or says something untoward. In this case, though, I could not view the comment because I could not view the post because Facebook had decided (I assume) that my post was not engaging enough for me.

I could see I had posted that link, because Facebook was still showing it among the three posts in the preview of its Discussion tab (event pages are divided into an About and a Discussion tab). And I could also see that somebody had commented, because Facebook notifies you of new comments. The site however just refused to show me either my post or the comment.

So that was an interesting bit of automated gaslighting. Smarter systems, designed to counter trolls, hide postings from other readers but not from the author; Facebook seemingly does it the other way around.

International ad agency Ogilvy (disclosure: I worked for them in a previous life) wrote a white paper in 2014 in which they outline the everlasting decline of Facebook page reach. Their recommendations are that 1) you focus on sub-sets of your audience so that you can better supply them with engaging stories rather than going for one size fits all, and 2) that you return to Platform Neutral, i.e. your own website. If you want to control the discussion, you have to control the platform.

I am not sure that is such good advice, because Google Search is a platform too now (it wasn’t, or not as much, in 2014) and is capturing a lot of visitors before they can reach your site. Also, in the case of the amateur event photographer, Facebook may simply be where your audience is, and you don’t get to move them around.

*) Full disclosure: most events I photograph are so-called double headers, in which two roller derby games are played back-to-back. That means that in those cases my audience actually consists of the players, friends and family of four teams. However, that would have side-tracked you into contemplating the nature of roller derby events in a way that is completely irrelevant for this post, hence the condensation of the situation into a form that is easier to understand.

Facebook Location Spam


If you check in at a location on Facebook or enter the location for a photo, there is a chance that you will end up linking to spam.

The main reason for this is that Facebook is crap and the people who make Facebook are idiots, but I say this in anger after hacking spam out of my photo albums for 2 hours straight, so I will acknowledge that this is perhaps not the most constructive of explanations. Let me elucidate.

When you try and enter a location in Facebook, the site helpfully offers you a number of suggestions based on the part of the location name you have entered so far. This is not an exhaustive list, i.e. Facebook makes a selection of locations it is going to suggest. If the name of the location is not in the list, you get the option to ‘Just use’ the name you just entered.

In some of Facebook’s forms, you get the option to Add Place. This takes you to a new form in which you can enter some information about the place you just added, including its address. Facebook does not remember what you added last time, so if you have to fix hundreds of photos, you have to fill out thousands of fields (hence me just wasting two hours).

But suppose you are a spamming low-life piece of scum (watch your contaminations, Branko!) and you have somehow managed to automate part of this process; you have now found yourself a way to storm the top of the list of location suggestions. At least, that is how I assume this works. It would make little sense for Facebook to suggest obscure locations, so I assume it automatically suggests popular locations, opening the system up to attacks by spammers who have the time, the energy and the tools to game it.

Presumably, the more people like and check in at these scam locations, the more popular these false locations get.

The screenshot illustrates how I have started typing ‘Sporthal’ – Dutch for sports venue – and as you see, Facebook suggests 8 locations. Of those, 3 have been hijacked by spammers, all of which show up in the top 4 (you can tell by the fact they share the same logo).

I have no idea how these scammers manage to hijack locations so completely. They take over both the profile photo and the cover photo and manage to be the only ones to have posting rights. The cover photo seems to be something that a person can suggest for a location, but the other two items aren’t.

I know of at least one location (Sporthal Oranjeplein in The Hague) where there was a somewhat well used, somewhat maintained real location page that was then ‘merged’ with the spam location. Meaning, if you somehow managed to find a link to the original location page and clicked it, Facebook would automatically redirect you to the spam page. In those cases Facebook will helpfully tell you it has merged pages and offer you a way to report an incorrect merge.

This is also useful in cases where locations have been merged with automatically created pages – case in point, links in photo albums leading to Utrecht Disaster (a roller skating hall) now all lead to an auto-generated page about the Heysel Stadium disaster. You can report the mismerge – as useful as pressing a pedestrian crossing call button I imagine.

So what is the problem? Is there a problem? I mean, I hate spammers and all that, but in the end it is my choice to add a location to my photos, and it is my fault if I don’t properly look at the location I add.

The mismerges are problematic in this respect, because I could link to a proper location only to find out years later that the link is now redirecting to spam.

I also imagine that if locations can be hijacked by spammers, they can be hijacked by phishers and other criminals with more insidious designs.

I don’t know of a way to fix this. Facebook does not want to hire people to add and manage locations, so this is always going to be a problem. It could disable locations altogether, but having people share where they have been and what they have done together happens to be one of its most attractive qualities. Adding the ability to report spam, assuming Facebook would actually follow up on such reports, might help, but I can think of several drawbacks. For one, Facebook (and similar social media services) is known for selectively listening to its users. Why would I report something if I believe they won’t listen anyway? The other problem is that this turns the whole battle over locations into one between two powerful factions (Facebook on the one hand, spammers on the other) in which the regular user is less and less likely to be heard.

Facebook’s problem is a conceptual one. It wants locations to be somewhat community managed, but ignores the fact that the community contains many bad actors.

There is a very simple thing they could have done for my specific problem, though. As I am typing the name of the venue where I have taken my photos, progressively fewer and fewer suggestions appear. This makes sense in a world where there is only one location called Sporthal Oranjeplein (staying with my previous example), but Facebook knows of several. Would it be too confusing to show more than one?