Google’s featured snippets algorithm is quite smart, at times

As I was trawling through the statistics for this site, I noticed how popular my post about creating rounded corners with The Gimp is.

I mean, I knew it was popular (it is the post with by far the most comments on this blog), but I did not know it outperformed other posts by an order of magnitude.

And one reason, it turns out, is that Google heaps mountains of love on that post for search phrases like “gimp rounded corners”. In fact, my post is presented as a featured snippet for that phrase (without the quotation marks), meaning that not only is it returned as the first search result, but it also receives a special presentation that makes it really stand out (see illustration).

[screenshot of Google search results]

But what I find most remarkable is how Google’s algorithms managed to extract a numbered list from the way I did it. Not so much that they translated “Step 1” to just “1”, but that they realised that labelling things “step” is one of a myriad of ways one can present a list.

[screenshot of my blogpost]

Also note how the algorithm automatically skipped my intro. (That is not just smart, that is wise.)
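To give an idea of even the most naive version of this trick, here is a minimal sketch in Python. It is emphatically not how Google does it (I have no idea what their pipeline looks like); the pattern, the function name and the sample lines are all made up for illustration. It recognises only two of the myriad ways to label a list item ("Step 1" and "1." or "1)") and ignores everything that does not look like a step, which is how the intro gets skipped.

```python
import re

# Hypothetical pattern: "Step 3", "Step 3:", "3." or "3)" at the start of a line.
STEP_LABEL = re.compile(
    r"^\s*(?:step\s+(?P<a>\d+)[.:)]?|(?P<b>\d+)[.)])\s+(?P<text>\S.*)$",
    re.IGNORECASE,
)


def extract_steps(lines):
    """Return (number, text) pairs for the lines that look like list items.

    Lines that carry no recognised label (an intro, say) are skipped.
    """
    steps = []
    for line in lines:
        match = STEP_LABEL.match(line)
        if match:
            number = int(match.group("a") or match.group("b"))
            steps.append((number, match.group("text").strip()))
    return steps


# Made-up sample text, not the actual wording of my post.
sample = [
    "How I round corners in The Gimp.",
    "Step 1: Open the image",
    "Step 2: Select all",
    "3) Save",
]
steps = extract_steps(sample)
```

Every other way of presenting a list (ordinals, headings, bold text, "first", "then", "finally") would need its own rule or, more likely, a trained model, which is where the real work is.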

A customer once asked me if I could make a tool that would take a person’s resumé, regardless of how they had formatted it, and extract all relevant information from it in a structured manner. I said yes, I can do this; it is pretty much what I was trained to do at university, and an optimistic estimate for my time would be about 18 months.

This upset the customer no end, because they thought a week and a half, tops, was what was needed.

I hope you see the similarities between the two projects, and assuming I know my stuff, a lot of work must have gone into the Featured Snippets project just to make it feel like it works most of the time. Things can still go wrong with search snippets (says a Google blog post from January this year). People will inject their tutorials and explanations with their own personality and opinions, which on the whole must be a good thing, except that sometimes these personalities and opinions are of a rather unsavoury type. “How to torture puppies” is not a tutorial healthy people would like to see boosted by Google.

Google must have seen their Featured Snippets feature as just one way of organising search results in a manner that is most helpful to its users, but a ‘naive’ visitor may see Google giving certain results a pedestal, and that is a valid interpretation. (Certainly the SEO world appears to be eyeing search snippets with a greed that belies the mere interpretation of Featured Snippets as ‘superior presentation’.)

State of the CMS in 2018

After predicting in 2004 (without naming names) that tools like Wordpress, Drupal and Joomla might become popular CMSes … they did! In 2010 and 2014 I followed up with articles exploring which of these tools had become popular and how they described themselves over time respectively.

Re-reading these articles makes me realise how quaint the premise must seem to a modern audience. It is as if I predicted horses would be called horses in the future. What is so special about predicting the obvious? But even though in 2018 these tools appear to be the very definition of CMS-es, in 2006 they weren’t. If you Googled for CMS-es 12 years ago, you would get completely different names (none of which I remember as they sank into obscurity over time).

In those days you had, apart from ‘real’ CMS-es also forum software, blogging software, so-called ‘nukes’ (community software) and so on. Wordpress and Drupal were blogging tools back then, and Joomla a nuke.

So let us see what has changed.

Wordpress
2010: Semantic personal publishing platform
2014: Web software you can use to create a beautiful website or blog
2018: Open source software you can use to create a beautiful website, blog, or app

Drupal
2010: Open source content management system
2014: Open source content management platform
2018: Open source content management system

Joomla
2010: Dynamic portal engine and content management system
2014: Content management system
2018: Content management system

As you can see, barely anything has changed in the last four years; the owners and developers see their tools as conceptually the same (even though the web has changed a lot in that time).

Something else that hasn’t changed much is popularity. Wordpress, Drupal and Joomla are still the outright leaders, and like four years ago, Wordpress still dominates with a market share of about 60 %. What has changed is that both Drupal and Joomla have shrunk: they were the largest of the small CMS-es in 2014 and are even smaller now.

Developments that I find alarming, but that pretty much the entire industry seems excited about, are the introductions of headless (or decoupled) CMS-es and of services. In this future, a website is a container that collects and presents data from several sources through standardised APIs, with the CMS being one of those sources. The web itself becomes an app delivery platform in this scenario, and the choice of a CMS becomes less about “what do I want this site to look like?” and more about “how do I want content to be managed?” The word ‘app’ did not appear by accident in Wordpress’ 2018 tagline.

I have not been able to find any evidence of significantly popular commercial CMS-es in 2018. Which probably means that the ones that exist only serve the high-end market.

Update 5 January 2018: Wordpress has released version 5 of its system which includes the so-called Gutenberg editor, one of the largest changes in the CMS’ history. Why risk alienating your entire user-base by introducing a costly disruption nobody asked for and nobody needs? Because at the bottom of the market, hosted CMSes are busy nibbling at Wordpress’ feet. And even though these CMSes still only make up a tiny part of the market (too small for me to mention half a year ago), this development has got Mullenweg and pals scared. So yeah, speaking of developments, hosted CMSes like Wix and Squarespace might pop up on my next report, 3.5 years from now.

An anecdotal look at Facebook page reach

Here is one for the books.

This is a graph of the so-called ‘reach’ that my roller derby photography page has on Facebook. Reach means: how many people have been exposed to my photos. (In an earlier blog post I explained why I have a Facebook page in the first place.)

Every dot represents the reach of a post in which I introduce a new photo album to this page.

There are two things that stand out from this graph, both of which are remarkable for reasons I will explain below.

The first thing you will notice is the up-and-down nature of the graph. One time I reach a lot of people and the other barely any at all, but — and here comes the second anomaly — before the autumn of 2017, “barely any at all” still meant more than 2,000 people reached. Since late 2017, reach has dropped into the hundreds.

This is strange because of the way I work. I visit roller derby games on the weekend, prepare an album containing photos of a game the day after and then post the album, containing a few dozen photos, to my Facebook page. Usually the players and sometimes their friends and family will look at the fresh album and that is that. After a week, nobody except a few fans of very popular players engages with the album any more.

In other words, my status updates are limited to the same type of thing over and over again, and although the specific audience changes per game, the expected size of the audience is always the same — namely friends and family of two teams of skaters.* This should be reflected in my reach, but it isn’t.

If anything, my average reach should increase slightly because more and more people ‘like’ my page.

When my reach was still in the thousands, I wasn’t overly concerned about the up-and-down nature of it, because I was still reaching most of the people who would be interested in my photos. When it dropped into the hundreds however, I started to worry a little.

“Over the past few months, I’ve read articles and answered questions from many people who are concerned about declines in organic reach for their Facebook Pages”, wrote one Facebook employee in 2014 in an article titled Organic Reach on Facebook: Your Questions Answered. Let’s see what he had to say.

There are basically three reasons why reach would drop over the course of time. The first is that more and more updates are being shared each day, the second is that people ‘like’ more pages than they did before and the third is that Facebook won’t show everything.

In other words, whenever I share a photo album, Facebook shows it to fewer and fewer people over the course of time, and the more people like my page, the fewer people get to see its fruits.

Facebook does this, it claims, so that it can keep people engaged. If people have too many things of little value to look at, they will get bored. So Facebook prefers to present people with content of high value.

And then he bowls that entire edifice over by saying that companies can buy views for their pages. So much for putting engaging, valuable content first.

I had an interesting experience last month. The online presence of roller derby in the Netherlands is largely concentrated on Facebook. Games are announced using Facebook event pages. After a game, I share links to my photo albums there, because not every skater is a fan of my page, but they may still be interested in photos of that particular event.

On this occasion, somebody posted a comment to my post on the event page. Usually this takes the form of “thank you” or “nice photos” but I like to check, in case somebody wants a photo removed or says something untoward. In this case, though, I could not view the comment because I could not view the post because Facebook had decided (I assume) that my post was not engaging enough for me.

I could see I had posted that link, because Facebook was still showing it among the three posts in the preview of its Discussion tab (event pages are divided into an About and a Discussion tab). And I could also see that somebody had commented, because Facebook notifies you of new comments. The site however just refused to show me either my post or the comment.

So that was an interesting bit of automated gaslighting. Smarter systems, designed to counter trolls, hide postings from other readers but not from the author; Facebook seemingly does it the other way around.

International ad agency Ogilvy (disclosure: I worked for them in a previous life) wrote a white paper in 2014 in which they outline the everlasting decline of Facebook page reach. Their recommendations are that 1) you focus on sub-sets of your audience so that you can better supply them with engaging stories rather than going for a one size fits all, and 2) that you return to Platform Neutral, e.g. your own website. If you want to control the discussion, you have to control the platform.

I am not sure that is such good advice, because Google Search is a platform too now (it wasn’t, or not as much, in 2014) and is capturing a lot of visitors before they can reach your site. Also, in the case of the amateur event photographer, Facebook may simply be where your audience is, and you don’t get to move them around.

*) Full disclosure: most events I photograph are so-called double headers, in which two roller derby games are played back-to-back. That means that in those cases my audience actually consists of the players, friends and family of four teams. However, that would have side-tracked you into contemplating the nature of roller derby events in a way that is completely irrelevant for this post, hence the condensation of the situation into a form that is easier to understand.

Facebook Location Spam

[screenshot of Facebook location suggestions]

If you check in at a location on Facebook or enter the location for a photo, there is a chance that you will end up linking to spam.

The main reason for this is that Facebook is crap and the people who make Facebook are idiots, but I say this in anger after hacking spam out of my photo albums for 2 hours straight, so I will acknowledge that this is perhaps not the most constructive of explanations. Let me elucidate.

When you try and enter a location in Facebook, the site helpfully offers you a number of suggestions based on the part of the location name you have entered so far. This is not an exhaustive list, i.e. Facebook makes a selection of locations it is going to suggest. If the name of the location is not in the list, you get the option to ‘Just use’ the name you just entered.

In some of Facebook’s forms, you get the option to Add Place. This takes you to a new form in which you can enter some information about the place you just added, including its address. Facebook does not remember what you added last time, so if you have to fix hundreds of photos, you have to fill out thousands of fields (hence me just wasting two hours).

But suppose you are a spamming low-life piece of scum (watch your contaminations, Branko!) and you have somehow managed to automate part of this process, you now have found yourself a way to storm the top of the list of location suggestions. At least, that is how I assume this works. It would make little sense for Facebook to suggest obscure locations, so I assume they automatically suggest popular locations, opening them up to attacks by spammers who have the time, the energy and the tools to game this system.

Presumably, the more people like and check in at these scam locations, the more popular these false locations get.
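To make my assumption concrete, here is a toy model in Python of suggestions ranked purely by popularity. Everything in it is hypothetical: I do not know what signal Facebook really uses, so `check_ins` is a stand-in, and the place names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    check_ins: int  # stand-in for whatever popularity signal is really used


def suggest(places, typed, limit=3):
    """Return up to `limit` places whose name starts with `typed`, most popular first."""
    prefix = typed.casefold()
    matches = [p for p in places if p.name.casefold().startswith(prefix)]
    return sorted(matches, key=lambda p: p.check_ins, reverse=True)[:limit]


# Invented data: two real venues and one page inflated by automated check-ins.
places = [
    Place("Sporthal A", 120),
    Place("Sporthal B", 80),
    Place("Sporthal spam page", 5000),
    Place("Zwembad C", 300),
]
top = suggest(places, "Sporthal", limit=2)
```

In this model the inflated page takes the first slot and pushes a real venue off a short list, which matches what the screenshot shows. Whoever can fake the popularity signal controls the suggestions.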

The screenshot illustrates how I have started typing ‘Sporthal’ – Dutch for sports venue – and as you see, Facebook suggests 8 locations. Of those, 3 have been hijacked by spammers, all of which show up in the top 4 (you can tell by the fact they share the same logo).

I have no idea how these scammers manage to hijack locations so completely. They take over both the profile photo and the cover photo and manage to be the only ones to have posting rights. The cover photo seems to be something that a person can suggest for a location, but the other two items aren’t.

I know of at least one location (Sporthal Oranjeplein in The Hague) where there was a somewhat well used, somewhat maintained real location page that was then ‘merged’ with the spam location. Meaning, if you somehow managed to find a link to the original location page and clicked it, Facebook would automatically redirect you to the spam page. In those cases Facebook will helpfully tell you it has merged pages and offer you a way to report an incorrect merge.

This is also useful in cases where locations have been merged with automatically created pages – case in point, links in photo albums leading to Utrecht Disaster (a roller skating hall) now all lead to an auto-generated page about the Heysel Stadium disaster. You can report the mismerge – as useful as pressing a pedestrian crossing call button I imagine.

So what is the problem? Is there a problem? I mean, I hate spammers and all that, but in the end it is my choice to add a location to my photos, and it is my fault if I don’t properly look at the location I add.

The mismerges are problematic in this respect, because I could link to a proper location only to find out years later that the link is now redirecting to spam.

I also imagine that if locations can be hijacked by spammers, they can be hijacked by phishers and other criminals with more insidious designs.

I don’t know of a way to fix this. Facebook does not want to hire people to add and manage locations, so this is always going to be a problem. It could disable locations altogether, but having people share where they have been and what they have done together happens to be one of its most attractive qualities. Adding the ability to report spam, assuming Facebook would actually follow up on such reports, might help, but I can think of several drawbacks. For one, Facebook (and similar social media services) is known for selectively listening to its users. Why would I report something if I believe they won’t listen anyway? The other problem is that this turns the whole battle over locations into one between two powerful factions (Facebook on the one hand, spammers on the other) in which the regular user is less and less likely to be heard.

Facebook’s problem is a conceptual one. It wants locations to be somewhat community managed, but ignores the fact that the community contains many bad actors.

There is a very simple thing they could have done for my specific problem, though. As I am typing the name of the venue where I have taken my photos, progressively fewer suggestions appear. This makes sense in a world where there is only one location called Sporthal Oranjeplein (staying with my previous example), but Facebook knows of several. Would it be too confusing to show more than one?

Design pattern: event calendar (focussing on WordPress)

Event calendars tell users about interesting events that are about to happen. They can also help create an impression of how busy the near future will be. Furthermore, calendars may double as a navigation or filter tool.

Events as blog posts in WordPress

I’ve helped build a number of event calendars for websites in the past, especially for websites based on the WordPress-CMS. For small businesses and organisations who mainly need a website for informational purposes, WordPress is a powerful choice because it is cheap, easy to install, easy to maintain and well supported.

A basic WordPress-based website shows information as a series of blog post abstracts on its homepage, the most recent one at the top and posts getting progressively older as the visitor scrolls down the web page.

A simple way to draw attention to events is to display them as blog posts. WordPress started out as a blogging platform so it’s well suited for this purpose. There are a number of problems with this approach:

  • Events don’t necessarily mix well with regular blog posts or news items.
  • Regular blog posts are best sorted by publication date; events are best sorted by event date.
  • If you wrote about an event early on, it would get pushed off the screen by more recent posts.

In short, people would have to start hunting for your events or your news or both. For that reason it is best if events and blog posts are separated. This is where event calendars come in.
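The sorting problem is easy to show with a few lines of Python. This is a sketch with made-up data and field names (`published`, `event_date`); it does not use WordPress or any plugin API, it only illustrates why the two kinds of content want different orderings.

```python
from datetime import date

# Invented items: two event announcements and one regular news post.
posts = [
    {"title": "Summer fair announced", "published": date(2018, 1, 10), "event_date": date(2018, 7, 1)},
    {"title": "New opening hours", "published": date(2018, 2, 1), "event_date": None},
    {"title": "Spring clean-up day", "published": date(2018, 2, 15), "event_date": date(2018, 3, 20)},
]


def blog_view(items):
    """Regular posts only, most recently published first."""
    news = [i for i in items if i["event_date"] is None]
    return sorted(news, key=lambda i: i["published"], reverse=True)


def calendar_view(items, today):
    """Upcoming events only, soonest first, regardless of when they were announced."""
    upcoming = [i for i in items if i["event_date"] is not None and i["event_date"] >= today]
    return sorted(upcoming, key=lambda i: i["event_date"])


news_titles = [i["title"] for i in blog_view(posts)]
event_titles = [i["title"] for i in calendar_view(posts, today=date(2018, 3, 1))]
```

The summer fair was announced first, so in a single blog-style list it would sit at the bottom; in the calendar view it simply appears after the event that happens sooner.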

Luckily WordPress offers a lot of plugins for event calendars. Searching for these plugins in the WordPress plugin directory yielded the following number of hits per search phrase: events (1,001), event calendar (314), event list (841) and so on.

Grid type event calendars

If you look at the screenshots from the top results for each search, you will see that most of the event calendars are displayed as classical calendars, that is to say a matrix in which each column presents a weekday and each row a week.
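For readers who want to see what such a matrix looks like as data, here is a small sketch using Python's standard `calendar` module. The function name `month_grid` and the set of event days are my own assumptions for illustration; this is not how any particular WordPress plugin builds its grid.

```python
import calendar


def month_grid(year, month, event_days):
    """Return a list of weeks; each cell is (day_number, has_event).

    Day number 0 marks a padding cell that belongs to the previous or next month.
    """
    weeks = calendar.Calendar(firstweekday=calendar.MONDAY).monthdayscalendar(year, month)
    return [[(day, day in event_days) for day in week] for week in weeks]


# One invented event on 20 March 2018.
grid = month_grid(2018, 3, event_days={20})
for week in grid:
    print(" ".join(f"{day:2d}{'*' if has_event else ' '}" if day else "   " for day, has_event in week))
```

Each row is a week and each column a weekday, starting on Monday; a template would turn the cells into table cells and the flag into a link or a highlight.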

[screenshot of event calendar plugins for WordPress]


Ads for something you’ve already bought

Lately this happens a lot to me:
1) I search the web for a product.
2) I settle on product X.
3) The ad network remembers my choice.
4) I buy product X.
5) The next two weeks, the web inundates me with ads for product X, even though I have already been sated with said product.

In other words, I keep seeing ads on the web for products I’ve already either bought or rejected.

The mechanism behind this is called targeted advertising. Basically you visit website A which tells ad network Annoy Inc. what you’ve been looking at, then you visit website B which loads ads by Annoy Inc. based on what they know about your interests.
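As a rough sketch of that mechanism in Python: the class, the method names and the visitor id are all invented, and real ad networks are far more elaborate (cookies, auctions, profiles), but the flow of information is the same.

```python
class AdNetwork:
    """Toy ad network (Annoy Inc. in the text) that keys interests on a visitor id."""

    def __init__(self):
        self.interests = {}  # visitor id -> last product the visitor looked at

    def report_view(self, visitor_id, product):
        # Called by website A when the visitor looks at a product page.
        self.interests[visitor_id] = product

    def pick_ad(self, visitor_id):
        # Called by website B when it fills an ad slot for the same visitor.
        product = self.interests.get(visitor_id)
        return f"Buy {product}!" if product else "Generic ad"


annoy = AdNetwork()
annoy.report_view("visitor-1", "product X")   # website A tells the network
ad_on_site_b = annoy.pick_ad("visitor-1")     # website B asks for an ad
ad_for_stranger = annoy.pick_ad("visitor-2")  # nobody reported anything for this one
```

Note that nothing in this model ever tells the network that the visitor went on to buy product X. Unless the shop reports the purchase, the network has no reason to stop.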

Apparently I am a little bit behind the curve, because this sort of thing was already happening in 2012. The Slate article calls the practice creepy and focusses on the fact that the advertisements follow you around without actually serving a purpose. I’d probably use a less strong word and call it strange rather than creepy, but then I don’t need to draw in many readers in order to serve them targeted ads, like Slate does.

It seems that advertising has become smart enough to realise what you are interested in at any given point, but not smart enough to realise when that interest drops abruptly or changes in nature. The funny thing is that advertising for something that you are no longer interested in is actually worse than advertising for something you have never been interested in. It’s a bit like the one night stand from two weeks ago showing up at work five times a day to nag you about wanting to do the sex thing again – well, at least they have a chance you will say yes.

Why are companies so stupid? I think part of the problem may be that ad networks really don’t have an incentive to change things. They get paid by the view and can in fact prove that you’ve shown interest in the product that’s being advertised. If manufacturers and sellers want to stop annoying their core customer base, maybe they should get more involved in online advertising. (Or maybe the companies really aren’t that stupid and get something out of it that the consumers have yet to suss out.)


Default browser cookie settings in 2014


Yesterday I wrote that even though social networks currently combine targeted advertising and private user data collection, doing them both is not a requirement for running a profitable social network. The networks can just focus on the latter, that is, on the harvesting and selling of user data, and dispose of the advertising part altogether.

Having the social network and the ad network on the same domain (for example facebook.com) does make things slightly easier for the social network operator, because users may have switched off so-called third party cookies which are stored and read from a different domain (for example doubleclick.com).
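The distinction between first-party and third-party can be sketched in a few lines of Python. This is a deliberate simplification: I treat the last two labels of the host name as the "site", whereas real browsers consult the Public Suffix List (so this would get hosts under something like co.uk wrong). The function names and the example URLs are made up; only the two domains come from the text above.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Naive 'site' of a URL: the last two labels of its host name."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_third_party(page_url, resource_url):
    """True if the resource (and any cookie it sets) belongs to a different site than the page."""
    return site_of(page_url) != site_of(resource_url)


# The page the user is looking at, and two resources it might load.
page = "https://www.facebook.com/some-page"
same_site = is_third_party(page, "https://static.facebook.com/script.js")
other_site = is_third_party(page, "https://ad.doubleclick.com/pixel.gif")
```

Cookies set by the first resource count as first-party and survive a third-party block; cookies set by the second do not, which is why keeping everything on one domain is convenient for the operator.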

The reason the average user would block third-party cookies is that these cookies are almost exclusively abused for tracking users behind their backs.

How much of a problem is it to advertisers that users block third-party cookies? Not much. Users are typically reluctant to tinker with browser settings, therefore it depends on the web browser makers and the sensible defaults they choose whether an aspiring social network can plant cookies that another domain may read.

I decided to look into the defaults of modern web browsers, but could not find much information.

Here are some data points:

That leaves some browsers unexplored. Since checking the browsers on my computer was probably going to be easier than Googling anyway, I decided to take that route.

Table: default cookie settings for some web browsers in 2014.

Browser + version               Operating system    Default cookie setting
Google Chrome 37                Microsoft Windows   Allow (all?) cookies
Microsoft Internet Explorer 11  Microsoft Windows   Allow some third-party cookies
Mozilla Firefox 32              Microsoft Windows   Allow third-party cookies
Apple Safari                    Apple iOS 7         Allow local cookies?
Android browser                 Google Android 4.0  Allow (all?) cookies?

As you can see, the answers are ambiguous at times and don’t square with the results I linked to, but it would appear that currently most web browsers will let sites track you across domains using third-party cookies.

A note about methodology. This was a quick study to find out what the default cookie settings are. For that, I needed to restore browser defaults and that was not always possible. The mobile devices (iOS and Android) had no way to restore settings to a default so I had to assume that these were the default settings.

I do tinker with my desktop browsers but I rarely do so with my mobile devices, so it’s a reasonable guess that the aforementioned settings are the default ones; I just cannot be absolutely sure.

Another problem was that browser manufacturers use different settings, different terminology and sometimes translations which can make it hard to find out which is which.

Most browsers speak of ‘allowing’ cookies; iOS Safari speaks of blocking them.

The reason I report Chrome’s default as “allow (all?) cookies” rather than “allow all cookies” is because I don’t know if “indirecte cookies” is their Dutch translation of “third-party cookies”. If it is, you can remove the question mark and conclude that Chrome allows all cookies by default.

Internet Explorer has a return-to-default button just for privacy settings, which is much appreciated, and a number of sensible settings collections. Unfortunately the explanation of what these settings mean is rather opaque. For instance I don’t know what “cookies that can be used to contact you” are.

Firefox’ default is also a ‘sensible’ setting which tells you only in the most general terms what it does, namely that the browser “will remember your browsing, download, form and search history, and keep cookies from websites you visit”.

You can choose to use custom settings and if the defaults for these settings can be assumed to be the same as the ‘sensible’ settings, then their third-party policy is clear if perhaps not sensible: “Accept third-party cookies? Always.”

Safari lets you choose to block cookies: “Always”, “From third parties and advertisers” and “Never”. I assume “and advertisers” is not a separate category from “third parties” and was just inserted to make it clear that these are tracking cookies, but again, that’s just an assumption.

The Android Browser’s setting is the least complicated of all: you can choose Cookies or No cookies, and if you choose the latter I assume most of the useful services on the web become off limits to you. But are there really people who bank online using their smart phone and an operating system made by Google?

If browsers all blocked third-party cookies, you still wouldn’t be safe though. For one thing, what we generally understand as cookies, small bits of data that are written and read through HTTP headers or JavaScript’s document.cookie property, make up only a small part of all the different types of tracking technologies there are.

Dealing with the Dutch cookie law as a web developer

This note about how to comply with the Dutch cookie law is mostly a memo to self, but I believe the information past the fold is also useful to anyone who runs their own website and needs to ensure the privacy of their site’s visitors.


Notes from the Responsive Design trenches

Lately a lot of companies have been asking for websites built along the principles of ‘Responsive Design’. I had to give up on building a responsive website in early 2012 due to lack of time, but in January 2013 I got another chance. (Side-note: both websites are on intranets, so I cannot show them to you.)

Responsive Design is designing a website in such a way that it rearranges itself to look good both on large screens (typically desktop-PCs) and small screens (typically mobile phones).

The text below is first and foremost a memo to self, but it can also be used as an addition to the ultimate Responsive Design primer, the A List Apart article by Ethan Marcotte that started it all. I will explain Responsive Design in a bit more detail below, but if you really want to know what it is about I suggest you read the A List Apart piece.

Although Responsive Design is pretty straightforward to anybody who has done even the most trivial things with Cascading Style Sheets, it is typically used in a wider context that can make things complicated. Hence the need for this intermediate level article.


The LinkedIn endorsement system

LinkedIn has introduced an endorsement system which lets you ‘endorse’ the skills of your connections.

A few quick notes about this:

  • I haven’t checked whether these are skills you entered yourself; that seems to be the case though.
  • I have endorsed wide, easy skills, such as mastering your native tongue.
  • I have endorsed specialized skills that I have witnessed myself, or that are somehow at the core of that connection’s abilities.
  • I have not endorsed skills that sound like a core skill, but that to my knowledge aren’t; for example, if I know a project manager, I am not going to endorse them as change manager, even if I have seen them manage changes after the delivery of a project. Similarly, I have not endorsed interaction designers as user experience experts.
  • In other words, don’t be shy to add the simple stuff to your profile.
  • Also add skills that you know your connections know you possess.
  • So far I have been honest and have only endorsed skills that I knew people possessed.
    I expect some people will just endorse all the skills of their friends or connections.
  • With this system Recommendations are probably going to be more rather than less important, considering my previous note.