Google displays incorrect dates from news sites

I first noticed that Google displays dates in its search results after reading a blog post (in Swedish) by Simon Sundén. He also described how Google sometimes misinterprets the date on which an article or blog post was published. For example, this article was published on Newsmill in February 2009, but Google thinks it was published in December 1999 (see the screen shot at Sundén’s blog) because it has the date 18 Dec, 1999 in the headline.

But there may be more to this story. Today I found that Google was displaying search results with the date 27 May, 2010 on articles that were in some cases several years old. Here are a few examples from Swedish dailies online.

– Dagens Nyheter, 29 Oct, 2003 – “Aftonbladet driver populismens journalistik”

– Aftonbladet, 8 Feb, 2007 – “Här är Bloggsverige!”

– Aftonbladet, 7 Oct, 2008 – “Välstajlad profilbild avslöjar dig”

– Aftonbladet, 3 Dec, 2008 – “Moderaterna ense efter krismöte”

– Ålandstidningen, 9 Dec, 2009 – “Zandra lämnar Xit – blir nöjesreporter på Aftonbladet”

– Expressen, 10 Dec, 2009 – “Moderaterna backar i ny mätning”

But Google thinks all these articles were published yesterday, 27 May, 2010. A few screen shots below:

[Screen shot: ab-wendela-hans]

[Screen shot: dn-ab]

[Screen shot: alandstidningen]

The immediate effect of this is that pages that aren’t very relevant to you may end up being ranked extremely high in Google’s search results. The article in Aftonbladet about my blog survey “Bloggsverige” is ranked #4 in Google on a search for Bloggsverige, even though I know it has not previously shown up among the top results.

It is also quite possible, as Simon Sundén concludes, to game the system by fooling Google into thinking your blog post or article was published more recently than it actually was.

I still haven’t quite sorted out exactly why Google misinterprets the dates of the articles listed above, but one thing is clear: all these articles have a more recent date somewhere in their code, and probably all of them have 27 May or 28 May, 2010 in there. Once I or someone else figures this out, I will update this post. I would also like to know whether this flaw is something that mostly benefits major news sites like the ones listed above.

Update: James Royal-Lawson and I discussed this matter briefly on Twitter this evening, and James posted his thoughts a few minutes ago. His conclusion is that Google takes the first date it finds, or at least the first date it considers reliable, and uses it to determine when the article was published. Since many online dailies show a number of different dates in different parts of each page, Google misinterprets the publication date. If I look at, for example, the article in Dagens Nyheter above, from way back in 2003, that is exactly the case: the date 28 May, 2010 comes a few hundred lines of code before the actual publication date.
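To make James’s theory a bit more concrete, here is a minimal Python sketch of what a naive “take the first date you find” rule would do with a page like the Dagens Nyheter one. This is only an illustration of the hypothesis, not how Google actually works: the function name `first_date_in_page`, the date pattern and the sample markup are all made up for the example.

```python
import re
from datetime import date

# Month abbreviations as they appear in dates like "28 May, 2010".
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

# Hypothetical pattern: day, month name (abbreviated or full), optional comma, year.
DATE_RE = re.compile(
    r"\b(\d{1,2}) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*,? (\d{4})\b"
)


def first_date_in_page(html):
    """Return the first date that appears in the markup, or None if there is none."""
    match = DATE_RE.search(html)
    if match is None:
        return None
    day, month, year = match.groups()
    return date(int(year), MONTHS[month], int(day))


# Made-up markup: today's date sits in the page header,
# far above the article's real publication date.
page = """
<div id="header">28 May, 2010</div>
<ul id="teasers"><li>...</li></ul>
<div class="article">
  <span class="published">29 Oct, 2003</span>
  <h1>Aftonbladet driver populismens journalistik</h1>
</div>
"""

print(first_date_in_page(page))  # the header date, not the publication date
```

With a rule like this, any date that sits higher up in the code than the real publication date wins, whether it is today’s date in the page header or a date that happens to be in the headline, as in the Newsmill example.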

Update 2: Some more info here from Michael Gray.