Tuesday, May 24, 2011

Google Shuts Down Ambitious Newspaper Scanning Project

Jill Hurst-Wahl reported that Google is shutting down one of its digitization efforts.  In a statement to Search Engine Land, a Google spokesperson said:
Users can continue to search digitized newspapers at http://news.google.com/archivesearch, but we don’t plan to introduce any further features or functionality to the Google News Archives and we are no longer accepting new microfilm or digital files for processing.
Google's efforts were in partnership with several North American newspapers, ProQuest and Heritage Microfilm, according to a 2008 news report.

In reporting on Google's decision, the Boston Phoenix wrote:
News Archive was generally a good deal for newspapers -- especially smaller ones like ours, who couldn't afford the tens or hundreds of thousands of dollars it would have cost to digitally scan and index our archives -- and a decent bet for Google. It threaded a loophole for newspapers, who, in putting pre-internet archives online, generally would have had to sort out tricky rights issues with freelancers -- but were thought to have escaped those obligations due to the method with which Google posted the archives. (Instead of posting the articles as pure text, Google posted searchable image files of the actual newspaper pages.) Google reportedly used its Maps technology to decipher the scrawl of ancient newsprint and microfilm; but newspapers are infamously more difficult to index than books, thanks to layout complexities such as columns and jumps, which require humans or intense algorithmic juju to decode. Here's two wild guesses: the process may have turned out to be harder than Google anticipated. Or it may have turned out that the resulting pages drew far fewer eyeballs than anyone expected.
The lesson is that jumping on the Google bandwagon can be a good thing, if the wagon keeps on moving. It is a lesson that those involved in Microsoft's book digitization program also learned the hard way.

Monday, May 9, 2011

Knowledge Management VS Information Management

Dr. Sue wrote a new post, "Is knowledge management really information management?: a question of crucial definition", on her blog digitalcollaboration, which I thought worth sharing. Dr. Sue wrote:

No, I am not going to repeat the argument so well put forward many years ago by Tom Wilson (The nonsense of knowledge management, 2002, http://informationr.net/ir/8-1/paper144.html), with which I largely agree.  While Professor Wilson argues his case well, he largely comes to the conclusion that the term ‘knowledge management’ was formulated in order to cover a number of organisational managerial and communication issues, without much of a nod to – or even recognition of – the existing field of Library and Information Science, or Information Studies, or whatever you want to call it.  This poverty of nomenclature – the continuing disregard that we information professionals seem to have to clarity of expression – is at the heart, I believe, of many of the perennial issues and problems that fracture our field to no real purpose.
Wilson has, from time to time, referred back to ‘knowledge management’, reinforcing his point that, as a practice or field of study, it doesn’t really exist as a separate entity, as it is identical in process and conception to information management.  What would help his argument enormously, I believe, is if he were able to use definitions for these terms (‘information’ and ‘knowledge’) that had achieved consensus in the field.  Then, we would not have to explain to all of those involved in this field, many of whom are drawn from management, information systems, business studies, technology and so forth – exactly what it is that needs to be done in order to manage ‘knowledge’.  We could perhaps even encourage these folk to take a look at the masses of research already completed in our field concerning precisely the issues with which knowledge managers now engage: assisting in the communication of ideas from one human to another.  As I have written elsewhere (e.g. 2005 and 2007), I understand information professionals to be ‘information interventionists’: we intervene in the knowledge creation cycle.
The central issue, though, is that we importantly have not yet come to a widely accepted definition of ‘information’ or ‘knowledge’.  By this I mean, rather more precisely, that we do not have an operational definition that works for our field and for the work we do.  James Gleick, author of Chaos, inter alia, has now published a book on information: ‘Information: a history, a theory, a flood’ (Fourth Estate, 2011) and one must admire him for his courage and ability to do so.  Having said that, he does not move us forward to understand better what ‘information’ is.  Neither does philosopher Luciano Floridi, who has written extensively on this topic and on the philosophy of information.  However much the data-information-knowledge model (often represented in pyramid form) is criticised or maligned, this still remains the starting point, or mental model, for both authors.  In Gleick’s case, the concept is further confused with information objects or entities, technology, networks and the new physics.  I find the understanding of information in the new physics fascinating: Information: the new language of science is probably my favourite book on this subject.  But this does not conceptualise the notion of ‘information’ in a way that is meaningful for those of us who wish to assist people to create their own knowledge by finding out what others have thought, created, felt, experienced and so on.
This is why I wrote a PhD thesis on the topic of defining information. What I found in my research, amongst many other interesting things, is the political nature of the definition and interpretation of information, and I believe it would be appropriate for us to pay more attention to such dimensions of the core of our discipline/profession.

Sunday, May 8, 2011

A Checklist to Maximise the Effectiveness of Online Resources

The following is a succinct checklist of items raised in the series of workshops commissioned by the Strategic Content Alliance (SCA) during 2010 under the title Maximising Online Resource Effectiveness (MORE). The purpose was to promote the most effective use of the internet by SCA member organisations, with an emphasis on promoting and communicating content available as an online resource. The workshops were delivered to around 300 participants from all over the UK, drawn from a variety of further education (FE), higher education (HE) and public sector organisations.

  1. Recognise advantages in having well prepared scalable content that can be utilised in “more is better” scenarios.
  2. Understand the potential of audience engagement using the web.
  3. Consider the longer term benefits of having computable content.
  4. Search engine providers want their users to find exactly what they are looking for, so describe your content accurately using titles, descriptions and keywords.
  5. Use text for all important content.
  6. Monitor and measure how your site is being used and define success.
  7. Search engine optimisation cannot be ignored, but it is not everything.
  8. Know your audience.
  9. New standards enhance the value of content by enabling informative structure.
  10. Many benefits of new standards can be realised with older browsers by referencing ready-made, non-intrusive JavaScript.
  11. RSS can extend the reach of suitable web site content.
  12. Keep things simple on a web page to prevent creating barriers to accessibility.
  13. Employ simple web-based services to check the integrity of content and associated keywords.
  14. Embedded metadata can open up new possibilities for the use of content.
  15. Online social media can play a prominent role in attracting and engaging an audience.
  16. The social web is not new; it’s what the web was always intended to become.
  17. Audiences are already in the social web—it’s the best place to engage with them.
  18. Decide on a purpose for adopting the social web and use the best service for that.
  19. Be aware of valid organisational concerns over the use of the social web.
  20. A mix of expertise is required to maximise effectiveness.
  21. This expertise should be associated with different roles and responsibilities.
  22. The coordination of these roles should be an essential part of the web strategy of an organisation.
  23. A policy can allay concerns over the use of the social web by an organisation.
  24. There is no magic formula for an organisation-wide social web policy; it depends on many organisation-specific factors.
  25. The process of compiling a policy should involve a probing review that brings focus to benefits and workable processes.
  26. RDF is a very basic scheme for describing things using unambiguous terms in brief statements known as triples.
  27. RDFa is a way of including RDF in an ordinary web page to embed metadata.
  28. RDF metadata in a web site can be used by software applications to detect semantics.
  29. Web sites containing RDF can be linked when there is overlap in the triples.
  30. There is a growing number of online resources using RDF and real semantics to create a more effective web of resources.
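
Items 26 to 29 can be illustrated with a minimal sketch in Python. It is not taken from the MORE workshops; it only models triples as plain (subject, predicate, object) tuples and treats two sites as linkable when the same resource URI appears in both sets. The example.org addresses, the two sample sites and the helper names (is_uri, resources, shared_resources) are all invented for illustration. The predicate URIs are from the Dublin Core and FOAF vocabularies.

```python
from typing import Set, Tuple

# A triple is a brief (subject, predicate, object) statement.
Triple = Tuple[str, str, str]

# Hypothetical metadata published by two sites; the example.org URIs are invented.
site_a: Set[Triple] = {
    ("http://example.org/a/report", "http://purl.org/dc/terms/title", "Workshop report"),
    ("http://example.org/a/report", "http://purl.org/dc/terms/creator", "http://example.org/people/jane"),
}
site_b: Set[Triple] = {
    ("http://example.org/people/jane", "http://xmlns.com/foaf/0.1/name", "Jane Example"),
    ("http://example.org/b/event", "http://purl.org/dc/terms/title", "Launch event"),
}


def is_uri(value: str) -> bool:
    """Crude check that a value is a web address rather than literal text."""
    return value.startswith(("http://", "https://"))


def resources(triples: Set[Triple]) -> Set[str]:
    """Collect every URI used as a subject or as an object."""
    found = set()
    for subject, _predicate, obj in triples:
        found.add(subject)
        if is_uri(obj):
            found.add(obj)
    return found


def shared_resources(a: Set[Triple], b: Set[Triple]) -> Set[str]:
    """Return the resources that both sets of triples describe or point to."""
    return resources(a) & resources(b)


print(shared_resources(site_a, site_b))
# prints {'http://example.org/people/jane'}
```

Here the report on the first site names its creator with the same URI that the second site uses to describe a person, so the two sets of statements can be joined at that point. A real site would normally embed such statements as RDFa and rely on an RDF library rather than hand-built tuples.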