I was recently asked by the State Library of Victoria (SLV) if I would consent to have Fitzroyalty archived by the Australian internet archive Pandora, which was established by the National Library of Australia (NLA). Other sites I read regularly, such as artist Hazel Dooney’s blog, have also recently been judged worthy of being archived. To be archived, a site must demonstrate social, cultural or political significance and have long-term research value. A long-term strategy to archive ephemeral electronic publications for the future will be of great value to the Australian community over time.

It’s a compliment to be invited, but before I agreed I wanted to investigate the implications of having Fitzroyalty archived in Pandora. While the archiving function seems self-explanatory, Pandora also makes the archive publicly available online, effectively republishing the original site while the original site is still live. I was unsure about this, and my investigation suggests that it could be problematic in many ways.

A widely available book borrowed from a library is an alternative to buying the book. Many people belong to libraries so they don’t have to spend lots of money buying books. In contrast, a rare, old or out of print book in a special collection with limited public access is no competition to anything because there are no other sources or copies available. This is the traditional library model for print publications, but I don’t think either of these scenarios can or should be applied to online publications. The internet is different.

What happens when an inferior static copy of a live dynamic site is republished in competition to the original? I don’t think anyone knows. What follows is a series of questions I sent to the SLV and NLA about how Pandora operates and their responses. I was distinctly unimpressed by their lack of understanding about the technical implications of their archives. It is evident that their technical and management policies are seriously outdated and inadequate.

SEO and duplicate content

I have worked hard on search engine optimisation (SEO) techniques to improve the search rank of my posts. I am also concerned about the detrimental effect that duplicate content has on search rank and have worked to minimise this. How does Pandora deal with the duplicate content issue?

Pandora response – no answer.

Search engine indexing

I note that the Pandora FAQ states that “Commercial search engines such as Yahoo! and Google also index the Archive to the title level.” Does this mean Google is not indexing individual posts in Pandora archives?

Pandora response – no answer.

External links

I am unsure if I can accept how Pandora alters hotlinks to external URLs in posts. When an external link in an archived page is clicked, Pandora displays an annoying page saying that the site the link points to has not been indexed. The phrase “You may go to the original site while it remains at this location!” is not very helpful. The page should perhaps state whether the external site is live or not, and provide a hotlink if it is.

Pandora response – no answer.
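
To make the suggestion about external links concrete, here is a minimal sketch in Python of the kind of notice I have in mind. It is an illustration only: I do not know how Pandora generates its page, and the function name `external_link_notice` and the `is_live` check (which would need to test whether the external site still responds) are my own hypothetical names.

```python
from html import escape
from typing import Callable


def external_link_notice(url: str, is_live: Callable[[str], bool]) -> str:
    """Build the notice shown when a reader follows an external link
    from an archived page (hypothetical helper, not Pandora's code)."""
    safe_url = escape(url)  # avoid breaking the HTML if the URL contains & or quotes
    if is_live(url):
        # The external site still exists, so say so and link straight to it.
        return (
            "This site has not been archived, but it is still live: "
            f'<a href="{safe_url}">{safe_url}</a>'
        )
    # The external site is gone, so tell the reader instead of sending them to a dead end.
    return f"This site has not been archived and is no longer available at {safe_url}."
```

How `is_live` is decided is deliberately left open here. The point is only that the notice tells the reader which case applies.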

Internal links

I am fundamentally dissatisfied with the way Pandora rewrites internal links within a site it archives. If in post A I create a link to post B, I do so to provide relevant information to my readers and to generate positive SEO associations.

Most of my readers consume my content via RSS and Atom feeds in external platforms such as feed readers, not by visiting Fitzroyalty directly. The internal content links function to drive readers from my feeds to my site.

I could only accept Fitzroyalty being indexed by Pandora if internal links automatically went directly to the original site (without an annoying page like the external link one) while Pandora knows the original is live.

Pandora should only alter internal links to link to the archived version of a post instead of the original post if the original is no longer live.

Pandora response – no answer.
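
The rule I am proposing for internal links can be sketched in a few lines of Python. This is an illustration under my own assumptions, not a description of how Pandora's link rewriting actually works: the function name `choose_link_target`, the `is_live` check and the archive URL prefix are all hypothetical.

```python
from typing import Callable
from urllib.parse import urlsplit


def choose_link_target(
    href: str,
    site_host: str,
    archive_prefix: str,
    is_live: Callable[[str], bool],
) -> str:
    """Decide where a link in an archived page should point (hypothetical rule)."""
    host = urlsplit(href).netloc.lower()
    if host != site_host.lower():
        # Not an internal link, so this rule does not apply; leave it alone.
        return href
    if is_live(href):
        # The original post still exists, so send the reader to the live site.
        return href
    # The original is gone, so fall back to the archived copy.
    return archive_prefix + href
```

For example, with an assumed archive prefix of `http://archive.example/`, a link from post A to post B would keep pointing at the original post B for as long as `is_live` reports that it exists, and would only switch to the archived copy after that.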


I sometimes update a post if relevant new information becomes available. Once Pandora has indexed a post, does it reindex it if the post is later altered?

Pandora response – no answer.

Dynamic content

Does Pandora index from a feed or by a custom scrape? I ask this because Fitzroyalty contains a number of dynamic features designed to aid the discovery of relevant content.

The content in the right column is dynamically random – each time a post or page is loaded the selection presented to the reader is different. If this was made static in the archive it would not accurately represent the complete body of content.

At the bottom of individual posts I have a function that provides links to 5 other posts that are dynamically selected according to the content of the post. This makes suggestions from the entire body of content, including posts newer than the one being read. This means that the selections provided will change as the collective body of content grows. Again, if this dynamic feature were made static it would distort or limit access to the complete body of content.

Pandora response – no answer.


I am extremely interested in the concept of place, specifically current ideas about hyperlocal content and the geoweb. Pandora seems to have no concept of place. There are no community or location categories within which to place Fitzroyalty. Where in the current taxonomy would you place it?

Pandora response – no answer.

My reservations

These factors suggest to me that there are potentially significant threats to the findability of my site and my ability to use current marketing and SEO techniques to increase my audience if Fitzroyalty is indexed by Pandora.

Unlike other instances where the aggregation of my feeds in other sites provides numerous links back to my original site, Pandora functions to remove these links (thus removing their SEO benefits). Being indexed by Pandora could consequently reduce the size of my audience and reduce my ability to communicate directly with my audience.

I can see little benefit to consenting to this extreme form of aggregation that acts to isolate the archived copy from the live original source. The tradeoff for most online publishers in having their work aggregated elsewhere is that it increases their audience at the original site. I have a problem with this goal being made more difficult.

Pandora’s static archiving of pages would negate the dynamic nature of Fitzroyalty and provide an inferior body of content and reading experience for my audience. As a publisher with high standards I would not want to allow a lower quality copy of my work to be made available to the public.

Finally, I need to be able to measure the audience consuming my feeds and visiting my site. I cannot measure my audience on Pandora.

Pandora response – no answer.


I understand the importance of archiving publications and the complexities of archiving electronic publications. When I completed my PhD in 1998 I was the first student to insist the UWA Library catalogue and index the electronic version of my thesis that I published online on my own site (UWA at that time was not a member of the digital theses project and it could therefore not accept my thesis for indexing).

I appreciate that I have raised some complex technical and conceptual ideas related to how Pandora functions, and that it may not be easy for you to respond to all of them. I cannot accept the current terms you offer to have Fitzroyalty indexed in Pandora, but I am willing to work with you to come to more acceptable arrangements. In particular, I would be keen to have Fitzroyalty indexed for preservation purposes. It is the way in which you currently offer the archived copy of a site to the public that I find unacceptable.

Pandora response – a short acknowledgement of my refusal to give permission to have Fitzroyalty indexed in Pandora.

Why I have written this post

I have published this edited version of my correspondence with librarians at the SLV and NLA because I want our libraries and cultural institutions to serve the Australian community as efficiently as possible. I have identified significant shortcomings with their technical and management strategies for archiving live electronic publications, and I have done my best to advise them how they can modernise and improve Pandora.

Pandora fails to provide sufficient or appropriate incentives for electronic publishers to consent to having their publications archived. Pandora also fails to acknowledge or accept that there may be significant disincentives to having a site archived in Pandora for publishers and for audiences.

There’s a fundamental difference between publishing content via live dynamic RSS feeds, which contain internal links back to the original source of the content, and allowing an inferior static archive copy to be republished. The former is effective syndication. The latter could undermine your ability to communicate with your audience.

I also want to warn bloggers and online publishers about some of the potential consequences of consenting to having their sites indexed in Pandora. While the philosophy is admirable, the reality is an unacceptably poor way to represent dynamic live internet content.

Publishers have legitimate concerns about the accuracy and quality of the republished archive that Pandora refuses to address. Until Pandora addresses these important issues, I recommend that all online content publishers refuse to consent to have their sites indexed by Pandora.

The Pandora Australian internet archive – what are the costs and benefits of having your site archived?

2 thoughts on “The Pandora Australian internet archive – what are the costs and benefits of having your site archived?”

  • 18 January 2009 at 1:19 am


    Really good points.

    It is surprising though that an organisation like Pandora would not adhere to the basic rules of fair use on the internet, i.e. don’t change the content and if you use it make sure it links back to the original.


    • 18 January 2009 at 11:02 am

      Yes I was too, which is why I wanted to research and understand it. I think they simply don’t see the implications of publishing the archive. If they archived a site and then made it publicly available only once the source ceased to exist there would be no problem.

