Digital libraries and World Wide Web sites and page persistence.

Web pages and Web sites, some argue, can either be collected as elements of digital or hybrid libraries, or, as others would have it, the WWW is itself a library. We begin with the assumption that Web pages and Web sites can be collected and categorized. The paper explores the proposition that the W...

Bibliographic Details
Main Author: Wallace Koehler
Format: Article
Language: English
Published: University of Borås 1999-01-01
Series: Information Research: An International Electronic Journal
Subjects: Digital libraries; World Wide Web; page persistence; hybrid libraries
Online Access: http://informationr.net/ir/4-4/paper60.html
_version_ 1818590897265180672
author Wallace Koehler
author_facet Wallace Koehler
author_sort Wallace Koehler
collection DOAJ
description Web pages and Web sites, some argue, can either be collected as elements of digital or hybrid libraries, or, as others would have it, the WWW is itself a library. We begin with the assumption that Web pages and Web sites can be collected and categorized. The paper explores the proposition that the WWW constitutes a library. We conclude that the Web is not a digital library. However, its component parts can be aggregated and included as parts of digital library collections. These, in turn, can be incorporated into "hybrid libraries." These are libraries with both traditional and digital collections. Material on the Web can be organized and managed. Native documents can be collected in situ, disseminated, distributed, catalogued, indexed, and controlled in traditional library fashion. The Web therefore is not a library, but material for library collections is selected from the Web. That said, the Web and its component parts are dynamic. Web documents undergo two kinds of change. The first type, the type addressed in this paper, is "persistence," or the existence or disappearance of Web pages and sites, or, in a word, the lifecycle of Web documents. "Intermittence" is a variant of persistence, and is defined as the disappearance and later reappearance of Web documents. At any given time, about five percent of Web pages are intermittent, which is to say they are gone but will return. Over time a Web collection erodes. Based on a 120-week longitudinal study of a sample of Web documents, it appears that the half-life of a Web page is somewhat less than two years and the half-life of a Web site is somewhat more than two years. That is to say, an unweeded Web document collection created two years ago would contain the same number of URLs, but only half of those URLs would still point to content. The second type of change Web documents experience is change in Web page or Web site content.
Again based on the Web document samples, very nearly all Web pages and sites undergo some form of content change within the period of a year. Some change content very rapidly while others do so infrequently (Koehler, 1999a). This paper examines how Web documents can be efficiently and effectively incorporated into library collections. This paper focuses on Web document lifecycles: persistence, attrition, and intermittence. While the frequency of content change has been reported (Koehler, 1999a), the degree to which those changes affect meaning, and therefore the integrity of bibliographic representation, is not yet fully understood. The dynamics of change set Web libraries apart from the traditional library as well as many digital libraries. This paper seeks then to further our understanding of the Web page and Web site lifecycle. These patterns challenge the integrity and the usefulness of libraries with Web content. However, if these dynamics are understood, they can be controlled for or managed.
first_indexed 2024-12-16T10:03:50Z
format Article
id doaj.art-cccda1a602d247428997b113f40a38d5
institution Directory Open Access Journal
issn 1368-1613
language English
last_indexed 2024-12-16T10:03:50Z
publishDate 1999-01-01
publisher University of Borås
record_format Article
series Information Research: An International Electronic Journal
title Digital libraries and World Wide Web sites and page persistence.
title_full Digital libraries and World Wide Web sites and page persistence.
title_fullStr Digital libraries and World Wide Web sites and page persistence.
title_full_unstemmed Digital libraries and World Wide Web sites and page persistence.
title_short Digital libraries and World Wide Web sites and page persistence.
title_sort digital libraries and world wide web sites and page persistence
topic Digital libraries
World Wide Web
page persistence
hybrid libraries
url http://informationr.net/ir/4-4/paper60.html
work_keys_str_mv AT wallacekoehler digitallibrariesandworldwidewebsitesandpagepersistence
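
The half-life figures quoted in the description can be turned into a rough estimate of how many URLs in an unweeded collection still resolve. The sketch below is illustrative only. It assumes attrition follows simple exponential decay, which the record does not state, and it uses a round two-year half-life as a stand-in for the "somewhat less than two years" reported for Web pages. The function names and the default value are hypothetical and are not taken from the paper.

```python
def surviving_fraction(age_years: float, half_life_years: float = 2.0) -> float:
    """Fraction of URLs expected to still point to content.

    Assumes exponential decay (an assumption of this sketch, not a
    claim made in the record). The 2.0-year default is a round
    placeholder for the half-life quoted in the description.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    return 0.5 ** (age_years / half_life_years)


def expected_live_urls(total_urls: int, age_years: float,
                       half_life_years: float = 2.0) -> float:
    """Expected number of URLs in an unweeded collection that still resolve."""
    return total_urls * surviving_fraction(age_years, half_life_years)


if __name__ == "__main__":
    # A collection of 1,000 URLs, checked at several ages.
    for age in (0.0, 1.0, 2.0, 4.0):
        live = expected_live_urls(1000, age)
        print(f"age {age:.0f} y: about {live:.0f} of 1000 URLs still resolve")
```

Under these assumptions a two-year-old collection keeps all of its URLs but only about half of them still resolve, which matches the description's own example. Intermittent pages, which the description puts at about five percent at any given time, are not modelled here.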