Archives on the Web
First Monday

Archives on the Web: Unlocking Collections While Safeguarding Privacy by Sara S. Hodson

Privacy has become an increasingly worrisome issue in modern life. Technological innovations, especially the Internet, expand and enhance the manner and extent by which private or sensitive information is collected and shared, leaving all of us more vulnerable to the invasion of our privacy. As archival information, both original materials themselves and the descriptions of these materials, increasingly appears on the Internet, archivists and information specialists must ensure that they do not inadvertently or thoughtlessly make available private, sensitive or proprietary information.



Justice Potter Stewart famously remarked about pornography that he didn’t know how to define it, but he knew it when he saw it. Similarly, we may not know precisely what constitutes privacy, but we think we know about our own privacy, and we definitely feel we know if it has been violated. In fact, though, there are formal definitions of privacy, formulated by lawyers and legal experts.

Contrary to what we might think, the word “privacy” does not occur in the U.S. Constitution. The first assertion of the right to privacy came from Samuel D. Warren and Louis D. Brandeis, who wrote an article called “The Right to Privacy” in the December 1890 issue of the Harvard Law Review. Warren and Brandeis defined privacy simply as “the right to be let alone.” [1]

Since that landmark statement, others have expanded and elaborated upon it. Political scientist Alan Westin complements Warren and Brandeis when he states that privacy is “the voluntary and temporary withdrawal of a person from the general society through physical or psychological means.” Westin goes on to add another sense to his definition, one that especially resonates for us today as we examine issues related to Web access. He notes that privacy is “the claim of individuals, groups or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.” [2]

Legal scholar William Prosser has gone further, identifying four ways in which the invasion of privacy can occur: intrusion upon the individual’s seclusion or solitude, or into his or her private affairs; public disclosure of embarrassing or private facts about the individual; publicity that places the individual in a false light in the public eye; and appropriation, for another person’s advantage, of the individual’s name or likeness [3].

Any of these invasions of privacy, but especially the first two — intrusion upon the individual’s seclusion or solitude, or into his or her private affairs, and public disclosure of embarrassing or private facts — can occur when a manuscript repository or research library acquires and makes available an individual’s collection of personal papers. It is in these senses of privacy that we will examine the issues surrounding its effect upon archives on the Web.

Few would doubt that privacy and the perceived threats to its safekeeping have become huge, persistent sources of concern in our society. Even anecdotal evidence indicates that this issue has grown to ever greater proportion over the past two to three decades, and that it continues to increase. Stories about various aspects of privacy appear in newspapers nearly every day now, and national surveys regularly assess the population’s degree of awareness and anxiety about privacy.

This year started in full privacy–alert mode, with a 1 January editorial by John Schwartz in the New York Times, called “What Are You Lookin’ At?” Listing several recent breaches in the security of electronic financial and other personal consumer data at corporations that keep such files (e.g., ChoicePoint, MasterCard, Bank of America), Schwartz notes that all such breaches are invasions of privacy that have exposed millions of Americans to potential abuse by identity thieves and other miscreants.

He also describes the collective shrug of the shoulders with which people received news of these privacy breaches, as many people feel helpless to do anything about them. However, Schwartz goes on to note the public’s mixed reaction to the news of the National Security Agency conducting surveillance without warrants. In a New York Times poll conducted on commercial and government privacy issues after the revelation of the NSA surveillance, an impressive 88 percent of those polled expressed concern, while 54 percent said they were “very concerned.” [4] I think we may assume from this that the American public has by no means given up on the notion that their privacy does matter and that it should not be violated.

Technological advances gallop ahead, and each step opens new vistas of information, technological capability, and both threatened and actual invasions of privacy. In this digital age, organizations and governments have the ability to collect and retain vast quantities of data, sometimes for no other reason than the fact that they have the capability to do so. The most prosaic of everyday activities and transactions can be monitored — bank deposits, supermarket purchases (via club cards), video and DVD rentals, and telephone calls.

This last activity, of course, is especially vulnerable, given the Bush administration’s determination to monitor American citizens’ telephones when and where it chooses, without legal warrants to do so. Not only are organizations and government bodies watching all of us to an ever greater degree, they are also able to swap data as file–sharing across fallen barriers grows, fueled by technology.

The advent of the Internet has significantly ratcheted up the potential for invasion of privacy, even as it has become an enormously useful tool. Its increasing ubiquity, along with the ease of posting, altering and disseminating data on line, makes it a powerful force in the spread of information, whether public or private. We are all familiar with the kinds of individual data that can be found on the Web about any or all of us.

Public records on line have become much easier to obtain than in the old days, when obtaining such information required a trip to City Hall or the County Hall of Records or a formal query to the appropriate level of government. Even the sacred Social Security Number, which officials realized, or were persuaded, rather late in the automation game must be safeguarded as private, can be obtained on the Internet by paying a small access fee.

We are all also familiar with the fact that, no matter how anonymous we might feel, we have only to Google our own names, and we can receive graphic, possibly surprising, evidence of how little anonymity exists any more and how much of a presence each of us has on the Web, whether we want it or not.

In this digital environment, what are the privacy issues surrounding the posting of both archival materials and the descriptions of those materials on the Internet? To a great extent, the same issues apply that have applied for many years to archival records and collections of personal papers.

The archival profession has sought to provide guidance for archivists faced with potentially private or confidential materials. The Society of American Archivists, our national professional association, developed and maintains a Code of Ethics to guide in the practice of our profession. The section of prime interest for us in this paper concerns privacy. In the 2005 revision, it reads: “Archivists protect the privacy rights of donors and individuals or groups who are the subject of records. They respect all users’ right to privacy by maintaining the confidentiality of their research and protecting any personal information collected about them in accordance with the institution’s security procedures.” Good advice, if somewhat nebulous.

Against it, however, the section on access says: “Archivists strive to promote open and equitable access to their services and the records in their care without discrimination or preferential treatment, and in accordance with legal requirements, cultural sensitivities, and institutional policies. Archivists recognize their responsibility to promote the use of records as a fundamental purpose of the keeping of archives. Archivists may place restrictions on access for the protection of privacy or confidentiality of information in the records.” (Society of American Archivists, 2005)

Clearly, these competing statements leave plenty of room for archivists to exercise their own judgment, and, over the past 25 years, my colleagues and I have tussled with the ethics of access and privacy in countless e–mail messages, informal discussions, conference sessions, publications, and seminars.

The archival privacy issues continue in the online age and, in fact, they can be even more challenging than ever, due to the factors we have already discussed — the ubiquity of the Internet, and the ease with which data can be collected, posted, altered, and widely disseminated. Let us look at some of the types and formats of archival documents and papers that pose privacy difficulties and then see how this material can prove even more troublesome in the Internet environment.

For this discussion, I will not deal with categories of material that are legally protected, e.g., medical records, personnel files, attorney–client files, and the like. This material must be handled according to legal statute and its care does not generally lead to the kinds of ethical, gray–area privacy questions that can vex and perplex archivists. Instead, archivists must attempt to apply the ethical code to such materials as diaries, personal correspondence, grant applications, photographs, business or professional files containing proprietary information, and other archival items. For these, there is little concrete guidance upon which archivists may lean.

Archivists must examine the content of this material, must be aware of the donor’s or family’s knowledge of the material and their sensitivities about it, must keep in mind the currency of the material (i.e., is it contemporary?; how many people are still alive who might suffer invasion of privacy from the revelation of information in the material?), and must probe the motivations for considering restricting the material.

For the categories of material I mentioned, I think you can easily envision why their availability could constitute an invasion of privacy. Making the situation even tougher for archivists, institutions increasingly collect the papers of living or recently deceased individuals. This means that a greater proportion of the material in the collections is created by people still living, whose privacy must be respected or at least taken into consideration in making decisions about opening or sealing various groups of material. Add the Internet into the equation, with its capacity for publication and wide dissemination of information gleaned from an archival collection, and the privacy climate becomes even more volatile.

Let’s look at a couple of examples of potentially sensitive material. Diaries are an obvious red–flag category, since the diary is the ultimate in private writings, ostensibly intended for no eyes but those of the writer. Diaries contain frank statements and revelations about one’s self and about other people. Should diaries held in an archival repository be opened for research, even when people mentioned in them are still alive? Or should they be sealed for a reasonable period? If so, what constitutes a “reasonable period”?

Similarly, correspondence, intended to be read by the recipient only, may contain confidential information about such things as closeted sexual orientation, or perhaps an abortion obtained without a husband’s knowledge. What, if anything, is the archivist’s responsibility in the face of such confidences conveyed in recently written letters?

Then there are constituent files in the records of public officials. Such files may include correspondence in which constituents seek public assistance or the relief of a sensitive situation. This correspondence might contain not only the person’s name and address, but also information like Social Security Number and other personal or financial data.

These issues present enough difficulties in an archival setting, but they become enormous when Internet posting is a possibility. It is one thing to make available possibly sensitive letters for research in a library reading room, but quite another to post them on the Internet, where they can be read potentially by millions of people.

At the very least, archivists must know the content of letters and documents they plan to post on the Internet. They must make decisions about what items can or should be posted, and whether posted items should be redacted to obscure or delete sensitive or private information.

Even if archivists intend only to place catalog records or finding aids on the Internet, some awareness and thinking about privacy are in order. An action as seemingly innocent as listing the names of the correspondents from a collection is not without pitfalls, and, again, this applies to individuals still living.

The context in which information is found can be critical. As Paul Sieghart (1984) has pointed out, “My name in the London telephone directory or the electoral roll is perfectly harmless, but my name in a list of potential subversives or bad credit risks is capable of doing me harm. ... It is what data you string together and what you do with them ... which may or may not do harm.”

In November 2005, an archivist posed a query to colleagues, seeking advice about whether to list names in an online finding aid. They were names of delinquent girls, and the archivist had already decided to seal the actual files for 100 years after the final date of each file. But, she sought opinions about whether she should even list the names in the closed files, or should consider that by merely listing the girls’ names, she would be violating their privacy.

There are other ethical questions involved in this example, but the finding aid question is one that is likely to arise for many archivists, even when the situation is somewhat more subtle or ambiguous than the case of the privacy of delinquent girls. For example, a finding aid for an activist organization might list the names of participants in protests or subversive activities. Especially in the current climate, with government agencies seeking search records from Google, Yahoo and other search engines, the people so named might understandably not wish to be fingered on the Internet for their political activism.

When decisions are made about what to post on the Internet, archivists and their institutions must grapple with the ethical considerations involved, with the guidance of the profession’s Code of Ethics, and with their individual and institutional comfort level with the decisions. Most archival repositories have policies in place regarding private materials, and these institutions have varying degrees of comfort about opening collections or sealing material that is sensitive.

These institutional policies scatter along a long continuum of proper, ethical behavior. At one end, a library in the U.K. had a policy of automatically sealing all letters by people still living. This policy, while probably stricter than American institutions would follow, nonetheless had the virtue of being completely impartial and easy to administer.

At the other end of the continuum are institutions that restrict nothing, for a variety of reasons. Some cite the practical reality that there is insufficient time to read everything in search of private information, while others feel that the imposition of restrictions cannot be done fairly and objectively. Still others hold that they cannot be held liable if private data is revealed, since they are unqualified to make the necessary judgments to restrict and therefore they restrict nothing, leaving the researchers to decide whether to publish potentially sensitive items.

Most institutions, however, occupy space somewhere between the two extremes of sealing everything current, on the one hand, and sealing nothing, on the other.

My cautionary comments are not intended to squash the posting of documents and finding aids on the Web. Rather, I hope to raise the awareness of privacy issues surrounding archival collections. In addition, I hope to encourage archivists and others to stop and think before embarking on a wide–scale digitization of archival collection materials and even of the catalog records and finding aids for those collections.


About the author

Sara S. Hodson is the curator of literary manuscripts at The Huntington Library, San Marino, Calif., where she administers all British and American literary collections from the Renaissance to the present. A Fellow of the Society of American Archivists and past president of the Society of California Archivists, she lectures and writes frequently on literary and archival topics, especially archival ethics and privacy. Her most recent publication on privacy is “In Secret Kept, in Silence Sealed: Privacy in the Papers of Authors and Celebrities,” In: Menzi Behrndt–Klodt and Peter J. Wosh (editors). Privacy and Confidentiality Perspectives: Archivists & Archival Records (Chicago: Society of American Archivists, 2005).
E–mail: shodson [at] huntington [dot] org



1. Strum, 1998, p. 4.

2. Strum, 1998, p. 7.

3. MacNeil, 1992, p. 15.

4. Schwartz, 2006, pp. D1, D4.



Heather MacNeil, 1992. Without Consent: The Ethics of Disclosing Personal Information in Public Archives. Chicago: Society of American Archivists; Metuchen, N.J.: Scarecrow Press.

John Schwartz, 2006. “What Are You Lookin’ At?” New York Times, Section 4, column 1, p. 1.

Paul Sieghart, 1984. “Information privacy and the data protection bill,” In: Colin Bourn and John Benyon (editors), Data protection: Perspectives on information privacy: Contributions made to a conference on 11 May 1983 at the University of Leicester. Leicester: University of Leicester, Continuing Education Unit.

Society of American Archivists, 2005. “Code of Ethics for Archivists,” at, accessed 28 July 2006.

Philippa Strum, 1998. Privacy: The Debate in the United States Since 1945. Fort Worth: Harcourt Brace College Publishers.

Editorial history

Paper received 31 May 2006; accepted 15 July 2006.

Copyright ©2006, First Monday.

Copyright ©2006, Sara S. Hodson.

Archives on the Web: Unlocking Collections While Safeguarding Privacy by Sara S. Hodson
First Monday, volume 11, number 8 (August 2006),

A Great Cities Initiative of the University of Illinois at Chicago University Library.

© First Monday, 1995-2019. ISSN 1396-0466.