This is an interesting thread that I've been following on DigiPres:

The Justice Department recently seized the servers of a cloud storage service called Megaupload for alleged copyright infringement. Granted, this is an extreme situation involving a lesser-known company, but it is perhaps still a cautionary tale: http://www.npr.org/blogs/therecord/2012/04/13/150535995/hearing-in-megaupload-case-to-determine-fate-of-users-data

Excerpt:

"...But the digital world is different, says copyright attorney Jim Burger. He says in the Megaupload case the court would probably have to appoint someone to sort through the huge amount of material involved.

He says that person would have to say, 'Oh yes, this 100 megabytes is Mr. Smith's and it's legal, it's his personal stuff. But this 50 megabytes is clearly infringing.'"

Kimberly Peach

Archivist

WXPN Public Radio

On Fri, May 4, 2012 at 6:44 PM, Minor, David <minor@sdsc.edu> wrote:

Hi - I was about to respond to this thread and then noticed Chronopolis mentioned at the end of the previous message. So … please take what I say with a grain of salt… I promise I was going to say this anyway.  ;-)

I think there's less of a difference today than there used to be between "cloud" services and local storage. In many organizations, storage services, both basic and advanced, have moved to a more collaborative environment. As data size and complexity have grown, larger and more sophisticated data centers are needed. Often these centers are outside the scope of the traditional library computer room, if one even exists. This environment usually means the library working with either central campus computing or large local data centers, etc.

There is then the next level up from this, where organizations are working together to provide storage and storage-based services beyond what can be offered locally. Think here of HathiTrust or the California Digital Library. Similarly, as David Lowe mentions, things like LOCKSS or Chronopolis are also becoming quite prevalent.

In all of these cases, the *same* questions must be asked of the storage environment: Is the data safe? Is it accessible? Who's "driving the bus" and making sure that everything is OK? And in point of fact, the technology is the *least* important question here. Bruce Gordon is spot-on when he says, "Media is not the key to preservation." A well-formed plan with an explicit set of demands and rigorous monitoring of the systems are the key. This is the case whether the data is stored in the library, across campus or across the country.

The problem with many cloud systems is that they fall short on transparency. I'll be frank here: I trust Amazon's ability to store bits more than I trust many local storage instances. They have iron and expertise that far outstrip what many of us have. I don't trust them to give me the kind of audits and logging functions I want, though. Nor do I trust that they have my best interests at heart. And that's where the preservation step comes in: I don't doubt they *can* do it, but I'm not sure they *will.* And again, being perfectly frank, those are the same questions I often ask of my local, but external-to-me, storage.

$.02.

David Minor
Chronopolis Program Manager
Director of Digital Preservation Initiatives
UC San Diego Libraries
San Diego Supercomputer Center

On May 4, 2012, at 3:20 PM, David Lowe wrote:

> I can distill my mistrust of the cloud, although it is context-specific and I would agree that cloud storage might make sense for many situations.  (Hmm, I feel a flowchart/decision tree coming on, but that will have to wait.)
>
> My story goes back to a not-too-distant audit that our IT shop endured, and I was able to observe parts of the process.  Since Social Security numbers are among the most sensitive type of data that libraries have had over the years, and since our hosted ILS system did contain some SSNs in old layers of patron data, to satisfy the auditor, we had to go to the vendor to get a statement that there had been no unauthorized access to that info.  I think the auditor even wanted logs (!), which seemed silly to me, since it was really like proving a negative and could easily have been doctored anyway.  So, I see the vendor in this situation as a type of cloud, if you will, with third-party service level agreement terms standing in control of our crown jewels.  Replace SSNs in this scenario with restricted data from our archives, and I think the problem begins to reveal itself.  If libraries and archives are to retain their trusted status as knowledge repositories in society’s eyes, then as institutions, we need to be as sure as we can that what we preserve is protected inside and out from neglect, malicious behavior, entropy, etc.  Handing off core mission functionality to the custody of contractual service terms involves a level of risk I would prefer to avoid.  Years ago in Margaret Hedstrom’s DP class in grad school, we had a lot of David Bearman on our reading list, and he emphasizes the need for archival material to “hold up in court,” which is why provenance matters so much.  Chain of custody, like any other chain, is only as strong as its weakest link, so avoiding third parties as much as possible seems most prudent to me in the context of control over digital data in libraries and archives.  I admit this is not always possible, but risks are things we balance and manage.
>
> To the original question for this thread, I wanted to say that, as an NPR listener and fan for almost all of my adult life, I would consider the cultural significance of this material to be worthy of the security of a trusted digital repository, perhaps with a LOCKSS- or Fedora-based infrastructure, secured (“dark” or “offline”) where needed, and openly accessible and linkable where possible.  Services like Chronopolis are becoming a viable way to pair the accessibility of cloud services with the trustworthiness of the cultural institutions that have built these collaborations.
> --DBL
> David B. Lowe
> Preservation Librarian
> UConn Libraries
>
> From: Jacob Nadal [mailto:jnadal@brooklynhistory.org]
> Sent: Friday, May 04, 2012 9:32 AM
> To: digipres@ala.org
> Subject: [Digipres] Cloud concerns (Was: Physical Medium for Preservation Copies)
>
> Bruce raises an issue regarding cloud storage that has nagged at me – we seem to have some ambient mistrust of the cloud in the digital pres community, yet wide acceptance that “an array of some kind that constitutes a logical volume” will be the standard means of storage. Those things seem at odds to me. A cloud system is an array that constitutes a logical volume, is it not? What’s not to like?
>
> -----
> Jacob Nadal
> Director of Library and Archives
> Brooklyn Historical Society
>
>
> From: Gordon, Bruce [mailto:bgordon@fas.harvard.edu]
> Sent: Thursday, May 03, 2012 4:55 PM
> To: Ira Apt
> Cc: Howard Besser; Stern, Randall; Janel Kinlaw; digipres@ala.org
> Subject: [Digipres] Re: Re: RE: Re: Physical Medium for Preservation Copies
>
> Optical media are not recommended for preservation purposes. Beyond their short lifespan (which recent developments may have lengthened), there are their limited storage capacity and the constant monitoring needed to ensure there are no errors. By the time you are set up to do a proper job with optical media, it is more expensive and cumbersome than using multiple copies on hard disk and tape, or even in the cloud if you are a trusting soul. Optical media shine when you need a portable copy for presentation and no streaming version is available. Media is not the key to preservation. The key is a comprehensive system that protects your assets, plus a plan that includes monitoring technological trends so the essence can be migrated from one file format to another when obsolescence threatens. Please see IASA-TC 04 section 8.1.1 regarding optical media. IASA-TC 04 is available in a web version for free.
>
> http://www.iasa-web.org/tc04/audio-preservation
>
> You may have noticed that file sizes are not decreasing. This means that increasingly your files will not necessarily be on a single, particular piece of media but will most likely be spread across an array of some kind that constitutes a logical volume.
>
> Best,
>
> -Bruce
>
> Bruce J. Gordon
> Audio Engineer
> Eda Kuhn Loeb Music Library
> Harvard University
> Cambridge, Massachusetts 02138
> U.S.A
> tel. +1(617) 495-1241
> fax +1(617) 496-4636
>
> On May 3, 2012, at 2:23 PM, Ira Apt wrote:
>
>
> I am a fan of archival optical media (CD/DVD/BD) as the removable option.
>
> Yes, I do work for the manufacturer, but it will be a good option for some time. The concerns raised about file formats do apply.
>
> With the release of 100 GB BDXL discs, we are at least getting a little closer in capacity.
>
> Ira
>
> Ira Apt
> Sales Manager
> MAM-A Inc.
> 10045 Federal Drive
> Colorado Springs, CO  80908
>
> (918) 352-3681 Direct
> (918) 688-3818 Mobile
> (305) 946-8314 Fax
>
> Ira.Apt@mam-a.com
>
> WWW.mam-a.com
>
>
> On May 3, 2012, at 12:56 PM, "Howard Besser" <howard@nyu.edu> wrote:
>
> Storage on spinning disk does not necessarily mean storage on a server
> connected to the Internet (or even to another server).  And given 2
> parallel situations: storage on a spinning disk in a secure area vs
> storing on removable media in a secure area -- I'd say that a malicious
> act would have a greater chance of success against removable media (because
> there's no further protection once you're inside the secure area, and no
> trace is left of when a malicious act took place).  There might be a
> slightly larger chance of inadvertent destruction on a spinning disk, but
> at least check-summing software would warn you that that had happened
> (whereas with the removable media, it could be years before you discovered
> that a cleaning crew had buffered the floor with a machine whose motor had
> demagnetized the tapes stored on the lower shelves).
> -howard
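Howard's point that check-summing software would warn you of damage on spinning disk can be made concrete. The sketch below is a minimal fixity check, assuming SHA-256 as the checksum algorithm and a plain in-memory dictionary as the manifest; the function names (`sha256_of`, `build_manifest`, `verify_manifest`) and the file name `interview.wav` are hypothetical and not taken from any tool mentioned in this thread. It records a digest for every file once, then re-hashes later and reports anything missing or changed.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large audio files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest entries that are now missing or whose digest changed."""
    problems = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel_path)
    return problems


if __name__ == "__main__":
    # Demo on a throwaway directory; "interview.wav" is a hypothetical file name.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "interview.wav").write_bytes(b"stand-in for audio bytes")
        manifest = build_manifest(root)
        print(verify_manifest(root, manifest))  # [] : nothing has changed yet
        (root / "interview.wav").write_bytes(b"stand-in for audio byteZ")
        print(verify_manifest(root, manifest))  # ['interview.wav'] : damage detected
```

In practice the manifest would be kept apart from the files it describes (ideally with each replica) and the verification run on a schedule, which is the "periodic check-sums" routine Howard lists among ordinary digital preservation activities.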
>
>
> On Thu, 3 May 2012, Stern, Randall wrote:
> Re: benefits of removable media. There is still some benefit there, once
> you remove the media to a safe storage facility, in protection against
> malicious or inadvertent destruction of all on-line copies.
>
> Randy Stern
> Manager of Systems Development
> Library Technology Services,  Harvard Library
>
> ----Original Message----
> From: Howard Besser [mailto:howard@nyu.edu]
> Sent: Thursday, May 03, 2012 1:28 PM
> To: Janel Kinlaw
> Cc: digipres@ala.org
> Subject: [Digipres] Re: Physical Medium for Preservation Copies
>
> I think that your inquiry is really 2 questions that need separate
> answers: redundancy and removable media.
>
> Data centers have more than 40 years of well-developed practice in handling
> redundant storage: multiple copies and physical dispersal of these are the
> key.  Formulas for optimal numbers of redundant copies have been developed
> by the LOCKSS project and Storage Resource Broker (SRB), but the choice is
> ultimately governed by your budget vs the level of risk you can accept.
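The budget-versus-risk trade-off Howard describes can be illustrated with arithmetic. The sketch below is not the LOCKSS or SRB formula (neither is reproduced here); it is a deliberately simple textbook model, assuming that each copy is lost in a given period with the same probability and, in the first two functions, independently of the others. The third function adds a shared-site disaster to show why physical dispersal matters: copies in the same building do not fail independently. All function names and numbers are hypothetical.

```python
def prob_all_lost(p_copy: float, n_copies: int) -> float:
    """Chance that every copy is lost in one period, if copies fail independently."""
    return p_copy ** n_copies


def copies_needed(p_copy: float, max_risk: float) -> int:
    """Smallest number of copies whose combined loss chance is at or below max_risk.

    Assumes 0 <= p_copy < 1 and max_risk > 0; otherwise the loop would not terminate.
    """
    n = 1
    while prob_all_lost(p_copy, n) > max_risk:
        n += 1
    return n


def prob_all_lost_one_site(p_copy: float, n_copies: int, p_site: float) -> float:
    """Same, but all copies share one building that is destroyed with chance p_site.

    The site event takes every copy at once; otherwise copies fail independently.
    """
    return p_site + (1 - p_site) * prob_all_lost(p_copy, n_copies)


if __name__ == "__main__":
    # Illustrative numbers only: a 5% chance that any single copy is lost per year.
    print(copies_needed(0.05, 1e-6))  # 5
    # With a 0.1% chance of losing the whole building, extra copies there cannot
    # push the risk below 0.1%; only dispersing copies to other sites can.
    print(prob_all_lost_one_site(0.05, 10, 0.001) >= 0.001)  # True
```

Real failures are rarely this tidy (shared hardware batches, shared software, shared administrators), which is why the independence assumption is the first thing a more serious model relaxes.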
>
> The most advanced thinking today is that the only reason for removable
> media is that your total storage needs would be too expensive if put on
> spinning disks, though that cost difference will vastly diminish over
> time.  For large-scale storage, current thinking focuses on LTO tapes.
>
> But hidden within the "removable media" advocacy are some other issues
> that are not necessarily solved by removable media at all.  One is
> bit-flipping (which does little damage to a digital audio file if it flips
> in the "content" part, but is hugely damaging if the bit flips in the
> header or other metadata).  Another is the tying of your content (audio
> files) to a specific management and retrieval system (removable media
> sounds like it solves this, but it doesn't), and some people advocate
> putting each set of replicated files into management/retrieval systems
> having different types of architectures.
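Howard's distinction between a bit flip in the audio content and one in the header can be demonstrated with Python's standard `wave` module. The sketch below assumes a minimal 16-bit mono PCM file with the canonical 44-byte header that `wave` writes; real Broadcast WAV files carry extra chunks such as `bext`, so their offsets differ. The helper names are hypothetical. Flipping the lowest bit of the first sample changes it by one quantization step, flipping a bit in the sample-rate field silently changes the playback rate, and flipping a bit in the `RIFF` identifier makes the file unreadable.

```python
import io
import struct
import wave


def make_wav(samples: list[int], rate: int = 44100) -> bytes:
    """Build an in-memory 16-bit mono PCM WAV file from integer samples."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating silent corruption."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def read_wav(data: bytes) -> tuple[int, list[int]]:
    """Parse the file and return (sample rate, samples); raises wave.Error if unreadable."""
    with wave.open(io.BytesIO(data), "rb") as w:
        n = w.getnframes()
        return w.getframerate(), list(struct.unpack(f"<{n}h", w.readframes(n)))


# Offsets below assume the 44-byte header that the wave module writes:
# bytes 0-3 hold b"RIFF", bytes 24-27 the sample rate, and samples start at 44.
original = make_wav([1000, -1000, 500, -500])
print(read_wav(original))                   # (44100, [1000, -1000, 500, -500])
print(read_wav(flip_bit(original, 44, 0)))  # (44100, [1001, -1000, 500, -500])
print(read_wav(flip_bit(original, 26, 0)))  # (109636, [1000, -1000, 500, -500])
try:
    read_wav(flip_bit(original, 0, 0))      # b"RIFF" becomes b"SIFF"
except wave.Error as exc:
    print("unreadable:", exc)
```

The sample-rate case is the most insidious: the file still opens without complaint, so only a checksum over the whole file, not a successful playback test, would reveal the damage.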
>
> On top of that, of course there are issues of migration (when your
> Broadcast WAV files are no longer supported by any audio software),
> periodic check-sums, tracking of any changes made to your files (PREMIS),
> and other routine digital preservation activities.
>
> .......................................
> Howard Besser, Professor and
> Director, Moving Image Archive and Preservation Program
> NYU's Tisch School of the Arts
> Cinema Studies Department
> 665 Broadway, room 612
> New York, NY  10012
> tel: 212-992-9399
> fax: 212-995-4844
> howard@nyu.edu
> http://besser.tsoa.nyu.edu/howard/
> http://www.nyu.edu/tisch/preservation/
>
>
> On Thu, 3 May 2012, Janel Kinlaw wrote:
> Hello!
>
> At NPR we are trying to move away
> from multiple physical copies of our archival audio.  Our goal is to
> just have redundant digital copies and one physical copy on some medium
> in case of catastrophic failure with the servers.
>
>
> We are curious to know what
> physical medium other audio archives are using for preservation/backup
> copies of digital audio files.
>
>
> Thanks!
> --Janel Kinlaw
> Broadcast Librarian, NPR
> jkinlaw@npr.org
>