
editorial comments on draft-ietf-ltans-reqs-02.txt



These are mainly editorial comments; the only non-editorial
comments are marked with ****. I also have a number of
questions which might turn into substantive comments
depending on what was meant. I'm sorry for the length,
but I thought I should do a careful review:

Title:
  Long-Term Archive Service Requirements

It's traditional to capitalize words (except prepositions) in titles.
I think "Long-Term" is better than "Long Term" throughout the document
(used inconsistently).

===============
Abstract
  In many scenarios, users need to be able to prove the existence of
   data at a given time and integrity of data, especially digitally
   signed data, in a common and reproducible way after an arbitrarily
   long period.  This document specifies the technical requirements for
   a long-term archive service to support such scenarios.

This is a little hard to parse and perhaps misleading; the document
should describe long-term archive services and provide the technical
requirements for Internet protocols that interact with such services.
I would suggest:

#   There are many scenarios in which users, even after an arbitrarily
#   long period of time, will need to prove the original existence of
#   data and the integrity of that data in the interim. In addition,
#   there are requirements to prove the original validity of the
#   signatures of digitally signed data, even after an arbitrarily
#   long period. This document describes a class of long-term archive
#   services which support such scenarios, and the technical requirements
#   for protocols for interacting with such services.

===============
1. Introduction

   Digital data durability is undermined by continual progress and
   change on a number of fronts.  The useful lifetime of data may exceed
   the life span of formats and mechanisms used to store the data.  The
   lifetime of digitally signed data may exceed the validity periods of
   public-key certificates used to verify signatures or the
   cryptanalysis period of the cryptographic algorithms used to generate
   the signatures.  

I think you mean 'required lifetime' instead of 'useful lifetime'
in sentence 2, and instead of 'lifetime' in sentence 3.
===============
  "...external service accessible via an internet"

perhaps "via the Internet".
===============
  "... credible assertion of something that is currently asserted at points
   well into the future."

I suggest
#   ... credible assertion, at points well into the future, of something
#   that is currently asserted.

or just put a comma between "asserted" and "at" above.
===============
Section 2:

I think this section could use a preface, e.g., 

# We define the following terms based on their usage in the archiving
# community, in order to provide a vocabulary for describing requirements
# and the standards around them.
===============
"Arbitrator" is defined but not used.
===============
    An evidence record may include
      acknowledgements from a LTA

I suggest writing out 'long-term archive service' here, since it
is a forward reference.
===============
You might want to define 'attestation'.
===============
"Originator" is defined but not used. If you keep it, 'object.The'
is missing a space.
===============
   Timestamp: A signed attestation generated by a Time Stamping
   Authority (TSA) that a data item existed at a certain time.
   [RFC3161] specifies a structure for timestamps and a protocol for
   communicating with a Timestamp Authority (TSA).

I think it would be better to use the definitions from RFC 3161
rather than redefine the terms. RFC 3161 uses "time-stamp"
and not "timestamp", but defines 'time-stamp token'. Is it
appropriate to insist that long-term archive services use
RFC 3161 (i.e., are we defining a 'time-stamp' as an RFC 3161
signed time-stamp token) or are we just using RFC 3161 as an
example? I would suggest
 
#   Time-stamp: An attestation generated by a Time Stamping
#   Authority (TSA) that a data item existed at a certain time.
#   For example, [RFC 3161] specifies a structure for signed
#   time-stamp tokens as part of a protocol for communicating
#   with a Time-Stamp Authority (TSA).

#  Time-Stamp Authority (TSA): A trusted service that provides
#  attestations of existence of data at particular points in
#  time. For example, [RFC 3161] defines protocol elements for
#  interacting with a TSA.
===============
In general, we should double check spelling and terminology
and try to be a little more consistent with RFC 3161, at least
if the intent is to talk about the same thing.
===============
3. General principles

   A long-term archive service may accept any type of data for
   preservation, including digitally signed data, encrypted data, time
   stamped data, data that has not been the subject of any cryptographic
   processing, textual data or images.

I think the last point "textual data or images" should be expanded,
e.g.,

#   A long-term archive service may accept any type of data for
#   preservation. The data might be in any format, whether textual
#   data, images, documents, applications, or compound packages
#   of multiple components.  The data may be digitally
#   signed, time-stamped, encrypted, or not subject to any cryptographic
#   processing.

*** Is this a requirement that all of these be accepted, or is this
something where variability is allowed? What is the general principle here?

===============
    A long-term archive service does not operate upon evidence related to
   the content of archived data objects. Content-focused operations,
   including data format migration or translation, may be performed by a
   notary or notarization service.

*** This sounds like you intend it to be disallowed for a long-term
archive service to provide any content-focused operations.

Shouldn't this be 'is not required to operate upon' rather than
'does not operate upon'?

In any case, even if it isn't the long-term archive service
doing the operating, I think it should be left open whether
the content-focused operations are performed by notary services
or, at least in part, by some other third-party service.
For example, there could be an (untrusted) content-transformation
service whose output would be validated by a (trusted) notary
service. So I suggest something like:

# A long-term archive service is not required to operate upon evidence
# related to the content of archived data objects. Content-focused
# operations, including data format migration or translation, may be
# performed by other services, e.g., notary services.

=================
   A long-term archive service preserves archived data objects over
   arbitrarily long periods of time.

This reads like it is a definition. You could have a service which
only guarantees to preserve data for 30 years, not 'arbitrarily long
periods of time'.  However, removing the word 'arbitrarily' leaves
this general principle not actually saying much. Perhaps you mean
something like:

# Different long-term archive services may establish policies and procedures
# for archiving data objects over different lengths of time.

I'm not sure.
=================
4. Technical Requirements

   This section describes requirements for a long-term archive system.

I would suggest something like:

# This section describes the requirements for the protocol for
# accessing a long-term archive system.
=================
      - submit data and receive an acknowledgement or proof of deposit

perhaps 'acknowledgement or evidence of deposit'? I'm uncomfortable because
we haven't defined 'proof'. Is 'attestation' better than 'acknowledgement'
here (assuming we define attestation)? Or is 'proof of deposit' just
the definition of 'acknowledgement'? Perhaps something like

#     - submit data objects for archive

and then, later, with the other requirements on 'the format for
the acknowledgement'
#   When a data object is submitted for archive, an acknowledgement
#  is returned. The acknowledgement includes an attestation by the
#  archive service of the deposit of the data object.

=================
      - delete archived data objects/terminate archivation period for an
      archived data object.

Later, you point out that 'deletion may not involve physical deletion';
so this is confusing.  I think the first thing to do is to change
"permit clients to perform the following" to "permit clients to request
the following"; the client may request deletion, just as the
client may submit data, but find that the LTA can't deposit it.

**** May someone request that the archivation period be shortened
without requesting immediate deletion? (E.g., employee records are usually
kept 'for the length of employment plus N years', so at employment
termination, you might request shortening the lifetime to N years.)
Right now the only operations are 'delete' and 'extend'.
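
To illustrate the missing operation, here is a minimal sketch of a
'shorten' request alongside 'extend'. The `ArchivedObject` class and its
method names are hypothetical, invented for this example; nothing here
comes from the draft.

```python
from datetime import date


class ArchivedObject:
    """Hypothetical record tracking when an archivation period ends."""

    def __init__(self, expires: date):
        self.expires = expires

    def request_extend(self, new_expiry: date) -> None:
        # Extension must move the end of the period later.
        if new_expiry <= self.expires:
            raise ValueError("extension must move the expiry later")
        self.expires = new_expiry

    def request_shorten(self, new_expiry: date, today: date) -> None:
        # Shortening without immediate deletion: the new expiry must
        # still lie in the future, but before the current expiry.
        if not (today < new_expiry < self.expires):
            raise ValueError("new expiry must lie between today and the current expiry")
        self.expires = new_expiry


# Employee record kept open-ended until termination, then for N more years.
record = ArchivedObject(expires=date(2099, 12, 31))
termination = date(2005, 6, 30)
n_years = 7
record.request_shorten(
    date(termination.year + n_years, termination.month, termination.day),
    today=termination,
)
```

Whether such a request needs the same authorization as deletion is a
separate question that the sketch does not address.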

==================
   Submitters must be able to specify an archivation period. 

I would include this in the bulleted list above with a -.
========================
   It should
   be possible to extend the archiving period after the initial
   submission.

Who should be able to do this?
========================
   Submitters should be able to specify metadata that, for
   example, can be used to enable retrievers to render the data
   correctly.
 
I think this may be worthy of some amplification and clarification.
Do you mean explicitly content-type or other MIME-like headers?

**** Usually the word 'metadata' is also associated with indexing
information (Title, Author, Date, etc.). Is that kind of metadata
for long-term archived data included? Is it possible to search
on metadata in the long-term archive? Or is it explicitly
excluded, required to be part of the archived package or
part of the archived data?

=====================
   Following submission, the service must provide a value that can be
   used to retrieve the archived data and/or associated evidence.  It
   may be possible to retrieve archive packages by using a hash value of
   an archived data object.

**** Is this value a capability? Or does it also require authorization?
   Is the requirement that this retrieval handle be generated by the
   submitter, the long-term archive service, or by both (when
   it uses a hash value of the archived data object)?

   While I think using a hash value of an archived data object is
   an interesting approach, I'm not sure why it's in the requirements.
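
To make the question concrete, here is a minimal sketch of a retrieval
handle derived from a hash of the archived data object. The
`InMemoryArchive` class and its `submit`/`retrieve` methods are
hypothetical, and SHA-256 is an assumed choice; the draft specifies
none of this.

```python
import hashlib


class InMemoryArchive:
    """Hypothetical toy archive keyed by the SHA-256 digest of each object."""

    def __init__(self):
        self._objects = {}

    def submit(self, data: bytes) -> str:
        # The handle can be computed by the submitter and the service alike.
        handle = hashlib.sha256(data).hexdigest()
        self._objects[handle] = data
        return handle

    def retrieve(self, handle: str) -> bytes:
        # No authorization check: the handle alone acts as a capability.
        return self._objects[handle]


archive = InMemoryArchive()
handle = archive.submit(b"contract text")
```

In this sketch anyone who already holds the data can compute the handle,
which is why it matters whether the value is meant to be a capability or
whether retrieval also requires authorization.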
=======================
   Deletion requests must be authorized and an authorization policy must
   be defined and observed by the long-term archive service (as part of
   an archive policy). 

I assume the authorization policy is not just for 'deletion'?
I think this is just editorially talking about authorization
policies. I'm not sure whether the policy is per LTA or
per archived document, though.
=======================
   The format for the acknowledgements must allow the identification of
   the archiving provider.

**** Why is this optional? Is it the identification of the operator
of the LTA? Or of the LTA itself? 
=======================
   The format for the acknowledgements must allow the identification of
   the participating client.

**** I think this is a more substantial requirement, that the
submitting client must identify itself, and that identity may
(or must? or should?) be included in the acknowledgement?
========================
4.1.2 Rationale

Given that this is just one sentence (and most of the Rationale
subsections are short), I would suggest readability might be
improved by eliminating the section head.

#  4.1  Requirements for LTA operations
#
#   A long-term archive service must ...
#
#   ... end of current 4.1.1....
#
#   Rationale:
#   Submission, retrieval and deletion of archived data objects
#   are necessary basic functions of a long-term archive service.

#  4.2 Requirements to provide evidence records
#
#  A long-term archive service.... 
======================
4.2 Provide evidence records

I think this section could be expanded. For example,
the term 'trust anchors' isn't defined in the document
or referenced. And it isn't clear why 'trust anchors'
have to be provided by other services, rather than
just 'might be provided by other services'. 
I think this section puts some of the requirements
in the 'Rationale'. (I'm running out of steam or
I would try for a rewrite).

I don't think the requirements outline sufficiently
what is necessary to transfer data from one archive to
another, and what it means to do that; it's not just
the archived data, it's the evidence, and the requirements
for creating a chain of evidence could be spelled out.
Perhaps the transferability requirement belongs in 4.6
and not here.

The 'Submitters must be able to specify metadata ...'
belongs in 4.1 and not in 4.2, because it is a requirement
that submitters should be allowed to specify metadata
(or must be required to specify metadata?)
===============================
4.3 Again, some of the requirements seem to be in
the 'Rationale', and I'm not sure why Content-focused
operations are disallowed rather than just 'not required'.

I'm not sure what it means to demonstrate data integrity
without also providing evidence records. If you can't
demonstrate that you had the data at a particular date,
how can you demonstrate that the data you have is
'the same'?
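
A minimal sketch of the point, assuming integrity is checked by comparing
digests: the comparison only has meaning relative to a digest fixed at
some earlier time. The `EvidenceRecord` type below is a hypothetical,
unsigned stand-in; a real evidence record would carry time-stamps or
signatures, which are omitted here.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical, unsigned stand-in for an evidence record."""
    digest: str            # SHA-256 of the data as it was at attested_at
    attested_at: datetime  # when the digest was recorded


def make_record(data: bytes, when: datetime) -> EvidenceRecord:
    return EvidenceRecord(hashlib.sha256(data).hexdigest(), when)


def is_unchanged(data: bytes, record: EvidenceRecord) -> bool:
    # "The same" is defined only relative to the digest fixed at attested_at.
    return hashlib.sha256(data).hexdigest() == record.digest


record = make_record(b"original", datetime(2005, 1, 1, tzinfo=timezone.utc))
```

Without the record there is nothing to pass to `is_unchanged`, which is
the difficulty described above.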
=================================
4.4 Long-term archive policy

I think a reference for 'certificate policy' is called
for here, and needs to be spelled out. This probably
goes back to the 

     Long-term archive policy: A named set of rules that define
      operational characteristics of a long-term archive service.

in the definitions: what is the namespace of rule set
names? Do different LTAs share named archive policies?

It may be that this section has a part that is defining
what a Long-term archive service does (it operates
according to a policy), and that the requirements
are not operational requirements but interface requirements,
e.g., providing information identifying the policies.

I'm not sure what 'in use at any point in time' means. Are
services allowed to change their policies? Can you
ask today 'what were your policies 10 years ago'?

   The policy may define characteristics
   such as the quality of timestamps obtained or generated by the
   long-term archive service or triggers for preservation activities,
   e.g.  timestamp refresh or data format migration.

So we need to invent protocol elements for describing
such things? What is the measure of 'quality of time-stamp'?
=================================
4.5 Data confidentiality

Again, this is a combination of description of what
a long-term archive service is, coupled with some
requirements for the interface to it.

#   Standard encryption algorithms and formats, e.g.  AES and CMS, should
#   be supported.

I'm not sure what this means -- is this an operational
requirement for how the LTA operates internally, or
a protocol requirement for the submitter/retriever <-> LTA
protocol? Elsewhere it said that transport security
might be used to provide confidentiality, which might
not use AES or CMS.
==================================
4.6 Data and evidence transferability

See notes for 4.2 above. I think that it might be possible
to be clearer about what the actual functional requirements
are to 'support the transfer'. In fact, 'transfer' isn't
really an accurate description of the operation, because
while the archived data object may actually be 'transferred',
most attestations and elements of trust will require
another layer.
=====================================
4.7 Supporting Groups

I would put into this category "a document and its
translation into another format".   Consider all of
the subcategories of multipart: multipart/alternative,
multipart/mixed, multipart/signed... to cover the
important cases.
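
As one illustration of the multipart/alternative case, here is a sketch
using Python's standard `email` package to group two renderings of the
same document. The content strings are made up, and this is only an
analogy for how a group might be packaged, not a proposal for the
archive format.

```python
from email.message import EmailMessage

# One document carried in two formats inside a single group.
group = EmailMessage()
group.set_content("Plain-text rendering of the document.")
group.add_alternative("<p>HTML rendering of the document.</p>", subtype="html")

# Content types of the members of the group, in order.
content_types = [part.get_content_type() for part in group.iter_parts()]
```

Here the container is multipart/alternative and its members are the
text/plain and text/html parts; multipart/mixed and multipart/signed
would describe different relationships among the members.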


=======================
8. References

Current practice is to separate informative vs. normative references.
I suppose all of these are informative.  I believe that it would be
very useful to provide additional informative references to records
management practices.