
draft-ietf-ltans-reqs-02.txt: Miscellaneous non-editorial comments



This thread collects various non-editorial comments and provides some
responses.

> The document fails to indicate that it is necessary to demonstrate the
> origin of the data.

I don't think 'origin of data' is a service an LTA can provide.  The best an
LTA can do towards proof of origin is to collect and preserve the
materials required for source authentication, and this can only occur in the
'active' mode of operation.  Data handled in a 'passive' manner may contain
sufficient information to authenticate the source of the data without the LTA
having any knowledge of (or possibly even access to) that information.

An LTA may operate such that it only accepts data whose source can
be authenticated.  This provides an extra safeguard against LTA administrators
altering data (for example, by substituting timestamps).
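A minimal sketch of an LTA that refuses submissions whose source cannot be
authenticated.  It assumes a hypothetical registry of per-submitter keys and
uses HMAC-SHA256 from the Python standard library purely as a stand-in for
real source authentication (e.g. digital signature verification);
SUBMITTER_KEYS, tag_for and accept_submission are illustrative names only.

```python
import hashlib
import hmac

# Hypothetical registry of submitter keys.  HMAC stands in for real source
# authentication such as verifying a digital signature on the submission.
SUBMITTER_KEYS = {"alice": b"alice-secret-key"}


def tag_for(submitter: str, data: bytes) -> bytes:
    """Compute the authentication tag a legitimate submitter would attach."""
    return hmac.new(SUBMITTER_KEYS[submitter], data, hashlib.sha256).digest()


def accept_submission(submitter: str, data: bytes, tag: bytes) -> bool:
    """Accept data only if its source can be authenticated."""
    key = SUBMITTER_KEYS.get(submitter)
    if key is None:
        return False  # unknown source: reject the submission
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking tag information through timing.
    return hmac.compare_digest(expected, tag)
```

Note that an HMAC key is shared with the LTA, so this stand-in only shows
the acceptance flow.  The protection against administrators altering data
comes from signatures, where only the submitter holds the private key.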

> #   A long-term archive service may accept any type of data for
> #   preservation. The data might be in any format, whether textual
> #   data, images, documents, applications, or compound packages
> #   of multiple components.  The data may be digitally
> #   signed, time-stamped, encrypted, or not subject to any cryptographic
> #   processing.
> 
> *** Is this a requirement that all of these be accepted, or 
> is this something where variability is allowed? What is the 
> general principle here?

There is no requirement expressed w.r.t. the type of archived data an LTA
must accept from a submitter.  This will be clarified by introducing the
listed possibilities as examples.  The types of permissible data may be a
policy component.
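If permissible data types become a policy component, the check could be as
simple as the sketch below.  ArchivePolicy, its fields and the policy IDs are
hypothetical; the only assumption is that submissions carry a MIME-style
content type.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical policy object; field names are illustrative only."""
    policy_id: str
    # MIME types this LTA accepts; an empty set means "accept any type".
    accepted_types: frozenset = field(default_factory=frozenset)

    def permits(self, content_type: str) -> bool:
        return not self.accepted_types or content_type in self.accepted_types


open_policy = ArchivePolicy("policy-open")
restricted = ArchivePolicy("policy-docs", frozenset({"application/pdf", "text/plain"}))
```

Here open_policy accepts anything, while restricted rejects, say, image/png.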

> *** This sounds like you intend it to be disallowed for a 
> long-term archive service to provide any content-focused operations.
> 
> Shouldn't this be 'is not required to operate upon' rather 
> than 'does not operate on'?
>
> In any case, even if it isn't the long-term archive service 
> doing the operating, I think it should be left open whether 
> the content-focused operations are performed by notary 
> services or, at least in part, by some other third-party service.
> For example, there could be an (untrusted) 
> content-transformation service whose output would be 
> validated by a (trusted) notary service. So I suggest something like:
> 
> # A long-term archive service is not required to operate upon
> # evidence related to the content of archived data objects.
> # Content-focused operations, including data format migration
> # or translation, may be performed by other services, e.g.,
> # notary services.

The intent was to preclude content-focused operations under the LTA banner,
recognizing that related capabilities might operate on content (e.g. to
verify signatures contained in the content or to translate the content in
some way).  The suggested text conveys the intent more accurately.  If the
end user experiences this as an LTA service (and it seems reasonable to
expect that to be the case), perhaps this note should simply be deleted.    

Section 4
> **** May someone request the archivation period being 
> shortened but not immediately? (E.g., employee records are 
> usually kept 'for the length of employment plus N years', so 
> at employment termination, you might request shortening the 
> lifetime to N years.) Right now the only operations are 
> 'delete' and 'extend'.

Shortening the archival period should be identified as a possible operation
(it already is in the working copy of the draft).  This introduces a new type
of actor, e.g. a modifier, in addition to the submitter and retriever.
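A minimal sketch of a 'shorten' operation alongside the existing
delete/extend, using the employee-record example.  Role, ArchivedObject,
shorten and add_years are hypothetical names, and N = 7 years is an
arbitrary illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Role(Enum):
    SUBMITTER = "submitter"
    RETRIEVER = "retriever"
    MODIFIER = "modifier"  # hypothetical new actor allowed to change the period


@dataclass
class ArchivedObject:
    object_id: str
    retain_until: date


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping 29 Feb to 28 Feb in non-leap target years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def shorten(obj: ArchivedObject, actor: Role, new_end: date) -> None:
    """Shorten (never lengthen) the archival period of an object."""
    if actor is not Role.MODIFIER:
        raise PermissionError("only a modifier may change the archival period")
    if new_end >= obj.retain_until:
        raise ValueError("use an 'extend' request to lengthen the period")
    obj.retain_until = new_end


# Kept open-ended during employment; at termination keep for N = 7 more years.
record = ArchivedObject("emp-1042", date(2099, 12, 31))
shorten(record, Role.MODIFIER, add_years(date(2024, 6, 30), 7))
```

After the call, record.retain_until is 2031-06-30.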

>    Submitters should be able to specify metadata that, for
>    example, can be used to enable retrievers to render the data
>    correctly.
>  
> I think this may be worthy of some amplification and clarification.
> Do you mean explicitly content-type or other MIME-like headers?
> 
> **** Usually the word 'metadata' is also associated with 
> indexing information (Title, Author, Date, etc.). Is that 
> kind of metadata for long-term archived data included? Is it 
> possible to search on metadata in the long-term archive? Or 
> is it explicitly excluded, required to be part of the 
> archived package or part of the archived data?

Content type and MIME-like headers are certainly examples.  Others might be
specific to a particular system or application (e.g. classification code),
general support for searching (e.g. keywords) or other generic information
(e.g. contributors, title, author, date).  It seems likely that an
implementation would support searching on at least some of these.  A
question we ran into while working on the (unpublished) protocol spec was
whether or not there needed to be support for "archived" and "unarchived"
metadata, i.e. analogous to signed and unsigned CMS attributes.  We should
probably define a profile of such attributes for which support is required.
Should the ability to search on particular types of metadata be mandated?  If
metadata is not an appropriate term, attribute could be substituted.
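To make the archived/unarchived distinction concrete, here is a sketch in
which only archived attributes feed the digest that evidence would protect,
while both kinds are searchable.  Metadata, protected_digest and search are
hypothetical, and canonical JSON is an assumed encoding, not a proposal.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Metadata:
    # Covered by the archive's evidence, like signed CMS attributes.
    archived: dict = field(default_factory=dict)
    # Not covered by evidence and free to change, like unsigned attributes.
    unarchived: dict = field(default_factory=dict)


def protected_digest(md: Metadata) -> str:
    """Digest over archived attributes only (assumed canonical JSON encoding)."""
    canonical = json.dumps(md.archived, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def search(index: dict, key: str, value: str) -> list:
    """Return IDs of objects whose archived or unarchived metadata matches."""
    return [
        object_id
        for object_id, md in index.items()
        if md.archived.get(key) == value or md.unarchived.get(key) == value
    ]
```

Changing an unarchived attribute (e.g. a classification code) leaves the
protected digest unchanged; changing an archived one (e.g. the title) does
not.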

> =====================
>    Following submission, the service must provide a value that can be
>    used to retrieve the archived data and/or associated evidence.  It
>    may be possible to retrieve archive packages by using a hash value of
>    an archived data object.
> 
> **** Is this value a capability? Or does it also require 
> authorization?
>    Is the requirement that this retrieval handle be generated by the
>    submitter, the long-term archive service, or by both (when
>    it uses a hash value of the archived data object)?
> 
>    While I think using a hash value of an archived data object is
>    an interesting approach, I'm not sure why it's in the requirements.

I'd viewed this as a server-generated value (even in cases where the client
could predict the value beforehand, e.g. a hash of the data).  A client could
specify an identifier as a metadata/attribute value.  I did not view possession
of the value as authorization to perform any action related to the associated
archived data object or evidence.
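A toy in-memory sketch of that view: the server always issues the handle
(optionally a SHA-256 hash the client could have computed itself), and
retrieval is authorized separately from knowing the handle.  ArchiveService,
the "sha256:"/"id:" prefixes and the owner-only access rule are assumptions
made for illustration.

```python
import hashlib
import secrets


class ArchiveService:
    """Toy in-memory LTA; names and handle format are hypothetical."""

    def __init__(self) -> None:
        self._objects = {}  # handle -> archived bytes
        self._owners = {}   # handle -> principals allowed to retrieve

    def submit(self, submitter: str, data: bytes, use_hash: bool = False) -> str:
        # The server generates the handle, even when the client could
        # predict it (as with a hash of the data).
        if use_hash:
            handle = "sha256:" + hashlib.sha256(data).hexdigest()
        else:
            handle = "id:" + secrets.token_hex(16)
        self._objects[handle] = data
        self._owners.setdefault(handle, set()).add(submitter)
        return handle

    def retrieve(self, principal: str, handle: str) -> bytes:
        # Knowing the handle is not authorization: check access separately.
        if principal not in self._owners.get(handle, set()):
            raise PermissionError("not authorized for this object")
        return self._objects[handle]
```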

> =======================
>    The format for the acknowledgements must allow the identification of
>    the archiving provider.
> 
> **** Why is this optional? Is it the identification of the 
> operator of the LTA? Or of the LTA itself? 

The intent was the LTA itself.  The statement indicates that the
acknowledgement formats must support identification of the LTA.

> =======================
>    The format for the acknowledgements must allow the identification of
>    the participating client.
> 
> **** I think this is a more substantial requirement, that the 
> submitting client must identify itself, and that identity may 
> (or must? or should?) be included in the acknowledgement?

So submissions must be authenticated?
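One way to express both acknowledgement requirements is sketched below.
Acknowledgement, acknowledge, require_client and "lta.example" are
hypothetical; the client field is optional because whether submissions must
be authenticated is still an open question here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Acknowledgement:
    """Hypothetical acknowledgement structure; field names are illustrative."""
    lta_id: str                      # identifies the LTA itself, not its operator
    handle: str                      # value usable later for retrieval
    received_at: datetime
    client_id: Optional[str] = None  # set only when the submitter was authenticated


def acknowledge(lta_id: str, handle: str, client_id: Optional[str] = None,
                require_client: bool = False) -> Acknowledgement:
    # A policy flag decides whether an authenticated client is mandatory.
    if require_client and client_id is None:
        raise ValueError("policy requires an authenticated submitter")
    return Acknowledgement(lta_id, handle, datetime.now(timezone.utc), client_id)
```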

Section 4.4
> It may be that this section has a part that is defining what 
> a Long-term archive service does (it operates according to a 
> policy), and that the requirements are not operational 
> requirements but interface requirements, e.g., providing 
> information identifying the policies.
> 
> I'm not sure what 'in use at any point in time' means. Are services 
> allowed to change their policies? Can you ask today 'what 
> were your policies 10 years ago'?

It seems likely that services will have to change policies over time.  It is a
good question whether it should be possible to ask for the policies in use at
time X, but it may be overkill to require an interface for this.  Instead, the
formats must include a means of indicating the policy in use.  Thus, the
policy applied to a specific object at a particular point in time can be
determined from that object's evidence, but the service-wide policies in use
at a particular time cannot be queried.
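A sketch of how a per-object policy indicator answers "which policy applied
to this object at time X" without any service-wide query.  EvidenceRecord,
policy_applied and the policy IDs are hypothetical; records are assumed to be
sorted by creation time.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EvidenceRecord:
    created: datetime
    policy_id: str  # policy in force when this evidence was produced


def policy_applied(records: list, at: datetime) -> str:
    """Policy governing one object at time `at`, read from its own evidence.

    Only this object's history is consulted; this cannot answer what
    policies the service as a whole used at that time.
    """
    times = [r.created for r in records]
    i = bisect_right(times, at)
    if i == 0:
        raise LookupError("object was not yet archived at that time")
    return records[i - 1].policy_id
```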

>    The policy may define characteristics
>    such as the quality of timestamps obtained or generated by the
>    long-term archive service or triggers for preservation activities,
>    e.g.  timestamp refresh or data format migration.
> 
> So we need to invent protocol elements for describing such 
> things? What is the measure of 'quality of time-stamp'?

We need to determine what aspects of policy should be applied on a
per-document basis and/or what aspects should be controlled explicitly by a
submitter.  Protocol elements would be necessary for the latter, at least.
The "quality of timestamp" phrase should be replaced by "characteristics of
timestamps" (possibly to include duration, key size, specific TSA).
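As an illustration of "characteristics of timestamps" driving a preservation
trigger, the sketch below flags a timestamp for refresh when its key size or
TSA falls outside policy, or when it is close to expiry.  All names,
thresholds and "tsa.example" are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TimestampCharacteristics:
    """Hypothetical policy fields describing acceptable timestamps."""
    min_key_bits: int
    allowed_tsas: frozenset
    refresh_margin: timedelta  # renew this long before the timestamp expires


@dataclass(frozen=True)
class Timestamp:
    tsa: str
    key_bits: int
    valid_until: date  # e.g. expiry of the TSA certificate


def needs_refresh(ts: Timestamp, policy: TimestampCharacteristics, today: date) -> bool:
    """Preservation trigger: renew a timestamp that no longer meets policy."""
    return (
        ts.key_bits < policy.min_key_bits
        or ts.tsa not in policy.allowed_tsas
        or today >= ts.valid_until - policy.refresh_margin
    )
```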

> "A long-term archive service may operate in two modes:
> 
>     - a passive mode, where the archived data object is an opaque
>       collection of bytes, and/or
>
>     - an active mode, where the archived data object is associated
>       with cryptographic maintenance parameters."

I like the idea of distinguishing between data that is operated upon by an
LTA and data that is not.  However, these are modes of operation for an LTA
w.r.t. an archived data object (i.e. an LTA could operate in a passive
mode for some objects and an active mode for others).  That said, passive vs.
active as applied to data evokes ongoing preservation activities.  Opaque vs.
transparent may be a suitable alternative.
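A small sketch of the mode as a per-object property using the opaque vs.
transparent naming, so a single LTA can hold both kinds.  Mode, StoredObject,
validate and the parameter names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    OPAQUE = "opaque"            # "passive": an uninterpreted collection of bytes
    TRANSPARENT = "transparent"  # "active": has cryptographic maintenance parameters


@dataclass
class StoredObject:
    data: bytes
    mode: Mode
    # Only meaningful in transparent mode; keys here are illustrative.
    maintenance_params: dict = field(default_factory=dict)


def validate(obj: StoredObject) -> None:
    """Check that the per-object mode and its parameters are consistent."""
    if obj.mode is Mode.OPAQUE and obj.maintenance_params:
        raise ValueError("opaque objects carry no maintenance parameters")
    if obj.mode is Mode.TRANSPARENT and not obj.maintenance_params:
        raise ValueError("transparent objects need maintenance parameters")
```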