On Fri, 5 Oct 2007, James M Snell wrote:
If you rely on a whole bunch of application specific extension elements, you will not realize any significant benefit from using Atom. If you go with Atom, find a more general way of encoding your data (e.g. use RDF for instance). Regarding the question about tool support, that depends entirely on who will be using the application. If you're building a tool for a very specific and limited audience, it's probably not worth the effort. If you're building a tool for an open audience, and the data from your application might be used for purposes you hadn't originally intended, use Atom and eliminate all the app-specific extensions.
That strikes me as sound advice. I saw a quote from Bill de hÓra cited on Aristotle Pagaltzis' blog (http://plasmasturm.org/log/463/) a while back:
"Any kind of data garden is fair game for AtomPub to rationalize."
In higher education we (faculty, librarians, etc.) are drowning in digital collections ("one-offs") that eventually need to be shared, ported to the web (e.g. from some departmental Filemaker server where they currently live), preserved, repurposed, etc. Current library tools (e.g. DSpace) generally hew to the Dublin Core way of looking at the world, which simply does not fit with the way faculty want to think of their stuff. I see what we've done in DASe as a slightly more structured form of "tagging", but the tags here are allowed to have a type (i.e., not just all keywords). I wouldn't call it a 'specific and limited audience' by any means, but for these purposes, perhaps so. I am told that ArtStor (the largest vendor currently for web-based Art & Art History image collections) has now opted to go with key-value pairs (they too are "harvesting" collections that have originated in a wide variety of places) rather than "top-down" metadata schemas. It'll be interesting to see how this all shakes out.
I don't really need/want the complexity of RDF and I certainly do not want to try to explain such a thing to a not-particularly-technology-savvy faculty member whom I am trying to persuade NOT to simply build another Filemaker database! The constraints provided by a flat key-value system have proven quite useful, actually. I suspect that I will end up establishing some subset of Atom elements (title, summary, updated, id) that every collection has as "common" attributes and simply throw the rest of the collection-specific key-value pairs into the <content> element as xml (or perhaps xhtml).
-peter keane daseproject.org
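A minimal sketch of the mapping described in the message above: a few keys are promoted to common Atom elements (title, summary, updated, id) and the remaining collection-specific pairs go into <content> as XML. This is not DASe code. The function name item_to_entry, the FIELD_NS namespace URI and the metadata/field wrapper elements are all assumptions made for illustration, and the result omits elements a complete Atom entry may still require (such as an author).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical namespace for the collection-specific fields (an assumption).
FIELD_NS = "http://example.org/ns/fields"
# Keys promoted to Atom elements; everything else stays collection-specific.
COMMON = ("id", "title", "summary", "updated")

# Serialize Atom as the default namespace and the fields with an "f" prefix.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("f", FIELD_NS)


def item_to_entry(pairs):
    """Build an atom:entry element from a list of (key, value) pairs.

    The first occurrence of each common key becomes an Atom element;
    all other pairs are wrapped in one child of atom:content.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    promoted = set()
    rest = []
    for key, value in pairs:
        if key in COMMON and key not in promoted:
            ET.SubElement(entry, f"{{{ATOM_NS}}}{key}").text = value
            promoted.add(key)
        else:
            rest.append((key, value))
    # XML content is given a single child element that holds the fields.
    content = ET.SubElement(
        entry, f"{{{ATOM_NS}}}content", {"type": "application/xml"}
    )
    metadata = ET.SubElement(content, f"{{{FIELD_NS}}}metadata")
    for key, value in rest:
        # Keys go into an attribute because they are arbitrary strings
        # and may not be valid XML element names.
        ET.SubElement(metadata, f"{{{FIELD_NS}}}field", {"name": key}).text = value
    return entry
```

Calling ET.tostring(item_to_entry(pairs), encoding="unicode") yields the serialized entry. Keeping the keys in a name attribute rather than in element names means a user-invented field never has to be a legal XML name.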
- James
pkeane wrote:
Yup, I am trying to decide if the tool support is enough to justify the effort. And still I wonder if there is some other potential side benefit I am not seeing. Here, by the way, is a collection as an Atom feed (with only one item shown). Note that collection owners can declare their own custom attributes to "map" to Atom elements if they wish, in which case they appear in the default Atom namespace; otherwise they are in the "dase" namespace (standard administrative metadata common to all collections) or in the collection's own namespace. Note that there is no hand coding here, just a method on a collection object, e.g. "print $collection->asAtom()".
-pk
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dase="http://quickdraw.laits.utexas.edu/dase" xmlns:texpol="http://quickdraw.laits.utexas.edu/dase/texpol_image_collection/1.0" xml:base="http://quickdraw.laits.utexas.edu/dase/texpol_image_collection/">
  <title>Texas Politics Image Collection</title>
  <id>http://quickdraw.laits.utexas.edu/dase/texpol_image_collection</id>
  <author>
    <name/>
  </author>
  <updated>1969-12-31T18:00:00-06:00</updated>
  <entry xml:base="http://quickdraw.laits.utexas.edu/dase/texpol_image_collection/">
    <id>http://quickdraw.laits.utexas.edu/dase/texpol_image_collection/000435205</id>
    <updated>1969-12-31T18:00:00-06:00</updated>
    <title>Congress Avenue</title>
    <summary>photo of Congress Avenue</summary>
    <dase:admin_mime_type>image/jpeg</dase:admin_mime_type>
    <dase:admin_filename>PICA17902.JPG</dase:admin_filename>
    <dase:admin_checksum>c837f0abd05c8b7126b8dac15d510f30</dase:admin_checksum>
    <dase:admin_file_size>787705</dase:admin_file_size>
    <dase:admin_image_width>1408</dase:admin_image_width>
    <dase:admin_upload_date_time>2007-07-18T15:59:19</dase:admin_upload_date_time>
    <dase:admin_serial_number>000435205</dase:admin_serial_number>
    <dase:admin_image_height>1209</dase:admin_image_height>
    <texpol:keyword>Congress Avenue</texpol:keyword>
    <texpol:keyword>buildings</texpol:keyword>
    <texpol:scratch_pad>/PICA17902.JPG</texpol:scratch_pad>
    <texpol:rights_owner>Austin History Center</texpol:rights_owner>
    <texpol:rights_status>Use in Texas Politics content</texpol:rights_status>
    <texpol:credit>Photographer: Unknown</texpol:credit>
    <texpol:dase_rights>Restricted</texpol:dase_rights>
    <texpol:original_filename>PICA17902.JPG</texpol:original_filename>
    <texpol:used_in_chapter>executive</texpol:used_in_chapter>
    <link length="9701" type="image/jpeg" rel="http://quickdraw.laits.utexas.edu/dase/media/thumbnail" href="/media/thumbnail/000435205_100.jpg"/>
    <link length="78505" type="image/jpeg" rel="http://quickdraw.laits.utexas.edu/dase/media/viewitem" href="/media/viewitem/000435205_400.jpg"/>
    <link length="783340" type="image/jpeg" rel="http://quickdraw.laits.utexas.edu/dase/media/full" href="/media/full/000435205_3600.jpg"/>
    <content src="http://quickdraw.laits.utexas.edu/dase/texpol_image_collection/media/thumbnail/000435205_100.jpg" type="image/jpeg"/>
  </entry>
</feed>
On Fri, 5 Oct 2007, James M Snell wrote:
Basically, if it's a closed system with specific clients, there likely will not be any benefit to using Atom. If you wish to enable interchange and interop with other applications, there will be benefits to using Atom, if only to leverage the existing tool support.
- James
pkeane wrote:
Yes, it is really nothing more than key-value pairs. I am more wondering about the possible benefits of Atom than whether this system works -- I use it for data import/export of the collections and it is quite easy to create parsers and generators for this format that let me move data in and out of the relational database that the application uses. The database itself is also quite generic: a "collections" table, an "items" table, a "values" table and an "attributes" table (each value has an item_id and an attribute_id).
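For readers who want something concrete, here is a rough sketch of a four-table layout like the one just described, using SQLite from Python. It is not the DASe schema. Column names other than item_id, attribute_id, ascii_id and collection_id are assumptions, as are the helper names add_value and item_pairs. It also assumes a surrogate id on attributes with a uniqueness constraint on (collection_id, ascii_id), which is one possible reading of the composite key mentioned later in this message.

```python
import sqlite3

# Assumed layout; "values" is quoted because it is an SQL keyword.
SCHEMA = """
CREATE TABLE collections (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    serial_number TEXT NOT NULL
);
CREATE TABLE attributes (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    ascii_id TEXT NOT NULL,
    UNIQUE (collection_id, ascii_id)
);
CREATE TABLE "values" (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES items(id),
    attribute_id INTEGER NOT NULL REFERENCES attributes(id),
    value_text TEXT NOT NULL
);
"""


def add_value(conn, item_id, collection_id, ascii_id, value_text):
    """Attach one value to an item, creating the attribute on first use."""
    conn.execute(
        "INSERT OR IGNORE INTO attributes (collection_id, ascii_id) VALUES (?, ?)",
        (collection_id, ascii_id),
    )
    (attribute_id,) = conn.execute(
        "SELECT id FROM attributes WHERE collection_id = ? AND ascii_id = ?",
        (collection_id, ascii_id),
    ).fetchone()
    conn.execute(
        'INSERT INTO "values" (item_id, attribute_id, value_text) VALUES (?, ?, ?)',
        (item_id, attribute_id, value_text),
    )


def item_pairs(conn, item_id):
    """Return the (ascii_id, value_text) pairs stored for one item."""
    return conn.execute(
        'SELECT a.ascii_id, v.value_text FROM "values" AS v '
        "JOIN attributes AS a ON a.id = v.attribute_id "
        "WHERE v.item_id = ? ORDER BY a.ascii_id, v.id",
        (item_id,),
    ).fetchall()
```

Because add_value creates an attribute row the first time a key is used, a new "field" needs no schema change, and repeated keys such as keyword simply become several rows in the values table.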
It is important that the data model be able to grow organically -- as a user adds a new "field" (aka key or attribute) to describe the items they have, they'll have no knowledge at all of Atom or Dublin Core or any of that. And it's fine -- every collection has a unique set of attributes (aka fields or keys). The composite primary key for attribute is "ascii_id" plus "collection_id". The system has been in production and heavily used for a couple of years, and includes 88 collections comprising 300,000 items. There are currently 1358 rows in the attribute table (those are the keys in the key->value pairs) and 4.5 million rows in the value table. We've had no problems at all with this current architecture. And yet I wonder what Atom could do for me as a more standard XML format for data serialization...
thanks!
Peter Keane
daseproject.org
On Sat, 6 Oct 2007, A. Pagaltzis wrote:
* pkeane <pkeane@xxxxxxxxxxxxxxx> [2007-10-05 07:00]:
<item serial_number="000435213">
  <metadata ascii_id="admin_checksum">630230b057c511cbee87447960fff02e</metadata>
  <metadata ascii_id="admin_filename">62-GT-06.jpg</metadata>
  <metadata ascii_id="admin_file_size">318642</metadata>
  <metadata ascii_id="admin_image_height">576</metadata>
  <metadata ascii_id="admin_image_width">720</metadata>
  <metadata ascii_id="admin_mime_type">image/jpeg</metadata>
  <metadata ascii_id="admin_serial_number">000435213</metadata>
  <metadata ascii_id="admin_upload_date_time">2007-07-18T15:59:28</metadata>
  <metadata ascii_id="credit">Photographer: Unknown</metadata>
  <metadata ascii_id="dase_rights">Restricted</metadata>
  <metadata ascii_id="description">photo of Ben Barnes while Speaker, black and white</metadata>
  <metadata ascii_id="keyword">Ben Barnes</metadata>
  <metadata ascii_id="keyword">Capitol Building interior</metadata>
  <metadata ascii_id="keyword">Lieutenant Governor</metadata>
  <metadata ascii_id="keyword">Speaker of the House</metadata>
  <metadata ascii_id="original_filename">62-GT-06.jpg</metadata>
  <metadata ascii_id="rights_owner">Senate Media Services</metadata>
  <metadata ascii_id="rights_status">Use in Texas Politics content</metadata>
  <metadata ascii_id="scratch_pad">/62-GT-06.jpg</metadata>
  <metadata ascii_id="title">Ben Barnes</metadata>
  <metadata ascii_id="used_in_chapter">none</metadata>
  <media_file filename="000435213_800.jpg" size="medium" height="576" width="720" mime_type="image/jpeg" />
  <media_file filename="000435213_100.jpg" size="thumbnail" height="80" width="100" mime_type="image/jpeg" />
  <media_file filename="000435213_640.jpg" size="small" height="480" width="600" mime_type="image/jpeg" />
</item>
Any thoughts on the benefits of using atom here?
I don't see the problem. Atom gives you an Entry where you can put the metadata for a media resource. You have a bunch of attributes that should be mapped to Atom elements; the rest you stick into the content, possibly as RDF since your ad-hoc vocab is more or less along those lines anyway.
I cannot get past the fact that my ultra-generic xml schema is REALLY easy to deal with
It's not actually very generic. It's a very limited vocabulary that expresses barely any more than a map of key-value pairs. Of course such a simple data structure is easy to deal with. The only genericity there is that the keys are arbitrary strings. It looks easy now because you have to do almost no work up front: the structure is rigid and the semantics are completely ad-hoc. It won't look very easy at all once you have a large dataset with an inconsistent mess of key names.
Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
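To make the "map of key-value pairs" point concrete, the sketch below reads an <item> document of the shape quoted above into a plain dictionary of lists, and adds a small check for key names that differ only in case or punctuation. Both function names (item_to_map, colliding_keys) and the normalization rule are assumptions chosen for illustration; neither comes from the thread or from DASe.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def item_to_map(xml_text):
    """Parse an <item> document into {ascii_id: [value, ...]}.

    Repeated keys (such as keyword) keep all their values in
    document order; <media_file> elements are ignored here.
    """
    item = ET.fromstring(xml_text)
    result = defaultdict(list)
    for node in item.findall("metadata"):
        result[node.get("ascii_id")].append(node.text or "")
    return dict(result)


def colliding_keys(keys):
    """Group key names that are equal after lowercasing and
    dropping every character that is not a letter or digit."""
    groups = defaultdict(set)
    for key in keys:
        groups[re.sub(r"[^a-z0-9]", "", key.lower())].add(key)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Running colliding_keys over all attribute names in a collection is one inexpensive way to spot the kind of inconsistent key naming that the reply warns about, for example rights_owner alongside RightsOwner.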