Internet Archiving
Nathan Willis
2014-03-04 02:26:24 UTC
Hello fellow librarians,

I have a semi-out-of-the-blue idea to pitch to the group about OFLB. I've
tossed it around with a couple of people in private conversation, and I
think it's worth looking into, so I want feedback from the interested
parties as a whole.

Back in December, I was thinking about how the OFLB site tends to grow in
fits of activity followed by long pauses. I think part of the reason is
that we rely on the site to provide a wide range of services, and keeping
them all running smoothly at once is harder than it would be for a simpler
site.

But probably the most important service we undertake is to provide a
not-for-profit index and collection of all of the open fonts that we can
find (and verify, of course). Sometimes that's a tricky prospect because
the sources vary so widely: individual type designers who don't publish
their work anywhere else, one-off designs that come from projects not
interested in ongoing maintenance, works for hire, etc. Organizing that
can be a big task.

It kind of struck me that we're attempting to do something similar to what
Internet Archive does for public domain books, music, mailing lists,
historical video, and so on. Then I looked into how they organize things
and discovered that they're already completely open to archiving all sorts
of other collections -- in fact, they already do have a big free software
collection, and a lot of uploads include open fonts ... it's just that
they're in .zip archives without any special treatment of the metadata or
presentation. So I wondered if we could partner with them to take on the
"archiving" aspect of the OFLB site, and maybe reduce the overhead that
open font community volunteers currently carry.

I checked around, sent an email, and they were super nice, essentially
saying "sure, we can host your files, just like anybody else." The
advantage, of course, is that they do this upload, filter, DB, and index
stuff all the time. I asked their collections people, and for essentially
any format that can be parsed, they can populate custom metadata fields
and make all of it searchable.
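
To make the metadata point a bit more concrete, here is a rough Python
sketch of what a per-font record might look like before upload. This is
only an illustration: the function name, the "font_" field names, and the
"open-fonts" collection name are all made up, and none of it has been
agreed with IA.

```python
def font_item_metadata(family, designer, license_name, styles):
    """Return a flat metadata record for one font family.

    Keys prefixed with "font_" are hypothetical custom fields, and
    "open-fonts" is a hypothetical collection name.
    """
    if not family or not license_name:
        raise ValueError("family and license_name are required")
    return {
        "title": family,
        "creator": designer,
        "collection": "open-fonts",        # hypothetical collection name
        "font_license": license_name,      # hypothetical custom field
        "font_styles": ";".join(styles),   # hypothetical custom field
    }


record = font_item_metadata(
    "Example Sans", "A. Designer", "OFL-1.1", ["Regular", "Bold"]
)
print(record)
```

The point is just that whatever we can parse out of a font file could end
up as a plain key/value record like this one, which is the kind of thing a
searchable index can work with.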

So here's the idea in a nutshell: if we start a "collection" section for
open fonts at Internet Archive, we can offload a good portion of the work
currently being done by the OFLB CMS, and maybe the simpler part that
remains will be easier to keep up to date. We'd still be responsible for
curating the content, just not keeping the servers going.

Obviously the question there is where to draw the line: there are other
aspects to the OFLB site, and not all of them would be handled by IA's
server infrastructure (which is optimized for its big collections).

The way I see it, right now the OFLB site does several things:

1. Indexes existing open fonts, including important metadata
2. Hosts the actual font files
3. Allows designers to "release" uploads of new versions of their fonts
4. Serves CSS webfonts
5. Has a wiki
6. Showcases new uploads/releases
7. Has a web font preview page for each of the library fonts

IA could definitely do (1) and (2); in fact that's right up its alley. The
question would be, if we moved those features to IA, would we still want to
tackle the others?

I'm pretty positive IA cannot do (4) (the @font-face service), and that's
certainly important to a number of people. I also don't think they can do
(7). We could have uploaders generate static preview images; in fact,
we'd pretty much have to. You could script generating a preview image, but
there's no way to automatically know which "interesting" characters or
features to include. Honestly, I don't think we really need to keep
the wiki feature: there are other places that could do that, and much of
it pertains to things like FontForge more than to OFLB itself. But perhaps
by letting someone else maintain the file hosting and indexing features, we
could build a much lighter-weight site to tackle the remaining tasks, and
have an OFLB site that's easier to maintain.

What do you think? Even if we were to move file hosting to IA, we would by
no means be eliminating the need for OFLB: finding, adding, and cataloging
open fonts would still be very important; it's just that the files would
live on a different server, maybe with a different login.

I'm sure I skipped over something really important; please ask. But as a
blanket statement, the IA folks I've talked to have been super helpful.
They have well-documented read and write APIs and a well-managed
Lucene-based search engine, so I think they could handle on their end just
about any library task we cook up.
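
As a rough idea of what the read side could look like, here is a minimal
Python sketch that builds a query URL for IA's advanced search endpoint
with a Lucene-style query string. The collection name "open-fonts" and the
helper name are hypothetical, and the exact fields we'd search on would
depend on whatever metadata we end up populating.

```python
from urllib.parse import urlencode


def build_font_search_url(collection, title_text, rows=50):
    """Build a search URL asking for identifier and title as JSON.

    `collection` is a hypothetical collection name; no request is made here.
    """
    query = f"collection:{collection} AND title:({title_text})"
    params = [
        ("q", query),
        ("fl[]", "identifier"),   # fields to return
        ("fl[]", "title"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


print(build_font_search_url("open-fonts", "garamond"))
```

A lightweight OFLB front end could fetch a URL like this and render the
results itself, without running its own index.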

Feedback?

Thanks,
Nate
--
nathan.p.willis
nwillis-eiP9NBaGPlk1WUs8F/Ki+***@public.gmane.org <http://identi.ca/n8>