One of the tasks that I'll be working on during my brief stint here in Nepal is researching and (hopefully) implementing a way to organize all the different media objects that OLE Nepal produces as the basis for its E-Paath learning activities. Currently, we are talking about several thousand images, sounds, texts, and videos, but it's not hard to imagine the repository containing hundreds of thousands of artifacts or more in the not-too-distant future. Apart from the specific OLE Nepal use case, I also believe that even larger content repositories have to be a core consideration for the broader OLPC and SugarLabs efforts.

In order to efficiently handle this quantity of material, one needs a solid and scalable solution. Let's just call it Educational Content Management (ECM), shall we?

The basic requirements for such a solution are as follows:

  • The ability to handle tens if not hundreds of thousands of multimedia objects
  • Easy to search so existing objects can be quickly retrieved
  • A version control mechanism, especially for text documents which tend to undergo a lot of revisions
  • Reasonably easy to integrate into the current workflow
  • The ability to define workflows, with the simplest one being the review of an object
  • Support for metadata that goes beyond what normal file formats offer
  • Allow for batch processing (upload, download, tagging, etc.)
  • Preferably based on software people already know, e.g., a browser or file explorer
  • Open-source

After doing some research last week, I came up with seven solutions that looked reasonably well suited to meet these requirements:

  • Alfresco
  • Daisy
  • Epiware
  • Knowledge Tree
  • LogicalDOC
  • Main Pyrus
  • xinco DMS

Upon further inspection, I decided to give Alfresco a shot since it appeared to be the most versatile solution. Well, two days later I'm still toying around with Alfresco and haven't had much success getting it to do what I want. In particular, I've been concentrating on two use cases that I'd like to address with an ECM solution:

Use Case 1: Managing Text Documents

The most important assets during the design of an E-Paath activity are four text documents:

  1. Activity document: This is the blueprint for the activity and contains every piece of information the developers need to implement it. It's created by a curriculum expert and/or teacher.
  2. Teacher's note: An extensive document detailing learning goals, links to school book contents, ideas for preceding and follow-up activities, etc.
  3. Lesson plan: A detailed overview of how teachers can use the activity in the classroom.
  4. Help text: E-Paath activities contain an online help-text to facilitate use.

Currently, all of these texts are saved as .docx files and stored on a central fileserver where multiple versions of the same document are saved for archival purposes. However, people communicate informally about which version is the latest one, which steps need to be taken next, etc. This scenario could benefit from an ECM that explicitly implements workflows, assigns roles, and provides a one-stop solution for saving and retrieving the current and relevant versions of documents.
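
To make "explicit versions plus a simple review workflow" a bit more concrete, here is a minimal Python sketch. It is not tied to Alfresco or any other product: the class names, the three review states, and the function names are all hypothetical and only illustrate the idea that every save creates a new version and that "the current version" is something the system can answer instead of something people have to ask each other about.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    # Hypothetical review states; a real ECM would let you configure these.
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    APPROVED = "approved"


@dataclass
class Version:
    number: int
    author: str
    content: str
    state: State = State.DRAFT


@dataclass
class Document:
    title: str
    versions: list = field(default_factory=list)

    def save(self, author, content):
        """Store a new version instead of overwriting the previous one."""
        version = Version(len(self.versions) + 1, author, content)
        self.versions.append(version)
        return version

    def latest(self):
        """Newest version, regardless of its review state."""
        return self.versions[-1] if self.versions else None

    def latest_approved(self):
        """Newest version that has passed review, or None if there is none."""
        for version in reversed(self.versions):
            if version.state is State.APPROVED:
                return version
        return None


def submit_for_review(version):
    if version.state is not State.DRAFT:
        raise ValueError("only drafts can be submitted for review")
    version.state = State.IN_REVIEW


def approve(version):
    if version.state is not State.IN_REVIEW:
        raise ValueError("only versions in review can be approved")
    version.state = State.APPROVED


# Example: a teacher's note with one approved version and a newer draft.
note = Document("Teacher's note")
first = note.save("curriculum expert", "Learning goals ...")
submit_for_review(first)
approve(first)
note.save("curriculum expert", "Learning goals, revised ...")
print(note.latest().number, note.latest_approved().number)
```

With something like this in place, developers would always pull `latest_approved()` while authors keep working on newer drafts.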

Since Alfresco offers a SharePoint Protocol component, the idea was to set this up in the backend and allow people to interface with the system via their current software of choice, Word 2007. However, after 10 hours of experimenting, I still haven't managed to get this running. The built-in Office functionality lets me create a document workspace, but when I restart Word and try to retrieve documents from that workspace, I end up with an error message saying the repository URL isn't valid.

Use Case 2: Managing Image Files

The second major use case is managing image files, which make up the majority of the assets for E-Paath activities. Even today, with a relatively small team, there are thousands of images stored on the fileserver. With changing teams, this problem will only get worse. A solution should have the following capabilities:

  • Quick search to find existing materials
  • Batch capabilities for upload, download, and tagging
  • Support for extensive but not mandatory metadata (so metadata can be added incrementally later)

Batch uploading worked well, but only when using Chrome or Internet Explorer. There's an issue with the combination of Firefox, Flash, the AdBlock extension, and Windows Vista. Once the images are in the document library, it's painful to add metadata tags, as this can only be done on a per-picture basis. The search feature works well, but there's no way to download multiple results at once.
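
For what it's worth, here is a rough Python sketch of the behaviour I have in mind for these three capabilities. It's a toy in-memory catalog, not an Alfresco API: `ImageRecord`, `Catalog`, and their methods are names I made up, and the file paths and tags are invented examples. The point is that images can be registered with no metadata at all, tagged in batches later, and found again by any combination of tags and metadata values.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    path: str
    tags: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)  # every key is optional


class Catalog:
    """Hypothetical in-memory catalog; a real system would persist this."""

    def __init__(self):
        self.records = {}

    def add_many(self, paths):
        """Batch 'upload': register files without requiring any metadata."""
        for path in paths:
            self.records.setdefault(path, ImageRecord(path))

    def tag_many(self, paths, *tags, **metadata):
        """Batch tagging: apply the same tags and metadata to many records."""
        for path in paths:
            record = self.records[path]
            record.tags.update(tags)
            record.metadata.update(metadata)

    def search(self, *tags, **metadata):
        """Return paths of records that have all given tags and values."""
        wanted = set(tags)
        return sorted(
            record.path
            for record in self.records.values()
            if wanted <= record.tags
            and all(record.metadata.get(k) == v for k, v in metadata.items())
        )


# Example: metadata is added incrementally, long after the upload.
catalog = Catalog()
catalog.add_many(["img/cow.png", "img/goat.png", "img/bus.png"])
catalog.tag_many(["img/cow.png", "img/goat.png"], "animal", grade=2)
catalog.tag_many(["img/cow.png"], license="CC-BY")
print(catalog.search("animal"))
print(catalog.search("animal", license="CC-BY"))
```

A batch download would then just be a loop over whatever `search` returns.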

Conclusion:

At the end of the day, Alfresco seems to have a lot of potential, but it currently doesn't quite meet my requirements for an Educational Content Management system. I'm still experimenting with it, and would appreciate any pointers or information on solutions for the issues described above. I’d also be very interested in your suggestions for and experiences with other Enterprise Content Management solutions that could meet the requirements discussed.