Requirements: RDF and social applications

Update 2009-08-06: added more information on named graphs, a reference to AliBaba, and a clarification on text handling.

RDF data is managed in a decentralized manner, which makes it ideal for social applications (where many people collaborate). In this post, I've collected requirements for RDF engines on which a social application is to be built. The main features are:
  • Named graphs: supported by almost all RDF engines, they partition the RDF repository. Social applications should authorize access at graph granularity. That way, some graphs can be private and others public. RDF allows one to hide the “seams” between graphs at will. An RDF repository should support this by enabling one to show and hide graphs on the fly, during access. SPARQL and Sesame can both do this: the former by constraining the graph URI, the latter by specifying a set of contexts when invoking RepositoryConnection.getStatements().
  • Distributed version control: provides two abilities. First, versioning is useful for personal use (history, undo) and collaborative use (conflict management, tracking who made what changes). Second, peer-to-peer synchronization is useful for offline use, backup, and collaboration. Pastwatch is an example of very clever (file-based) distributed version control.
  • Text handling: to make long texts that are stored in RDF literals more accessible, one should be able to configure which property values are to be indexed. Ideally, version control would only store changes between versions (as opposed to the complete text). As an alternative to storing the text in the RDF repository, one can let the property point to an external document management system. Still, the necessity for version control remains.
  • Record the author of a statement: so that a social application can track who contributed what.
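To make the named-graph requirement concrete, here is a minimal sketch in plain Python of graph-level access control: every statement carries the graph it lives in, and a read is restricted to the graphs a user may see. This is a toy model, not the Sesame or SPARQL API; the names QuadStore, readable_graphs, and the ex: identifiers are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Set


class Quad(NamedTuple):
    """One RDF statement plus the named graph (context) that holds it."""
    subject: str
    predicate: str
    obj: str
    graph: str


class QuadStore:
    """Toy in-memory repository (hypothetical); a real engine would index the quads."""

    def __init__(self) -> None:
        self._quads: List[Quad] = []

    def add(self, subject: str, predicate: str, obj: str, graph: str) -> None:
        self._quads.append(Quad(subject, predicate, obj, graph))

    def statements(self, graphs: Optional[Iterable[str]] = None) -> Iterator[Quad]:
        """Yield statements, restricted to the given graphs.

        Passing None hides the seams and reads the whole repository.
        """
        visible = None if graphs is None else set(graphs)
        for quad in self._quads:
            if visible is None or quad.graph in visible:
                yield quad


def readable_graphs(user: str, acl: Dict[str, Set[str]]) -> Set[str]:
    """Return the graphs whose reader list contains the user or 'public'."""
    return {g for g, readers in acl.items() if user in readers or "public" in readers}


store = QuadStore()
store.add("ex:post1", "dc:title", "Hello", graph="ex:graph/public")
store.add("ex:post1", "ex:draftNote", "fix typos", graph="ex:graph/alice-private")

# Assumed access list: one public graph, one graph private to alice.
acl = {
    "ex:graph/public": {"public"},
    "ex:graph/alice-private": {"alice"},
}

bob_view = list(store.statements(readable_graphs("bob", acl)))
alice_view = list(store.statements(readable_graphs("alice", acl)))
```

Here bob only sees the statement in the public graph, while alice sees both. Because the graph set is chosen per read, the same repository can present a different view to each user.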
Less important features:
  • Support for XML literals in SPARQL
  • Ease of use: should be easy to install and use; should focus on core RDF repository features.
Several RDF engines come close:
  • Open Anzo: an RDF engine that supports versioning, user-based authentication, and text indexing. Replication is possible, but not in a distributed manner. Open Anzo’s philosophy is very much in line with this post.
  • IBM Semantic Layered Research Platform: does not seem to be updated any more. Poorly documented. I'm not sure if it can do distributed synchronization. Update: This is Open Anzo's precursor (see comments below).
  • OpenLink data spaces: powerful, offers all kinds of import and export services. But the free version does not have replication. I'm not sure how far beyond two-way replication its features go.
  • KiWi (Knowledge in a wiki): an intriguing social content platform that rolls its own RDF engine. Its content model deviates from pure RDF. It also cannot do distributed synchronization. Not publicly available yet.
  • Sesame has a new project called AliBaba that provides repository federation and change logging.
Related technologies:
  • Changesets: an RDF vocabulary for keeping a history of changes. Useful for exporting data from a repository that supports versioning.
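The idea of keeping a history of changes can be sketched in a few lines. The example below is a plain-Python model of a change record (statements removed, statements added, and who did it), not the Changesets vocabulary itself; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class ChangeSet:
    """Hypothetical record of one edit: what was removed, what was added, by whom."""
    author: str
    removals: FrozenSet[Triple]
    additions: FrozenSet[Triple]

    def inverse(self) -> "ChangeSet":
        """Return the change set that undoes this one."""
        return ChangeSet(self.author, removals=self.additions, additions=self.removals)


def apply_change(graph: Set[Triple], change: ChangeSet) -> Set[Triple]:
    """Return a new graph with the change applied; the input is left untouched."""
    missing = change.removals - graph
    if missing:
        raise ValueError(f"cannot remove absent statements: {sorted(missing)}")
    return (graph - change.removals) | change.additions


v1 = {("ex:post1", "dc:title", "Helo")}
fix_title = ChangeSet(
    author="alice",
    removals=frozenset({("ex:post1", "dc:title", "Helo")}),
    additions=frozenset({("ex:post1", "dc:title", "Hello")}),
)
history: List[ChangeSet] = [fix_title]  # the log doubles as "who changed what"

v2 = apply_change(v1, fix_title)
undone = apply_change(v2, fix_title.inverse())
```

Storing only removals and additions means each version costs as much as its difference from the previous one, and undo is just applying the inverse. This simple inverse assumes that the added statements were not already present before the change.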


Detexify: draw to find a LaTeX symbol

LaTeX has so many symbols that finding the right one is difficult. If you cannot put the symbol into words, you cannot Google for it. A while ago, I had an idea for quick lookup: record what typable character a symbol looks like and search for the symbol via the character. Recently, I've stumbled across a solution that is much neater than this: Detexify lets you draw something and then does a search for symbols that look similar. Cool and useful.


Emacs org-mode: notes, planning, authoring – in plain text

Emacs contains all kinds of craziness, but this is one of the coolest things I've seen in a long time: org-mode is an Emacs mode that does outlining, planning and authoring, all in plain text. A few of the highlights:
  • Everything is text. This does have disadvantages, but the advantages are obvious: easily exchanged by email, put into version control, a format many applications understand, etc.
  • Outlining
  • Hyperlinks to emails, web pages, files, etc. inside content
  • Meta-data such as tags and dates attached to content
  • Tables: continuously aligns all the cells of a column and can move columns around.
  • Spreadsheets: you can perform calculations in tables
  • Publishing: content can be published as HTML and LaTeX
Find more information here:


CSS layout: soon good enough for GUIs

Update 2011-01-24: The ultimate CSS layout spec for webapps

One of the sore points of Ajax has always been layout: CSS is great for document layout, but sucks at GUI layout. Where Java layout managers such as the grid-based JGoodies FormLayout (PDF) work really well and can do things such as “make this grid cell as narrow as the content allows”, CSS currently needs absolute length specifications. Furthermore, letting an element fill all of the available space is really tricky, especially when done vertically. Thus it is no wonder that layout was ranked high on the OpenAjax browser wish list. Thankfully, the W3C is hard at work on fixing this problem and it seems that all the major browser vendors are on board (if you look at the editors).
According to sources of mine, the W3C will try to consolidate these three proposals as much as possible, but might have to keep two, because GUI applications and document layout have such different requirements.
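
To illustrate what GUI layout asks for, here is a minimal sketch of the computation a grid layout manager performs for one row: content-sized columns keep exactly their preferred width, and fill columns share whatever space is left. It is neither JGoodies FormLayout's algorithm nor any W3C proposal; the function name and the use of None to mark a fill column are assumptions made for this example.

```python
from typing import List, Optional


def column_widths(preferred: List[Optional[int]], available: int) -> List[int]:
    """Compute column widths for one grid row.

    An int entry is a content-sized column and keeps its preferred width.
    A None entry is a fill column; fill columns split the leftover space evenly.
    """
    fixed = sum(w for w in preferred if w is not None)
    fill_count = sum(1 for w in preferred if w is None)
    leftover = max(available - fixed, 0)
    share, extra = divmod(leftover, fill_count) if fill_count else (0, 0)

    widths = []
    for w in preferred:
        if w is not None:
            widths.append(w)
        else:
            # Hand out the remainder one unit at a time so the total is exact.
            widths.append(share + (1 if extra > 0 else 0))
            extra -= 1
    return widths


# A label column, a text field that fills the rest, and a button column.
widths = column_widths([80, None, 120], available=500)
```

The point is that neither the label nor the button needs an absolute length from the developer: their widths come from the content, and only the container size is given.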


Eclipse E4 is going even further in the web direction

The latest version of E4 contains “web components”. It warms my heart that they are using Dojo (=true client-side technology) for some of their experiments. JavaScript OSGi modules also look cool (don’t get too excited, though: they are for server-side JavaScript). Update 2009-08-04: Check out SWT Browser edition. Its goal is to enable cross-compilation of SWT applications to something that runs in the browser. Flex cross-compilation is working, other approaches are under investigation.


I've found a use for microblogging (Twitter etc.)

I'm still not sure I entirely understand all the fuss about Twitter. Microblogging feels like an exhibitionist version of cell phone texting: “Look everyone, I'm eating something yummy right now”. It occupies a curious niche between instant messaging, chat rooms, and blogging. But, looking for a good way of blogging links, I've found that it fits my needs perfectly:
  • New entries are simple to create. For link blogs, one only needs a short description and a link.
  • By exporting to RSS or Atom, I can include the link blog in the side bar of my main blog.
I initially tried Twitter, but it turns URLs into micro-URLs such as http://bit.ly/zd5p0. By introducing another level of indirection, linking becomes a lot more brittle: Should bit.ly ever shut down, all my links would be lost. Thus, I looked at Plurk, but finally decided on FriendFeed, because it does not have a limit on how long one's text can be.


Facetator: simple data publishing (with JSON and jQuery)

Facetator is an open source tool for publishing lists such as recipes, journal entries, favorite movies. With Facetator, this kind of data is easily browsable (via faceted navigation) and searchable. Because it is a client-only JavaScript application, you don't even need a special server to use it. A demo is online. The JSON data needed by Facetator can be produced programmatically or manually via Hyena.
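
As a rough illustration of faceted navigation over JSON data, the sketch below counts the values of one facet and then narrows a list by choosing facet values. It is not Facetator's code, and the sample data and field names are made up; Facetator's real JSON layout may differ.

```python
import json
from collections import Counter
from typing import Dict, List

# Made-up sample data, only for illustration.
RECIPES_JSON = """
[
  {"title": "Pancakes",   "course": "breakfast", "diet": "vegetarian"},
  {"title": "Omelette",   "course": "breakfast", "diet": "vegetarian"},
  {"title": "Beef stew",  "course": "dinner",    "diet": "meat"},
  {"title": "Lentil dal", "course": "dinner",    "diet": "vegetarian"}
]
"""

Item = Dict[str, str]


def facet_counts(items: List[Item], facet: str) -> Counter:
    """Count how many items carry each value of the given facet."""
    return Counter(item[facet] for item in items if facet in item)


def select(items: List[Item], **chosen: str) -> List[Item]:
    """Keep only the items that match every chosen facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in chosen.items())
    ]


recipes = json.loads(RECIPES_JSON)
by_course = facet_counts(recipes, "course")
vegetarian_dinners = select(recipes, course="dinner", diet="vegetarian")
```

The counts are what a faceted browser shows next to each facet value, and each click adds one more constraint to the selection.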

Mac OS X tip: upgrade to the latest version of X11

Apple’s XQuartz is always a few version numbers ahead of the standard Mac OS X version. It pays to upgrade, as the latest version offers more (directly configurable) preferences, mainly for clipboard handling and windowing behavior (clicks etc.).

Video: Requirements for web applications and GWT

The video “Google Wave: Powered by Google Web Toolkit” is a great overview of what features web applications need and where GWT currently meets the requirements or soon will.

Earth is one big ant colony

The article “Ant mega-colony takes over world” describes how humans helped create one large pan-global ant colony whose scale of world domination is similar to humanity’s. [Source: doggdot.us]


Do you need a software license?

Software licenses are necessary evils in this modern law-driven world. There are two instances where you need to be at least aware of them:
  • As a content producer: You have created a web page or a program and want to make sure that people use it the way you want them to.
  • As a content user: You need a picture for your web site and are not sure where to find one that you are allowed to put online.
As a content producer, you can choose from an abundance of licenses. They even have names that sound partially familiar: Eclipse Public License, Apache License, BSD License, … But the licenses that are best explained for mere mortals are the Creative Commons Licenses: The summary looks as follows: A separate page then explains what the icons mean. The site provides several ways of including a license: as a button for web pages, as a text file for file archives, etc. You can also choose your license in a wizard-style manner.

An important idea in software licensing is copyleft (the reversed C icon above): You don't exert any copyright over your work, but force derived work to do the same. The latter requirement has led to Microsoft calling copyleft “viral” (a term that has since lost its negative connotation).

So how about you wanting to use a picture? There, Creative Commons (CC) helps you, too: If you do an advanced search on Flickr, you can choose the CC license a picture should have. The same holds for the search options of Picasa Web Albums (when the results are displayed). I usually link back to the picture page underneath the picture or make the picture itself a hyperlink. Wikipedia pictures also carry a CC license. Lastly, the CC site has a search page to search for liberally licensed content.


Beware of Google Groups

I love the Google Groups user interface: It is easy to understand and you can get started right away (quicker than with all alternatives that I’ve seen so far). It is a great tool whenever teams need to collaborate, because it gives members a choice between RSS, email, and online reading. I’ve used it often when working with groups of students.

Alas, it also shows Google’s modus operandi of running things on auto-pilot. Almost a year ago, out of the blue, Google Groups banned me from their site. Some kind of automatic trigger went off; I was never given a reason. Now, on one hand, I cannot manage my own groups. On the other hand I cannot access any information stored anywhere in Google Groups. If I’m logged in, that is. Thus, I now use Firefox for browsing the web (where I’m logged into Blogger, Google Calendar, etc.) and Safari (where I’m not logged into any Google web apps) whenever I need to look up information that is stored in Google Groups.

I did try many things to fix this: I sent an email to the Google accounts help desk, posted in the Google Groups help groups (well, I sent it to my brother who posted it for me) where this is a frequent topic of discussion, even sent a fax to Google headquarters. All to no avail.

So Google does share a trait that is common in big companies: it does not listen to its customers. Or only when it suits them. Granted, they are popular, have a lot of customers, and are free, but this is still very frustrating. Losing control over one’s own data makes “free” services very expensive.

Alternatives to Google Groups that I know of:
Update 2010-12-23: John Resig (creator of jQuery, among other things) writes about why he abandoned Google Groups in “Google Groups is Dead”.


Yes, you can do echolocation, too

Amazing: apparently, humans can learn to “see” their environment like bats or dolphins, by making clicking sounds with their tongue.


What is the appeal of Ajax and GWT?

Ajax does have its detractors. Their argument goes as follows: Why reinvent everything that has already been done on the desktop on an inferior platform? I do agree that the attraction of Ajax is subjective (i.e., not based on technological arguments). This is obvious whenever I’m excited about something web-based, show it to non-developer friends and their only reaction is boredom. Then I realize that while I’m excited about what’s possible on the web, they have already seen it on the desktop. But—there are some good arguments in favor of Ajax. My reasoning goes as follows:
  • I love web applications (because I use 3 different computers, having data travel with me is great).
  • I’ve always disliked Applets and Flash. With advanced browser use (tabs, drag&drop of links, etc.), anything that is not well integrated feels constricting.
  • Mobile applications: Web applications are currently the best solution if you need something that runs on the smartphone platforms Android, iPhone, Palm Pre, and Blackberry. The browsers of all of these platforms are WebKit-based, making testing less of a chore. Windows Mobile 6 is out there, too, but feels dated now, and I'm not sure how capable its browser is.
  • There is tremendous momentum behind the browser as a platform. New user interface ideas are constantly being tried out, JavaScript is getting really fast, gains lots of APIs (geolocation comes to mind), etc.
Using GWT to write Ajax applications has the following advantages:
  • Compared to desktop Java: GWT makes programming web applications almost as simple as (in some cases simpler than) programming Swing. So why not use it?
  • Compared to other Ajax solutions: GWT has Java's superior tooling, one has a single code base for client and server, and GWT’s compiler produces highly optimized code (due to Java’s static nature).
[Further reading: “What should be the platform of your next application?”]