tmjs 0.4.0 released

This is a short note to let you know that I've released version 0.4.0 of my JavaScript Topic Maps engine tmjs today. It contains lots of bugfixes, CXTM tests, and, finally, JTM 1.1 support. Here is the complete changelog:

Changes

  • JTM 1.1 support. Lots of bugfixes in JTM 1.0 reader/writer
  • CXTM export
  • Implemented all CXTM tests for JTM 1.0 and JTM 1.1
  • Added more documentation
  • Fixed indentation for JSLint
  • Added export of TM.version
  • Added support for TMAPI-function chaining
  • Added locator for default name type
  • Changed the public duplicate removal API
  • Added getLocator method for TopicMap (by Daniel Exner)
  • Removed check for types in getNames (by Daniel Exner)

Minor changes:

  • Removed debug info
  • Unit test overview page renamed to index.html
  • Removed unused variable declarations

Special thanks go to Daniel Exner, who is the first person to contribute to tmjs. Congratulations! More contributions are of course greatly appreciated. Don't hesitate to contact me if you'd like to contribute. There are lots of things to do!

Also, I'd like to thank the creators of the CXTM test suite. Without it, I wouldn't have been able to find most of the bugs that have been fixed for this release.

Publishing subject identifiers with node

Introduction

At the Topic Maps Research and Applications Conference in Leipzig this year, I had the opportunity to talk about JavaScript Topic Maps in server environments. As a use case, I presented a server for Published Subject Identifiers (PSIs). In this blog post, I'd like to give you a short introduction to using the server, as I've finally released its sources on GitHub. It's still pretty unstable and lots of features are still missing, but hey, now it is YOUR chance to become an early adopter!

Overview

node-psi-server is a server application written in JavaScript. It's Open Source and the software has been released under the MIT license. The basic idea is that you can take any topic map file and serve all its subject identifiers for a given servername.

node-psi-server can serve subject identifiers from any topic map that fits into memory. It supports logging, caching and templates for the generated pages, including export in HTML5 and JTM. The server can be configured using a simple configuration topic map where you can set default values for publisher, creation date and publishing status (experimental, stable, etc.) as well as occurrence and association types that you want to export for your topics. This way, you can include e.g. dc:description occurrences or type-instance associations on all generated pages, if available for the topic in question.

Installation

node-psi-server relies on nodejs (Installation instructions) and the Connect framework (Installation instructions). The other dependencies, including the Topic Maps engine tmjs, are included in the distribution. You can use git to download the sources directly from GitHub:

git clone http://github.com/jansc/node-psi-server.git

Alternatively, you can download a compressed archive with the sources by clicking on the blue "Downloads" button on the same page. That's it. If you're really impatient, take a look at the README file; otherwise, continue to configure your server:

Configuration

The distribution includes a JTM version of the Opera topic map which can be used for testing your installation. Before starting to configure your PSI server, you might be interested in listing all PSIs contained in your topic map. Enter the main folder of node-psi-server and type in the following command at the command line prompt:

node server.js --list examples/opera.jtm psi.ontopia.net

This will list all PSIs from the Opera topic map with a server name psi.ontopia.net. Note that you always start the server with node server.js followed by some options and two required parameters: the topic map filename and the servername for which you want to serve PSI pages. To get more detailed information about available options, type:

node server.js --help

Now, let's start the PSI server itself:

node server.js --config ./config-sample.jtm examples/opera.jtm psi.ontopia.net

Due to a limitation in tmjs, right now only JTM topic maps are supported. Therefore, the configuration has to be supplied as JTM as well. To make things a bit easier, a sample configuration file in CTM format is included in the distribution. You can find a syntax-highlighted version of this file at paste.mappify.org. You can adjust this file to your needs and convert it to JTM using Mappify's tm2tm converter. The options for default values, port number and debug level should be pretty self-explanatory. In addition to that, you can list occurrence and association types that you want to export. If you e.g. want to include a Dublin Core description occurrence for all topics with a description, add the following to your configuration (in CTM syntax):

nps:type-is-published(nps:type : dc:description)

If you want to include all associations of a given type, simply supply the PSI of an association type:

nps:type-is-published(nps:type : <http://psi.topicmaps.org/iso13250/model/type-instance>)

After you've started the server, something like the following output should appear:

25 Oct 21:42:40 - Reading configuration file './config-sample.jtm'.
25 Oct 21:42:42 - Imported tm in 1.749secs
25 Oct 21:42:42 - Occurrence types and association types to include on PSI pages:
25 Oct 21:42:42 -  * http://psi.topicmaps.org/iso13250/model/type-instance
25 Oct 21:42:42 -  * http://purl.org/dec/elements/1.1/description
25 Oct 21:42:42 - Prefetching export types:
25 Oct 21:42:42 -  * found http://psi.topicmaps.org/iso13250/model/type-instance
25 Oct 21:42:42 - Server running at http://127.0.0.1:8000/

You should now be able to open the following pages in your web browser: http://localhost:8000/person or http://localhost:8000/city/380-firenze.

Here are some screenshots of the resulting pages:

Screenshot of the person PSI page
Screenshot of the Florence PSI page

Outlook

Probably the biggest limitation right now is that node-psi-server only supports JTM topic maps. Another known problem is that PSIs containing anchors (that is, PSIs whose URLs contain a #) can't be served right now. This is because the browser does not send the anchor part of a URL to the server. In addition to that, there is a long list of wanted features: subj3ct.com integration, PSI stop lists, accept-header parsing, more export formats, etc. And lots of bug fixes! Your feedback is very welcome :-)
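The anchor limitation can be reproduced without any server at all. Here is a minimal sketch, assuming node's built-in URL class; the PSI in it is made up for illustration and is not taken from the Opera topic map:

```javascript
// Hypothetical PSI with an anchor (fragment).
const psi = new URL('http://psi.example.org/city#firenze');

// An HTTP client builds the request target from path and query only.
const requestTarget = psi.pathname + psi.search;

console.log(requestTarget); // "/city"
console.log(psi.hash);      // "#firenze" (kept on the client side)
```

The server therefore only ever sees /city and has no way to tell which of the PSIs sharing that path was requested.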

You might also be interested in the slides from my presentation at the TMRA:

A proposal for JTM 1.1

In this post I'd like to propose two small changes to the JTM notation. It's not that there are not enough Topic Maps exchange formats available. Implementers can already write import/export modules for their Topic Maps engines day and night, instead of spending their valuable time writing TMQL lexers and parsers.

But in the past months I've been working with Topic Maps on mobile devices, more or less as a pet project. Not surprisingly, one of the challenges with mobile platforms is handling memory constraints. The app I'm creating imports a JTM file, and during the import I have to keep the entire JTM file and the topic map itself in memory. It turned out that it would be really nice to reduce the size of the JTM file without too many changes to the JTM syntax.

Therefore I'd like to propose two changes to the JTM:

  1. a shortcut for type-instance associations in form of an instance_of-array
  2. prefixes for locators

A shortcut for type-instance associations

Type-instance associations are common in Topic Maps, but their serialization is quite verbose. I would therefore like to add a new member to the topic object: instance_of. instance_of is an array of topic references. Example:

{"version":"1.1",
 "item_type":"topic",
 "subject_identifiers":["http://psi.topincs.com/people/thomas-vinterberg"],
 "instance_of": [
   "si:http://psi.semanticheadache.com/person"
 ],
 "names":[
    {"value":"Thomas Vinterberg",
     "type":"si:http://psi.topicmaps.org/iso13250/model/topic-name"}]}

Alternatively, type-instance associations may still be exported as associations, so old files remain compatible.
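To make the equivalence concrete, here is a rough sketch of how a reader could expand instance_of back into explicit type-instance associations. The function names topicRef and expandInstanceOf are my own for this illustration and are not part of tmjs or of the proposal:

```javascript
const TMDM = 'http://psi.topicmaps.org/iso13250/model/';

// Build a topic reference for a JTM topic object from its first identity.
function topicRef(topic) {
  if (topic.subject_identifiers && topic.subject_identifiers.length) {
    return 'si:' + topic.subject_identifiers[0];
  }
  if (topic.subject_locators && topic.subject_locators.length) {
    return 'sl:' + topic.subject_locators[0];
  }
  if (topic.item_identifiers && topic.item_identifiers.length) {
    return 'ii:' + topic.item_identifiers[0];
  }
  throw new Error('Topic has no identity');
}

// Turn every instance_of entry into one type-instance association.
function expandInstanceOf(topic) {
  const instance = topicRef(topic);
  return (topic.instance_of || []).map(function (type) {
    return {
      type: 'si:' + TMDM + 'type-instance',
      roles: [
        { player: instance, type: 'si:' + TMDM + 'instance' },
        { player: type, type: 'si:' + TMDM + 'type' }
      ]
    };
  });
}

const topic = {
  subject_identifiers: ['http://psi.topincs.com/people/thomas-vinterberg'],
  instance_of: ['si:http://psi.semanticheadache.com/person']
};
const associations = expandInstanceOf(topic);
console.log(JSON.stringify(associations, null, 2));
```

For the example topic this yields a single association with two roles, which is exactly the verbose form that instance_of is meant to replace.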

Prefixes

There is a lot of redundancy in locator strings. Adding prefixes similar to CTM and LTM would make JTM files more readable (and also easier to write). I suggest adding an optional prefixes member to the document. The corresponding value is an object with the prefix names as its keys and the IRIs they abbreviate as its values:

"prefixes":{
  "dc": "http://purl.org/dc/elements/1.1/",
  "dcterms": "http://purl.org/dc/terms/" 
}

So much for defining prefixes. Now on to their use: item_identifiers (in all topic map items), subject_identifiers and subject_locators can hold an IRI or a Safe_CURIE. A Safe_CURIE is a CURIE wrapped in '[' and ']'. A CURIE consists of a prefix and a reference. In topic references, an IRI or a Safe_CURIE may appear after 'si:', 'sl:' and 'ii:'. Safe_CURIEs are also allowed in datatype members of occurrences and variants. Example:

{"version":"1.1",
 "prefixes": {
   "topincs": "http://psi.topincs.com/people/",
   "tmdm": "http://psi.topicmaps.org/iso13250/model/"
 },
 "item_type":"topic",
 "subject_identifiers":["[topincs:thomas-vinterberg]"],
 "names":[
    {"value":"Thomas Vinterberg",
     "type":"si:[tmdm:topic-name]"}]}

CURIEs or Compact URIs are defined in http://www.w3.org/TR/curie/. CURIEs are a more relaxed version of QNames, which are often used to model prefixes for IRIs. With a prefix "t": "http://psi.topincs.com/", "t:movies/dear-wendy" would be a valid CURIE, but not a valid QName. I propose Safe_CURIEs for JTM 1.1 because they explicitly define whether to interpret a locator as an IRI or a CURIE. This avoids edge cases where an IRI gets expanded when a prefix with the same name as the scheme is defined (think of a prefix http). It also makes parsing easier for both humans and machines.
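The following sketch shows how little code the Safe_CURIE rule needs in a parser. The helper names expandLocator and expandTopicRef are assumptions made for this illustration, the regular expression is a simplification of the CURIE grammar, and the http prefix is a deliberately nasty made-up example:

```javascript
// Expand a locator that is either a plain IRI or a Safe_CURIE like "[dc:title]".
function expandLocator(locator, prefixes) {
  const match = /^\[([^:\]]*):(.*)\]$/.exec(locator);
  if (!match) {
    return locator; // no brackets: a plain IRI, never expanded
  }
  const prefix = match[1];
  const reference = match[2];
  if (!Object.prototype.hasOwnProperty.call(prefixes, prefix)) {
    throw new Error('Undefined prefix: ' + prefix);
  }
  return prefixes[prefix] + reference;
}

// Expand a topic reference such as "si:[tmdm:topic-name]".
function expandTopicRef(ref, prefixes) {
  const kind = ref.slice(0, 3);
  if (kind !== 'si:' && kind !== 'sl:' && kind !== 'ii:') {
    throw new Error('Not a topic reference: ' + ref);
  }
  return kind + expandLocator(ref.slice(3), prefixes);
}

const prefixes = {
  t: 'http://psi.topincs.com/',
  http: 'http://example.org/trap/' // hypothetical prefix named like a scheme
};

console.log(expandLocator('[t:movies/dear-wendy]', prefixes));
// http://psi.topincs.com/movies/dear-wendy
console.log(expandLocator('http://psi.topincs.com/movie', prefixes));
// http://psi.topincs.com/movie (untouched despite the "http" prefix)
console.log(expandTopicRef('si:[t:movie]', prefixes));
// si:http://psi.topincs.com/movie
```

Because only bracketed values are ever expanded, the parser needs no lookahead and no list of known URI schemes.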

Comparison of JTM 1.0 and 1.1

There is also a third, quite obvious change: all JTM 1.1 documents must have a member version with the value "1.1". To illustrate the changes, let's compare the topic map from the JTM 1.0 document to its JTM 1.1 equivalent:

{"version":"1.0",
 "item_type":"topicmap",
 "topics":[
    {"subject_identifiers":["http://psi.topincs.com/movies/dear-wendy"],
     "names":[
        {"value":"Dear Wendy",
         "type":"si:http://psi.topincs.com/title",
         "scope":[
            "si:http://www.topicmaps.org/xtm/1.0/country.xtm#US",
            "si:http://www.topicmaps.org/xtm/1.0/country.xtm#DE"]}],
     "occurrences":[
        {"value":"2005",
         "type":"si:http://psi.topincs.com/publication-year",
         "datatype":"http://www.w3.org/2001/XMLSchema#gYear"}]}],
 "associations":[
    {"type":"si:http://psi.topicmaps.org/iso13250/model/type-instance",
     "roles":[
        {"player":"si:http://psi.topincs.com/movies/dear-wendy",
         "type":"si:http://psi.topicmaps.org/iso13250/model/instance"},
        {"player":"si:http://psi.topincs.com/movie",
         "type":"si:http://psi.topicmaps.org/iso13250/model/type"}]}]}

becomes:

{"version":"1.1",
 "item_type":"topicmap",
 "prefixes": {
   "t": "http://psi.topincs.com/",
   "xtm": "http://www.topicmaps.org/xtm/1.0/" 
 },
 "topics":[
    {"subject_identifiers":["[t:movies/dear-wendy]"],
     "instance_of": ["si:[t:movie]"],
     "names":[
        {"value":"Dear Wendy",
         "type":"si:http://psi.topincs.com/title",
         "scope":[
            "si:[xtm:country.xtm#US]",
            "si:[xtm:country.xtm#DE]"]}],
     "occurrences":[
        {"value":"2005",
         "type":"si:[t:publication-year]",
         "datatype":"http://www.w3.org/2001/XMLSchema#gYear"}]}]}

Apart from the value of the version member, every JTM 1.0 file will still be a valid JTM 1.1 file, and changing existing parsers should not be too hard. In my opinion, this will help to make JTM even more compact, while still maintaining its simplicity.

Special thanks go to Lars Heuer for comments and feedback. What do you think? Comments welcome!

Pragmatic topic map streaming

I started this day quite innocently at 5am, until the words "sensor data" on linkeddata.deri.ie toggled the Robert Barta switch in my head and, after a short discussion on the #topicmaps IRC channel, a chain reaction started. Here is the relevant core dump:

I've been thinking a lot about knowledge streaming lately. C-SPARQL [PDF] seems very interesting, and I started wondering how to implement some kind of streaming for Topic Maps. The general idea of streaming topic maps and topic map changes is not new. SDShare was presented at the TMRA 2008, NetworkedPlanet had its own update feed long before that; I remember looking closer at this feed in connection with automatic update of the GREP topic map (The Norwegian Curriculum is published as a topic map, in case you haven't heard about this before).

The above-mentioned protocols supply a feed of changes to a topic map and can be used to sync these changes with other topic maps. Pragmatics ahead.

I'm going to concentrate on a subset of this problem, the problem of knowledge aggregation. This means that I'm not directly interested in changes to a topic map, only in newly added topic map constructs. Many popular services like twitter, facebook, delicious and flickr provide APIs which in turn provide feeds of newly added information. I use a twitter client, an RSS feed reader, and other applications. Additionally, I like to store snippets of interesting web pages (I currently use DevonThink for this task). I'm trying to keep up with all those feeds on a daily basis, and this works mostly fine until, sometimes months later, I try to find that information again. Have you ever tried to find a tweet that you saw a couple of weeks ago?

The solution is obvious: Topic Maps can help me to aggregate that information, and, hopefully, make it easy to find it when I need it most.

Here is the idea: I store all information from the social services I use in a personal topic map (short: TM). To build this topic map, I run a client ("topic map stream reader") that periodically checks all topic map stream feeds. If new items appear, it fetches the relevant topic map and merges it into my TM. For each service that I'm interested in, I create a simple wrapper that provides me with an ATOM feed. Each item of that feed is, you guessed it, a topic map. The trick is that the items are topic maps, not topic map fragments. This allows me to make use of the most powerful and most frightening feature of ISO 13250-2: merging.

Note that I don't say anything about the complexity of the generated topic maps. They can contain anything from a single association or a topic stub with just one occurrence to a more complex topic map with several topics and associations.

tm-syndication.png

Let's take a tweet as an example. I can easily create a service that creates a topic map for a tweet. It could have a person or twitter account topic, maybe some associations for tweet-mentions-account and tweet-mentions-hashtag, some metadata such as the posting date, and a subject locator pointing to the tweet itself. Such a service would be easy to set up on a Google App Engine account. Then I can create a feed of all tweets of the people I follow with the twitter API, and this feed can be converted to an ATOM feed. Maybe it would even be easy to create a Google App Engine application that generates such a personalized feed for me, but I'm not sure if I would publish this feed (but that's a different story). By iterating over the most popular services, the Topic Maps community could provide small, even dynamically generated, topic maps for all kinds of information pieces. It should e.g. be easy to convert an ATOM feed of a blog into an ATOM feed of topic maps that describe or contain the blog entries.
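As a rough illustration of such a wrapper, the sketch below turns a tweet into a small JTM 1.0 topic map. Everything specific in it is an assumption: the shape of the tweet object, the function name tweetToTopicMap and all PSIs under psi.example.org are placeholders, not an existing vocabulary or the twitter API:

```javascript
// All PSIs under this base are hypothetical placeholders.
const BASE = 'http://psi.example.org/twitter/';

function tweetToTopicMap(tweet) {
  const tweetRef = 'sl:' + tweet.url;
  // The tweet itself: identified by a subject locator, with its posting date.
  const topics = [{
    subject_locators: [tweet.url],
    occurrences: [{
      value: tweet.postedAt,
      type: 'si:' + BASE + 'posting-date',
      datatype: 'http://www.w3.org/2001/XMLSchema#dateTime'
    }]
  }];
  const associations = [];

  // Add a stub topic for the mentioned thing and link it to the tweet.
  function mention(kind, value) {
    const si = BASE + kind + '/' + encodeURIComponent(value);
    topics.push({ subject_identifiers: [si], names: [{ value: value }] });
    associations.push({
      type: 'si:' + BASE + 'tweet-mentions-' + kind,
      roles: [
        { player: tweetRef, type: 'si:' + BASE + 'tweet' },
        { player: 'si:' + si, type: 'si:' + BASE + kind }
      ]
    });
  }
  tweet.hashtags.forEach(function (tag) { mention('hashtag', tag); });
  tweet.accounts.forEach(function (name) { mention('account', name); });

  return {
    version: '1.0',
    item_type: 'topicmap',
    topics: topics,
    associations: associations
  };
}

const tm = tweetToTopicMap({
  url: 'http://twitter.example.org/status/1', // made-up tweet address
  postedAt: '2010-10-25T21:42:40Z',
  hashtags: ['topicmaps'],
  accounts: ['alice']
});
console.log(JSON.stringify(tm, null, 2));
```

The result is a complete, if tiny, topic map: one topic for the tweet, one stub topic per hashtag and mentioned account, and one association per mention.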

What is missing now is the ability to read and combine those ATOM feeds. It doesn't sound hard to write a little topic map stream reader that uses my favourite Topic Maps engine TME, reads all feeds F that I'm interested in, for each feed item i fetches the topic map Ti, and merges that topic map into my personal knowledge base topic map. Et voilà: Topic Maps streaming in action!
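Here is a deliberately naive sketch of one polling pass of such a reader. It works on plain JTM-like objects instead of a real Topic Maps engine, and its merge is only a crude stand-in for real merging: topics are merged when they share an identity of the same kind, and no duplicate names, occurrences or associations are removed. The function names, the shape of the feed items and the synchronous fetchTopicMap callback are all assumptions for the sake of a runnable example:

```javascript
const IDENTITY_KEYS = ['subject_identifiers', 'subject_locators', 'item_identifiers'];

// True if two JTM topic objects share at least one identity of the same kind.
function sharesIdentity(a, b) {
  return IDENTITY_KEYS.some(function (key) {
    return (a[key] || []).some(function (iri) {
      return (b[key] || []).indexOf(iri) !== -1;
    });
  });
}

// Merge source into target: union the identities, append names and occurrences.
function mergeTopic(target, source) {
  IDENTITY_KEYS.forEach(function (key) {
    const merged = (target[key] || []).slice();
    (source[key] || []).forEach(function (iri) {
      if (merged.indexOf(iri) === -1) { merged.push(iri); }
    });
    if (merged.length) { target[key] = merged; }
  });
  ['names', 'occurrences'].forEach(function (key) {
    const merged = (target[key] || []).concat(source[key] || []);
    if (merged.length) { target[key] = merged; }
  });
}

// Merge a whole topic map (one feed item) into the personal topic map.
function mergeTopicMap(target, source) {
  (source.topics || []).forEach(function (topic) {
    const existing = target.topics.find(function (t) {
      return sharesIdentity(t, topic);
    });
    if (existing) {
      mergeTopic(existing, topic);
    } else {
      target.topics.push(JSON.parse(JSON.stringify(topic))); // deep copy
    }
  });
  target.associations = target.associations.concat(source.associations || []);
}

// One polling pass: merge the topic map of every feed item not seen before.
function readStream(feedItems, fetchTopicMap, seen, personalTm) {
  feedItems.forEach(function (item) {
    if (seen.has(item.id)) { return; }
    mergeTopicMap(personalTm, fetchTopicMap(item));
    seen.add(item.id);
  });
}

// In-memory stand-ins for a feed and the topic maps its items point to.
const maps = {
  'item-1': { topics: [{ subject_identifiers: ['http://psi.example.org/alice'],
                         names: [{ value: 'Alice' }] }] },
  'item-2': { topics: [{ subject_identifiers: ['http://psi.example.org/alice'],
                         occurrences: [{ value: 'http://example.org/',
                                         type: 'si:http://psi.example.org/homepage' }] }] }
};
const feed = [{ id: 'item-1' }, { id: 'item-2' }];
const personalTm = { version: '1.0', item_type: 'topicmap', topics: [], associations: [] };
const seen = new Set();
const fetchTopicMap = function (item) { return maps[item.id]; };

readStream(feed, fetchTopicMap, seen, personalTm);
readStream(feed, fetchTopicMap, seen, personalTm); // second pass finds nothing new
console.log(JSON.stringify(personalTm, null, 2));
```

Both feed items talk about the same subject, so the personal topic map ends up with a single topic that carries the name from the first item and the occurrence from the second. A real reader would fetch feeds and topic maps over HTTP and leave the merging to the engine.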

What do you think? Would that be useful? It seems to me that such an aggregated topic map would be useful for integrating the social services that I use, and for storing other personal information. The quality of the information depends a lot on how the different services are mapped to topic maps. However, it should be possible to find what you're looking for with some custom TMQL queries. Also, there is no limit to the services that can be wrapped into such topic map streaming feeds. It can be blog feeds, a feed of topic maps in Maiana, photos from flickr. You get it. A side effect of such streaming wrappers would be that many small topic maps become available on the web. I'm sure that there are many ways to link them together!

That's it for now. I hope that you could get a basic understanding of my idea. I'll try to put up an example of a topic map streaming feed in one of the next posts.
