Wednesday, November 11, 2009

Accessing Ontopia from PHP

At the Ontopia code camp at TMRA 2009, Lars Marius talked about integration with Content Management Systems (CMSes). Many CMSes on the market are written in PHP, so the question came up of how to integrate Ontopia, which is written in Java, with existing CMSes written in PHP. This blog post shows one solution to the problem.

A while ago I saw that there is a PHP/Java bridge available. The bridge uses a streaming, XML-based network protocol, which can be used to connect PHP (or some other scripting engine) to a Java virtual machine. As a result, you are able to call Java procedures from PHP or PHP procedures from Java. The good news is that you don't even need a PHP extension to access Java. I thought that it would be fun to run tolog queries from PHP, and two hours later I had managed to do this. This is how:

The PHP/Java bridge consists of two parts: a small include file written in PHP and some .jar files that you have to put into your Tomcat lib directory. For testing purposes I did not even set up a Tomcat server, but started the bridge in its bundled servlet mode directly from the command line. Keep in mind that this was only an initial test, so I tried to take the shortest path to get things running. Use at your own risk!

I downloaded Ontopia 5.0.2 and extracted it into my home directory. Then I created a PostgreSQL database, wrote a database configuration file (see the Ontopia docs) and imported my beetle topic map. I downloaded the PHP/Java bridge, and put the following .jar files into the lib directory:

  • JavaBridge.jar
  • php-script.jar
  • php-servlet.jar

From the lib/ directory, I started the Java bridge in servlet mode:

java -jar JavaBridge.jar SERVLET_LOCAL:8080

The result was a servlet listening on port 8080 on my local machine. Now for the PHP-part: I created a new directory and copied the Java.inc file from the PHP/Java bridge distribution into it. Then I wrote the following PHP code and saved it as test.php:

<?php
// Debugging is good for you
define ("JAVA_DEBUG", true);
// Tell PHP where and how to connect to Java
define ("JAVA_HOSTS", "127.0.0.1:8080");
define ("JAVA_PIPE_DIR", null);

require_once("Java.inc");

try {
    java_require('tmapi.jar;ontopia.jar;postgresql.jar');
    $factory = java("org.tmapi.core.TopicMapSystemFactory")->newInstance();

    // Tell TMAPI to use the postgresql database backend and my beetle topic map
    $factory->setProperty("net.ontopia.topicmaps.store", "rdbms");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.Database", "postgresql");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.ConnectionString",
        "jdbc:postgresql://localhost/tm_coleoptera");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.DriverClass", "org.postgresql.Driver");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.UserName", "jans");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.Password", "");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.ConnectionPool", "false");
    
    $sys = $factory->newTopicMapSystem();

    // Finally, we get a TopicMap-object
    $tm = $sys->getTopicMap('file:/Users/jans/labben/catcol2xtm/catcol.xtm');
    
    $topic = $tm->getTopicBySubjectIdentifier(
            $tm->createLocator("http://psi.entomologi.org/genus/carabus"));
    
    $name = $topic->getNames()->iterator()->next();

    // Print the name of the Topic that we fetched
    printf("Topic name is '%s'\n", $name->getValue());
    
} catch (JavaException $ex) {
    // Sometimes things go wrong...
    echo "An exception occurred: $ex\n";
}
?> 

The first lines define some constants to tell the PHP part of the bridge how to access the Java servlet. Then, the client implementation of the communication protocol is included. By calling java_require(), I can tell the bridge where to look for the Java classes that I want to access from PHP. In this case, we need TMAPI, the Ontopia API and the JDBC driver for PostgreSQL. The java() function is used to access Java objects and returns a PHP representation of the object. Once we have a PHP representation of a Java object, it's possible to call its methods as if they were PHP methods, e.g. $sys = $factory->newTopicMapSystem();.

To run this code, I just invoked PHP from the command line:

$ php test.php
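
If test.php cannot connect, a quick first check is whether anything is listening on the host and port given in JAVA_HOSTS. The following is only a minimal sketch: bridge_is_reachable() is a helper made up for illustration, not part of Java.inc, and it only tests that the TCP port accepts connections, not that the listener is actually the bridge.

```php
<?php
// Hypothetical helper (not part of Java.inc): returns true if something
// accepts TCP connections on $host:$port within $timeout seconds.
function bridge_is_reachable($host, $port, $timeout = 2.0)
{
    // @ suppresses the warning that fsockopen() emits when connecting fails
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Host and port match the JAVA_HOSTS value used in test.php
if (!bridge_is_reachable("127.0.0.1", 8080)) {
    echo "Nothing is listening on 127.0.0.1:8080. Is JavaBridge.jar running?\n";
}
```

Running a check like this before require_once("Java.inc") gives a clearer hint about a missing servlet than the connection error from the bridge client.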

The test.php script above showed how to use TMAPI from PHP. This is how to run a tolog query:

<?php
define ("JAVA_DEBUG", true);
define ("JAVA_HOSTS", "127.0.0.1:8080");
define ("JAVA_PIPE_DIR", null);

require_once("Java.inc");
    
try {
    java_require('tmapi.jar;ontopia.jar;postgresql.jar');
    $factory = java("org.tmapi.core.TopicMapSystemFactory")->newInstance();
    $factory->setProperty("net.ontopia.topicmaps.store", "rdbms");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.Database", "postgresql");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.ConnectionString",
        "jdbc:postgresql://localhost/tm_coleoptera");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.DriverClass", "org.postgresql.Driver");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.UserName", "jans");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.Password", "");
    $factory->setProperty("net.ontopia.topicmaps.impl.rdbms.ConnectionPool", "false");
    
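    // Get a TopicMap object via TMAPI, as in the first script
    $sys = $factory->newTopicMapSystem();
    $tm = $sys->getTopicMap('file:/Users/jans/labben/catcol2xtm/catcol.xtm');

    // Not the most elegant way of getting a TopicMapIF object,
    // guess you shouldn't do that at home.
    $proc = java("net.ontopia.topicmaps.query.utils.QueryUtils")->getQueryProcessor($tm->getWrapped());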
    $query = <<<EOS
        using type for i"http://psi.topicmaps.org/iso13250/model/"
        select \$SUBCLASSES from type:supertype-subtype
        (i"http://psi.entomologi.org/genus/carabus" : type:supertype,
         \$SUBCLASSES : type:subtype)?
EOS;
    $result = $proc->execute($query);
    $str = java('net.ontopia.topicmaps.utils.TopicStringifiers')->getDefaultStringifier();
    while(java_values($result->next())) {
        $row = $result->getValues();
        $arr = java_values($row);
        foreach($arr as $k => $v) {
            $val = java_values($v);
            $topicname = $str->toString($val);
            print("$topicname\n");
        }
    }
    $result->close();
    
} catch (JavaException $ex) {
    echo "An exception occurred: $ex\n";
}       
        
?>

I created an instance of a QueryProcessor object and ran a query using wrapped Java objects. A new problem that arises here is how to convert a Java object into a PHP value: in the while() loop, $result->next() does not return a PHP boolean, as you would expect, but a PHP representation of a Java boolean. The java_values() function does the dirty work and converts a Java representation into a PHP value, which is a PHP boolean in this case. This function is also able to convert Java arrays into PHP arrays, as I did here: $arr = java_values($row);, where $row is a Java array and $arr is a PHP array. The elements of the resulting PHP array are still wrapped Java objects, so you have to call java_values() for these as well.
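
The conversion dance in the loop can be wrapped into a small function. This is only a sketch under assumptions: collect_rows() is a made-up helper, not part of the bridge, and it expects a result object with next(), getValues() and close(), as used in the script above. The converter is passed in as a parameter and falls back to java_values(), so the function can be tried with a fake result object without a running bridge.

```php
<?php
// Hypothetical helper: drains a query result into a plain PHP array of rows.
// $toPhp turns a wrapped Java value into a PHP value; if omitted, the
// bridge's java_values() is used (this requires Java.inc to be loaded).
function collect_rows($result, $toPhp = null)
{
    if ($toPhp === null) {
        $toPhp = 'java_values';
    }
    $rows = array();
    // next() yields a wrapped Java boolean, so convert it before testing it
    while (call_user_func($toPhp, $result->next())) {
        $row = array();
        // getValues() yields a wrapped Java array; convert it, then each element
        foreach (call_user_func($toPhp, $result->getValues()) as $value) {
            $row[] = call_user_func($toPhp, $value);
        }
        $rows[] = $row;
    }
    $result->close();
    return $rows;
}

// Usage with the bridge loaded, using the names from the script above:
//   $rows = collect_rows($proc->execute($query));
//   foreach ($rows as $row) { print($str->toString($row[0]) . "\n"); }
```

Collecting everything first costs memory for large results, but it keeps the Java-specific conversions in one place.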

This is basically it. In a real-world setup, we would not use the provided container to run the bridge servlet, but something like Tomcat. See the documentation of the PHP/Java bridge for details. Disclaimer: This blog post was written while I tried to keep up with Lars Marius' talk, so it might be complete nonsense. Multitasking is hard and might lead to unexpected results in some cases.

Ping

This blog is still alive. Yesterday, I arrived in Leipzig to attend the TMRA 2009 conference, and plan to do some blogging from the conference.

Friday, April 17, 2009

Usability Friday - Install Install Parallels Desktop

After some problems with Parallels Desktop 3.x I finally decided to upgrade to Parallels 4. Seems like they've changed their license policy:

Besides the weird license the upgrade went smoothly. First I thought the image was corrupted, but everything else seems to work fine. Don't know what that means...

Thursday, March 19, 2009

Topic Maps 2009 conference in Oslo

I've spent the last two days at the Topic Maps 2009 conference in Oslo. The first day was a tutorial day. It was hard to decide between the Wandora-tutorial and the ZTM Topic Maps tutorial, but since I'd heard a lot about ZTM, but never had the time to give it a try, I decided to attend this full-day tutorial.

In contrast to many other tutorials that I attended at conferences, the goal was to get the ZTM platform up and running for all attendees - instead of limiting the tutorial to a presentation of the ZTM features.

It was quite time-consuming to install and configure Python and Zope on all machines, but after some hours we were able to import a topic map into the system and create some topics and topic types manually in the administration interface. Arnar rounded off with a short introduction to how the template mechanism works. Once you have the basic packages installed on your system, it should only take a couple of minutes to get a new site up and running, assuming that you have done this a couple of times before. ZTM Topic Maps currently only supports XTM 2.

Fortunately, I had a topic map in XTM 2 format at hand that I had created from an Excel sheet with the help of a Perl script some time ago. The topic map is a catalogue of all Fennoscandian beetles with their scientific names (and PSIs based on these), synonyms, redlist status, and the countries in which they appear. Additionally, the taxonomy includes suborders, families, subfamilies, genera, etc. For me, it was the first time that I looked at the generated topic map. The Perl script still needs some tuning and some changes to the ontology, but in its current version the map has about 8000 topics and 38000 associations. It was no problem to import the map into ZTM, apart from a little issue with the datatype of occurrences. More on that in a later post.

Today was filled with presentations and talks about various more or less Topic Maps-related subjects. I'll only give an overview of some of the presentations I attended:

  • In the keynote, Tommy Nordeng talked about why it is so difficult for people to share information. He presented the ambitious NDLA project, which aims at collecting and sharing lots of e-learning resources.
  • Graham Moore's talk was titled "A Vision for a Topic Maps world". He gave a short summary of how and why Topic Maps is successful at a small and medium-sized (enterprise) scale. In the main part of the talk, he explained how to make Topic Maps a successful technology at large scale (= the Web). He presented a service for registration and discovery that is going to be launched next month by his company Networked Planet (btw, I had not seen their new web site before).
  • In the next talk, Robert Engels, an active member of the Semantic Web community, presented the SeSam4 project, which aims at providing semantic tools, methods and knowledge for owners of information systems. The objectives of the SeSam4 project include developing open standards and tools, verifying semi-automatically generated ontologies, integrating content management systems, improving methods for text mining and indexing, and developing methods, models and procedures for semi-automatic generation of ontologies. One of the challenges that the project tries to solve is genuine interaction and communication between content creators, owners and consumers. He also talked a bit about the use cases that are part of SeSam4: tourism and construction. On the Topic Maps side, the project plans to choose either Wandora, tinyTim or TM++ as the underlying implementation, which is going to be decided soon.
  • After a short break, Torstein Thorsen talked about FAST. Besides the title and a short comment at the end of the presentation, he did not mention Topic Maps. Instead, the talk was more a presentation of the different possibilities of how to implement search and faceted search with the FAST engine. At the end of the presentation, he showed us a prototype of a new search interface which was based on a drag and drop interface that lets the user search for songs in a music collection.
  • After the lunch break, I went to see Jørgen Dalen talk about a Sharepoint and TMCore-based project. He claimed that 80% of the knowledge of a company only exists in the heads of its employees, so a solution is needed to find people with the right competences in a large company.
  • Bodil Kjelstrub from the University of Bergen spoke about the decisions behind the university's new web site that is based on Topic Maps and implemented with ZTM Topic Maps. She presented some of the difficulties that arise when you try to create a solution for such a large institution.
  • Stian Danenbarger introduced the concept of "modelldrevet søk" (model-driven search). He presented several ways to allow users to fulfill their information needs. One of the examples was Freebase Parallax, a research prototype that uses the Freebase dataset. Another of the interfaces that he presented was the Swedish site silobreaker, which offers search in a huge number of news, blog, research and multimedia sources.

There is a lot more to tell from the conference, and this was only a short summary. I really enjoyed the conference, even more than the two years before. Maybe I'll have time to write more later. Back to work.