
April 10, 2010: This afternoon I received a visit from a fluffy black cat. I've never seen her before, and since she wears a collar, she may have moved into the neighborhood with her owner in the last few days.

Read more about project 365 ...

Everywhere you go, always take your topic with you

Posted by semantosoph on Apr 14, 2010 | 0 comments

Ever since I first saw an embedded YouTube video, I’ve been fascinated by the idea of using my browser to bring together data from different sources automatically. Of course, the Linking Open Data initiative has been very successful in structuring and combining large amounts of data. But that blending always happens on the server side, and users have no influence on what kind or amount of linked data they see in their browsers.

To achieve a more user-friendly experience, the benefits of publicly available data stores should be combined with the idea of the embeddable video: small pieces of public open data should reach users in an easily embeddable form. Taking this one step further, users should be able not only to embed public data, but also to create embeddable widgets from their own structured data.

Having said this, I’m fairly proud of the widget engine we recently implemented in Maiana. As a first step, every topic of every public topic map uploaded to Maiana offers a ready-made HTML snippet for easy embedding into websites and blogs. The snippet is very lightweight and contains only one JavaScript call, which retrieves all necessary data by sending a predefined TMQL query to Maiana. The resulting box looks as if you had cut it right out of the Maiana website.
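As a rough illustration of how small such an embed code can be, the Python sketch below generates a snippet containing a single script tag. The endpoint `https://maiana.example/widget.js`, the parameter names `map` and `topic`, and the function `embed_snippet` are hypothetical placeholders, not Maiana's actual API; the predefined TMQL query that the script would send is not shown.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical endpoint: the real Maiana widget URL is not given in the post.
WIDGET_SCRIPT = "https://maiana.example/widget.js"


def embed_snippet(topic_map: str, topic: str) -> str:
    """Return an HTML snippet that loads a topic widget with one script call.

    The query parameter names 'map' and 'topic' are assumptions for illustration.
    """
    query = urlencode({"map": topic_map, "topic": topic})
    src = f"{WIDGET_SCRIPT}?{query}"
    # escape() makes the URL safe inside an attribute ('&' becomes '&amp;').
    return f'<script type="text/javascript" src="{escape(src)}"></script>'


print(embed_snippet("example-map", "example topic"))
# <script type="text/javascript" src="https://maiana.example/widget.js?map=example-map&amp;topic=example+topic"></script>
```

Keeping the snippet down to one script tag means the page owner never has to touch the data; everything else happens when the script runs.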

But this is only the first step. To demonstrate the possibilities of this widget engine, we implemented Yacca, a mashup that combines data on all the teams (and their players) that will play at the FIFA World Cup in South Africa with data from the Twitter Search API, and offers small widgets containing both. Of course, the soccer data is stored in a Maiana topic map. This way, every soccer fan can show support for a favourite player by embedding such a widget. And to make it even more playful, all JavaScript calls are counted to create a definitive hall of fame of World Cup players.
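The hall of fame could work along these lines: count every widget load per player and sort by the totals. This is a minimal sketch assuming an in-memory counter; the names `record_call` and `hall_of_fame` and the player identifiers are made up for illustration, and a real service would persist the counts.

```python
from collections import Counter

# Hypothetical tally: one increment per widget load (i.e., per JavaScript call),
# keyed by the player whose widget was requested.
widget_calls = Counter()


def record_call(player: str) -> None:
    """Count one widget load for the given player."""
    widget_calls[player] += 1


def hall_of_fame(n: int = 10) -> list:
    """Return up to n (player, calls) pairs, most frequently embedded first."""
    return widget_calls.most_common(n)


for p in ["player_a", "player_b", "player_a", "player_c", "player_a", "player_b"]:
    record_call(p)

print(hall_of_fame(2))  # [('player_a', 3), ('player_b', 2)]
```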

Now, think of all the possibilities a widget engine like this opens up. You can upload whatever topic map you like and have your topics shared all over the world. We, the Maiana developer team, are really looking forward to seeing your ideas and applications.

Dude, where's my context?

Posted by semantosoph on Mar 19, 2010 | 1 comment

Patrick Durusau recently drew my attention to the idea of using context information to better identify subjects (subject in its Topic Maps sense) in text search. The background is that (verbal) context influences how we understand expressions. Take an example: the word bank has several different meanings. It could be a monetary institution, a riverbank, or a bench. Taken out of its context (i.e., the original document), you cannot determine which meaning is the right one. With knowledge of the other expressions in the document, however, you can: if the words money and fraud appear in it, bank almost certainly means the monetary institution.
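The bank example can be sketched as a simple overlap count between the document's words and a set of cue words per sense, an approach similar in spirit to the Lesk algorithm. The sense inventory and its cue words below are hypothetical, hand-picked for the example.

```python
from typing import Optional

# Hypothetical sense inventory: each sense of "bank" with a few cue words.
SENSES = {
    "monetary institution": {"money", "fraud", "account", "loan", "credit"},
    "riverbank": {"river", "water", "shore", "fishing"},
    "bench": {"sit", "park", "seat", "rest"},
}


def disambiguate(document: str, senses: dict) -> Optional[str]:
    """Pick the sense whose cue words overlap most with the document's words."""
    words = {w.strip(".,;:!?").lower() for w in document.split()}
    scores = {sense: len(cues & words) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    # With no overlap at all, the context tells us nothing.
    return best if scores[best] > 0 else None


doc = "The bank was accused of fraud after the money vanished."
print(disambiguate(doc, SENSES))  # monetary institution
```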

This additional knowledge can be used to identify subjects: not by a URI, but by co-occurrences, whose linguistic significance can be interpreted as an indicator of semantic proximity.

1. The Ontology

The schema of this topic map is plain and simple. There is one topic type, called word. All of its instances are connected by associations of type co-occurrence_of. Additionally, each association is reified by a topic of type count, which records how often the two words have been found together.
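In code, the schema might be modeled roughly as follows. This is a sketch under the assumption that a count topic stores its value in an occurrence named `count`; the class and function names are illustrative and not part of any Topic Maps engine's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    name: str
    topic_type: str  # "word" or "count" in this schema
    occurrences: dict = field(default_factory=dict)


@dataclass
class Association:
    assoc_type: str                  # always "co-occurrence_of" here
    members: frozenset               # names of the two word topics (unordered)
    reifier: Optional[Topic] = None  # the topic of type "count"


def co_occurrence(a: Topic, b: Topic) -> Association:
    """Create a co-occurrence_of association reified by a fresh count topic.

    The count starts at 1 because the association is created on the first
    observed co-occurrence.
    """
    reifier = Topic(f"count({a.name},{b.name})", "count", {"count": 1})
    return Association("co-occurrence_of", frozenset({a.name, b.name}), reifier)
```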

2. Indexing

Indexing the co-occurrences in texts is not rocket science. It is usually done by parsing the text sentence by sentence, removing all stop words, and counting the co-occurrences of each pair of remaining words in a large SQL database table.

With the aforementioned topic map, the procedure is similar, except for the last step. Instead of updating a database table, a topic is created for each word of a newly found pair (unless it already exists). Then an association between these two topics is created, along with its reifying topic. A pair of words that is found again triggers nothing more than an increment of the reifier's count occurrence.
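The indexing procedure can be sketched in Python with a minimal in-memory stand-in for the topic map: a set of word topics and a dictionary from each unordered word pair (the association) to its reifier's count. The stop-word list, the naive sentence splitting, and the class name `CoOccurrenceMap` are assumptions made for the sake of a short example.

```python
import re
from itertools import combinations

# A tiny, hypothetical stop-word list; a real indexer would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "is", "was", "in", "to", "it", "my"}


class CoOccurrenceMap:
    """Minimal in-memory stand-in for the co-occurrence topic map."""

    def __init__(self) -> None:
        self.words = set()  # topics of type "word"
        self.counts = {}    # association (unordered pair) -> reifier's count

    def add_pair(self, a: str, b: str) -> None:
        pair = frozenset((a, b))
        if pair in self.counts:
            # Known pair: only increment the reifier's count occurrence.
            self.counts[pair] += 1
        else:
            # New pair: create any missing word topics, then the association
            # together with its reifying count topic.
            self.words.update((a, b))
            self.counts[pair] = 1

    def index(self, text: str) -> None:
        # Naive sentence split on '.', '!' and '?'.
        for sentence in re.split(r"[.!?]+", text):
            tokens = {t for t in re.findall(r"[a-z]+", sentence.lower())
                      if t not in STOP_WORDS}
            for a, b in combinations(sorted(tokens), 2):
                self.add_pair(a, b)


tm = CoOccurrenceMap()
tm.index("Aspirin helps against headache. Headache and aspirin again! Coffee helps too.")
print(tm.counts[frozenset(("aspirin", "headache"))])  # 2
```

Using a set of tokens per sentence means a word repeated within one sentence is counted once, which keeps the counts from being inflated by repetition.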

3. Searching

This is where the user comes in. To give an example, our hero searches his or her document collection for the term aspirin (perhaps after a hard night). Actually, he wants to find all documents that may help him get over his headache, not only those that deal with aspirin directly. Luckily for him, his search engine looks not only into its full-text index, but into the context topic map as well. This works as follows:

  • For every searched word (here: aspirin)
    • Find the nearest neighbors (i.e., the n terms with the highest co-occurrence counts)
    • Perform a full-text search for each of the newfound words
  • Rank the documents
    • Top – documents that contain the original term and some of the neighbors
    • Middle – documents that contain only the original term
    • Bottom – documents that contain only neighbors
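The steps above can be sketched as follows, assuming co-occurrence counts are available as a dictionary keyed by unordered word pairs and a tiny document collection stands in for a real full-text index. Sorting documents within a tier by the number of matched neighbors is an assumption of this sketch; the post only defines the three tiers.

```python
import re


def neighbors(counts: dict, term: str, n: int) -> list:
    """Return the n words with the highest co-occurrence count with `term`."""
    scored = [(count, next(iter(pair - {term})))
              for pair, count in counts.items() if term in pair]
    # Highest count first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in scored[:n]]


def search(docs: dict, counts: dict, term: str, n: int = 3) -> list:
    """Rank document ids into top, middle and bottom tiers."""
    near = set(neighbors(counts, term, n))
    top, middle, bottom = [], [], []
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z]+", text.lower()))  # stand-in for a full-text index
        has_term = term in words
        hits = len(near & words)
        if has_term and hits:
            top.append((hits, doc_id))     # original term plus some neighbors
        elif has_term:
            middle.append((0, doc_id))     # original term only
        elif hits:
            bottom.append((hits, doc_id))  # neighbors only
    ranked = []
    for tier in (top, middle, bottom):
        ranked += [d for _, d in sorted(tier, key=lambda t: (-t[0], t[1]))]
    return ranked


counts = {frozenset({"aspirin", "headache"}): 5, frozenset({"aspirin", "pain"}): 3,
          frozenset({"aspirin", "tablet"}): 1, frozenset({"coffee", "headache"}): 2}
docs = {"d1": "Aspirin relieves headache and pain.",
        "d2": "Aspirin was first synthesized long ago.",
        "d3": "Tips against a bad headache.",
        "d4": "Coffee recipes."}
print(search(docs, counts, "aspirin", n=2))  # ['d1', 'd2', 'd3']
```

Document d3 never mentions aspirin but is still found through its neighbor headache, which is exactly the broadening described in the conclusions.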

In this way, the user will find not only the documents that contain the word aspirin, but also documents that deal with the subject of aspirin without naming it directly.

4. Conclusions

You may have noticed that this last step is the opposite of word sense disambiguation. The search terms are not narrowed to increase precision; instead, they are broadened to increase recall. However, this reversal in thinking opens up an easy and rather cheap way to advance from keyword search to subject-centric search.

Inference on the Semantic Web

Posted by semantosoph on Nov 05, 2009 | 0 comments

This is a very informative SlideShare presentation on inference and the power of the Semantic Web. To emphasize that potential, Myungjin Lee used RDF and OWL constructs to represent the statements of his slides. Very impressive indeed.


HTML5 and Semantics

Posted by semantosoph on Nov 03, 2009 | 0 comments

If you have ever built a web project with dated entries, like articles in a blog, you may have typed something like

<div id="navigation">
  <ul>
    <li>Home</li>
    <li>Link 1</li>
    <li>Link 2</li>
    <li>Link 3</li>
  </ul>
</div>

and

<span class="publication-date">2009-11-03</span>

Now, this is highly understandable for any developer and for most of your users, but not for crawlers and the like. Since this is a central issue of the Semantic Web, some solutions already address the problem. You could, for example, enhance your code with semantic markup like RDFa or microformats.

But then, that may be overkill. Why not just use the possibilities that come with the shiny new HTML5? Most modern browsers already support at least some of its new features. The features in question are new semantic elements that allow lightweight semantic annotation of content. Using these new elements, our first example would read:

<nav>
  <ul>
    <li>Home</li>
    <li>Link 1</li>
    <li>Link 2</li>
    <li>Link 3</li>
  </ul>
</nav>

This enables screen readers, which are mostly used by people with impaired sight, to jump directly to the site's menu for quick navigation. Even more fun comes with the second example:

<time datetime="2009-11-03" pubdate>2009-11-03</time>

This promotes the human-readable string to a full machine-readable timestamp that tells every crawler/browser/whatever the publication date of the enclosing document.
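To illustrate what a crawler gains, here is a sketch that uses Python's standard `html.parser` module to pull the `datetime` value out of `<time>` elements. The class name `PubDateParser` is illustrative. Since the `pubdate` attribute was later dropped from the HTML specification, the sketch records dates with and without it separately instead of relying on it.

```python
from html.parser import HTMLParser


class PubDateParser(HTMLParser):
    """Collect datetime values of <time> elements, separating pubdate ones."""

    def __init__(self) -> None:
        super().__init__()
        self.pubdates = []
        self.other_times = []

    def handle_starttag(self, tag, attrs):
        if tag != "time":
            return
        attributes = dict(attrs)  # valueless attributes like pubdate map to None
        value = attributes.get("datetime")
        if value is None:
            return  # no machine-readable value to extract
        if "pubdate" in attributes:
            self.pubdates.append(value)
        else:
            self.other_times.append(value)


parser = PubDateParser()
parser.feed('<p>Posted <time datetime="2009-11-03" pubdate>2009-11-03</time></p>')
print(parser.pubdates)  # ['2009-11-03']
```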

If you want to learn more about these two and all the other cool new tags, have a look at Dive Into HTML5, a clearly structured and neatly designed publication by the great Mark Pilgrim.


Martian Notation for Topic Maps

Posted by semantosoph on Nov 02, 2009 | 0 comments

Many people understand structures much better when they are properly visualized. So it comes as no surprise that several proposals for a Graphical Topic Map notation (GTM) have been made in the past. The guys from musicDNA recently stepped into the spotlight with their latest publication, which illustrates Topic Maps using a Martian as an example. Fittingly, they call it Topic Map Martian Notation.