Semantics, Code and Circumstance
April 10, 2010: This afternoon I received a visit from a plushy black cat. I've never seen her before, and since she has a collar, she may have moved in with her owner within the last few days.
Ever since I first saw an embedded YouTube video, I’ve been fascinated by the idea of using my browser to bring data from different sources together automatically. Of course, the Linking Open Data initiative has had great success in structuring and combining large amounts of data. But that blending always happens on the server side, and users have no influence on what kind or what amount of linked data they see in their browsers.
To achieve a more user-friendly experience, the benefits of publicly available data stores should be combined with the idea of the embeddable video. That is, small parts of public open data should be brought to the user in an easily embeddable form. And to take this one step further, a user should not only be able to embed public data, but also to create embeddable widgets from their own structured data.
Having said this, I’m fairly proud of the widget engine that we recently built into Maiana. As a first step, every topic of any publicly available topic map that was uploaded to Maiana offers ready-made HTML code that allows easy embedding into websites and blogs. The code itself is very lightweight and contains only one JavaScript call. This call retrieves all necessary data by sending a predefined TMQL query to Maiana. The resulting box looks as if you had cut it right out of the Maiana website.
But this is only the first step. To demonstrate the possibilities of this widget engine, we implemented Yacca. Yacca is a mashup that combines data on all the teams (and their players) that will appear at the FIFA soccer world championship in South Africa with data from the Twitter search API, and offers small widgets containing these two snippets. Of course, the soccer data is stored in a Maiana topic map. In this way, every soccer fan is able to show their support for a favourite player with a small widget. And to make this even more playful, all the JavaScript calls are counted to create a definitive hall of fame for the world cup players.
Now, think of all the possibilities that open up with a widget engine like this. You can upload whatever topic map you like and have your topics shared all over the world. We, the Maiana developer team, are really looking forward to seeing your ideas and applications for this.
Patrick Durusau recently drew my attention to the idea of using context information for a better identification of subjects (subject in its Topic Maps meaning) in textual search. The background is that (verbal) context influences the way we understand expressions. Here is an example: the word bank has many different meanings. It could be the monetary institution, a riverbank or just a piece of seating furniture. When the word is used out of context (i.e. outside the original document), you cannot determine which meaning is the correct one. But with the knowledge of other expressions from this document, you can. For example, when the words money and fraud appear in the document, bank most likely means the monetary institution.
This additional knowledge can be used to identify subjects. Not in the sense of a URI, but in the sense of co-occurrences, which can be interpreted linguistically as an indicator of semantic proximity.
1. The Ontology
The schema of this topic map is plain and simple. There is one topic-type, called word. All of its instances are connected by associations of type co-occurrence_of. Additionally, each association is reified by a topic of type count that counts how often these two words have been found in co-occurrence.
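
The schema described above could be modelled roughly as follows. This is only a minimal sketch in plain Python, not Maiana's or any topic map engine's API; the class names Word, Count and CoOccurrence and the helper associate are hypothetical and merely mirror the three types named in the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Word:
    """A topic of type 'word' (hypothetical class name)."""
    name: str


@dataclass
class Count:
    """The reifying topic of type 'count' (hypothetical class name)."""
    value: int = 0


@dataclass
class CoOccurrence:
    """An association of type 'co-occurrence_of' between two word topics."""
    players: frozenset
    reifier: Count = field(default_factory=Count)


def associate(first: Word, second: Word) -> CoOccurrence:
    # Assumption: co-occurrence is symmetric, so the players are stored unordered.
    return CoOccurrence(players=frozenset({first, second}))


link = associate(Word("bank"), Word("money"))
link.reifier.value += 1  # the pair has been seen once
```

Storing the two players as an unordered set means that (bank, money) and (money, bank) refer to the same association, which matches the idea of a single counter per pair.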

2. Indexing
Indexing the co-occurrences of texts is not rocket science. It is usually done by parsing the text sentence by sentence, removing all stop words, and counting the co-occurrences of each pair of the remaining words in a large SQL database table.
With the aforementioned topic map, the procedure would be similar, except for the last step. Instead of updating a database table, topics are created for the words of any new pair. An association between these two topics is then created, along with its reifying topic. Any pair of words that is found again triggers nothing more than an increment of the reifier's count occurrence.
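
The indexing procedure can be sketched in a few lines of Python. This is an illustration under several assumptions: an in-memory Counter stands in for the topic map (each key plays the role of an association, each value the role of its reifier's count), the stop word list is a tiny made-up sample, sentences are split naively on punctuation, and a pair is counted at most once per sentence. The function name index_cooccurrences is hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stop word list; a real index would use a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was", "from"}


def index_cooccurrences(text, counts=None):
    """Count co-occurring word pairs sentence by sentence."""
    counts = Counter() if counts is None else counts
    for sentence in re.split(r"[.!?]+", text):
        # Distinct non-stop words of this sentence, lower-cased.
        words = {w for w in re.findall(r"[a-z]+", sentence.lower())
                 if w not in STOP_WORDS}
        # Sorting makes ("bank", "fraud") and ("fraud", "bank") the same key.
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1  # new pair: created with 1; known pair: incremented
    return counts


counts = index_cooccurrences("The bank reported fraud. Money was stolen from the bank.")
index_cooccurrences("Fraud at the bank!", counts)
```

Passing the same counts object to further calls accumulates the counts across documents, which corresponds to incrementing the reifier when a known pair is found again.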
3. Searching
This is the part where the user comes in. To give a good example, our hero may search his document collection for the term aspirin (perhaps he had a hard night). Actually, he wants to find all documents that may help him get over his headache, not only the ones that deal with aspirin directly. Luckily for him, his search engine looks not only in its full-text index, but in the topic map of context, too. This works as follows:
In this way, the user will find not only the documents that contain the word aspirin, but also documents that deal with the subject of aspirin without naming it directly.
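
One way such a context-aware lookup might work is sketched below in Python. It is only a guess at the mechanics, under clearly stated assumptions: the co-occurrence counts are available as a plain dictionary keyed by word pairs, a term counts as related once its count reaches a freely chosen threshold, and the "full-text index" is a simple scan over a handful of documents. The names related_terms and search, the threshold min_count, and all sample data are hypothetical.

```python
import re


def related_terms(term, counts, min_count=2):
    """Return words that co-occur with `term` at least `min_count` times."""
    related = {}
    for (first, second), count in counts.items():
        if count < min_count:
            continue
        if first == term:
            related[second] = count
        elif second == term:
            related[first] = count
    # Most frequent co-occurrences first, ties broken alphabetically.
    return sorted(related, key=lambda word: (-related[word], word))


def search(term, documents, counts, min_count=2):
    """Find documents containing the term or one of its related terms."""
    terms = {term, *related_terms(term, counts, min_count)}
    hits = []
    for doc_id, text in documents.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & terms:
            hits.append(doc_id)
    return hits


# Made-up sample data for illustration only.
counts = {("aspirin", "headache"): 5, ("aspirin", "tablet"): 1, ("headache", "migraine"): 3}
documents = {
    "a.txt": "Aspirin thins the blood.",
    "b.txt": "A headache often follows a short night.",
    "c.txt": "Tablet computers are popular.",
}
hits = search("aspirin", documents, counts)
```

With this sample data the query is broadened from aspirin to aspirin and headache, so the second document is found although it never mentions aspirin. The threshold controls how far the query is broadened: a lower value raises recall at the cost of precision.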
4. Conclusions
You may have noticed that this last step is the opposite of word sense disambiguation. The search terms are not narrowed to increase precision; instead, they are broadened to increase recall. However, this change in the direction of thinking opens up an easy and rather cheap way to advance from keyword search to subject-centric search.
This plugin provides a Liquid filter for the Mephisto blog engine that lets you include calendars hosted by Google.
Installation
Installation takes three simple steps.
Usage
This plugin offers you a Liquid filter. You may call this filter from any Liquid template. Pass the filter the private URL of your calendar's iCal file. The number at the end specifies the maximum number of items to return. Example:
{{ 'https://www.google.com/calendar/ical/your.user.name/private-xxxxx/basic.ics' | gcal_shortlist, 7 }}
The video of Microsoft’s Bing Maps augmented reality demo at the TED conference last week shows how they integrate Flickr imagery, indoor panoramas, Worldwide Telescope and even live videos to allow a really impressive time travel for certain places.
“We see this space, three-dimensional environment as being a canvas on which all sorts of applications can play out,” said Bing Maps’ Blaise Aguera y Arcas.
This little plugin provides you with a protected area for your users on your Mephisto blog. It's built for the engines-compatible Mephisto available in Mephisto version 8.2.
To install this plugin, run:
script/generate plugin_migration
rake db:migrate RAILS_ENV=production
This will make some changes to your production database and will add a new Userlogin tab to your Mephisto administration interface. From there you can create new userlogins. When creating a new userlogin, provide a username and a password (Attention! This is designed for ease of use and easy remembering. The passwords are NOT ENCRYPTED! Don’t use this for really important stuff) along with the sites you would like to provide inside the protected area.
You need to create a Liquid template for every site you list as protected. So, if you list “family_gallery” as a protected site, you need to create a template called “family_gallery.liquid”. This page will be available at http://yoursite.com/protected/family_gallery. The page http://yoursite.com/protected will show a login form (if not logged in) or the content of “protected_index.liquid” (otherwise).