Posts Tagged ‘rdf’

What we’ve been working on…

Talis, my employer, has been a big promoter of Linked Data and open access to information, because we see that new ideas often arise when existing ideas come together. Innovation, if you like, occurs at the join between ideas when they connect. I see this as fundamental to the way ideas and their applications (technology) advance. I tend to believe that anything “novel” is actually effected when other ideas are connected together.

In the technological world, this seems like a strong analogy for Linked Data: information which can be connected by a web-like network of links. These Linked Data have become the foundation for what has come to be known as the “Semantic Web”, a web of connected information which breaks out of information silos and enables the discovery of new ideas from old, and innovation from existing information. We use the phrase “serendipitous reuse” for the idea that once an idea (or a piece of data) is published, it can be used and reused in novel ways, and in the context of other data, to produce unexpected and unforeseeable possibilities. These ideas (data, again) become increasingly useful when published in a format which allows them to be linked freely to ANY other piece of information.

We’ve had the distribution method for this network for years (the good ol’ WWW itself), and it’s been about a decade since RDF was launched by the WWW Consortium to handle the data itself. The idea is basically to give every bit of data an address (a universal address, not one relative to a database, like a cell reference), and to predicate that bit of information very much like language does. If you think of it like a language, RDF lets bits of data (nouns) act upon, or be acted upon by (verbs), other bits of data (other nouns). This triple format (subject, predicate, object) enables a near-infinite recombination (theoretically) of any data, anywhere, with an address.
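To make the noun-verb-noun idea a little more concrete, here is a minimal sketch in Python of what a handful of triples might look like, and how they can be matched by pattern. It is only an illustration: the `http://example.org/` addresses and the property names are made up for this post, and a plain list stands in for a real RDF store.

```python
from typing import List, NamedTuple, Optional

# Hypothetical namespace, used for illustration only.
EX = "http://example.org/"


class Triple(NamedTuple):
    subject: str    # the noun doing the acting
    predicate: str  # the verb
    obj: str        # the noun being acted upon


# A tiny, made-up set of statements; every part is an address (URI).
triples = [
    Triple(EX + "talis", EX + "promotes", EX + "linkedData"),
    Triple(EX + "talis", EX + "hosts", EX + "connectedCommons"),
    Triple(EX + "connectedCommons", EX + "licensedUnder", EX + "cc0"),
]


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return every triple that agrees with the parts given (None = anything)."""
    return [t for t in store
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


# Everything we know about one subject.
about_talis = match(triples, s=EX + "talis")
```

Because every part of a statement is just an address, a triple from someone else's data can sit in the same list without any change to the structure, which is roughly where the recombination comes from.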

So, what’s the problem? Well, most of the world’s data are locked away in silos (prisoners of the cells their databases confine them to). Many organisations may wish to make use of their data in a semantic environment, and many might even embrace the open nature of their data, and make it freely available to the world to recombine and use: there are always more innovations outside an organisation than within! In order to lower the barriers to entering this linked data world, Talis has built a Platform with resources to host and utilise these connections, making use of semantic web standards (RDF and SPARQL, the query language of the semantic web) and a developer-friendly environment (a RESTful API, for example).

However, this innovation is only possible when data are accessible. In order to further lower the barriers, Talis is now offering free access to the Platform to host public domain data. We are calling this initiative the Talis Connected Commons, and the offer is not limited to free hosting: the data access services, including access to a public SPARQL endpoint, are also freely available. To keep this data open, you will need to use either the Open Data Commons Public Domain Dedication and License or the recently launched Creative Commons CC0 license to publish data. Anyone will then be able to freely access the stored data using the Platform services, without API keys and without usage limits.
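As a rough idea of what talking to a public SPARQL endpoint involves, the sketch below builds a request URL in Python. It assumes only the standard SPARQL protocol convention of passing the query text in a `query` parameter on a GET request. The endpoint address is a placeholder, not the real Platform URL, and nothing is actually sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder address: substitute the real endpoint of the store you are using.
ENDPOINT = "http://example.org/sparql"

# Ask for any ten statements (subject, predicate, object) in the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_query_url(endpoint: str, query: str) -> str:
    """URL-encode the query text into the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


def read_query(url: str) -> str:
    """Recover the query text from a URL built by build_query_url."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Fetching that URL with any HTTP client would, on a real endpoint, return the matching statements. With no API keys involved, there is nothing else to add to the request.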

There is more information available at www.talis.com/cc, where you can find detailed technical information, FAQs and other resources.

Image: “Eggistentialism 1.5 or Three of a Perfect Pair” by bitzcelt (via flickr), CC Licensed

 

Hook me up

'my mac is cool "kkkkk"' by yaraaa

I’ve been blogging a bit over on Nodalities about “stuff being connected”. The idea is basically this: everyone is constantly creating data, all the bits of information that can be used in the abstract. These tiny bits of information are constantly being generated by every process we undertake, from the obvious like online banking to the more obscure like driving to work (your odometer tells you how many miles you’ve gone, your on-board computer may store info about your car’s status, your satnav knows where you’re going and where you’ve been, your mobile phone may know this too, the garage knows when your last service was… this list can go on and on). These data are more powerful when automated by software, and they become exponentially more useful when they are connected with other data. For example, the knowledge that £50 left your account isn’t particularly helpful without a connection to that little bit of data which tells you the date of the transaction.

But why are some data more obscure—why don’t we even think about using some of them?

It may be simply because they’re not immediately useful to us, yet. We can, right now, log in to our banks and have a look at our accounts. We can shuffle and access and compare and analyse because this information is being presented to us in an easily-managed and understandable way. We have access to the raw data, and most of us have some basic understanding of why these data are important. I wouldn’t be surprised if readers of this blog have a spreadsheet or two with financial calculations on it, or use Quicken with their balance info. We all know how important calendar events, emails, address book contacts, and bank balances are, and we have various systems to deal with them.

But, what do we DO with all the data we don’t currently access routinely? Well, this is where those connections come in. We can connect data together using some sort of framework, or abstract construct like a database. However, this database will need to be connected to another database (or exported to an existing one) in order for these new bits and pieces to be considered in terms of others.

More simply, the tools and formats we use all the time (spreadsheets, calendars, notepads, computers, odometers etc…) already exist but they don’t currently take into account the further levels of data we create. We don’t have a tool to see our car’s mileage at a certain date, so we’d need to walk out to the car, look at the odometer, and guess. The bit that’s missing is the connection—the link between information we have and a tool or another bit of data. In the previous example, we need a database to collect mileage, a connection between that and date data, and a calendar to view it—tools and data.
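The mileage example could be sketched in a few lines of Python. This is only a toy, with invented readings and a hypothetical `mileage_on` helper, but it shows the three parts: some collected data, a connection to dates, and a way to ask the question.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

# Invented odometer log: (date of reading, miles on the clock), oldest first.
readings: List[Tuple[date, int]] = [
    (date(2009, 1, 5), 42000),
    (date(2009, 2, 2), 42850),
    (date(2009, 3, 9), 43910),
]


def mileage_on(day: date, log: List[Tuple[date, int]]) -> Optional[int]:
    """Return the most recent reading taken on or before `day`, if any."""
    dates = [d for d, _ in log]
    i = bisect_right(dates, day)  # number of readings dated on or before `day`
    if i == 0:
        return None               # nothing was recorded that early
    return log[i - 1][1]


print(mileage_on(date(2009, 2, 15), readings))
```

The lookup itself is trivial. The missing piece in real life is the connection that gets the odometer's readings into a log like this in the first place.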

There are two sides to these software tools, though. There’s the side presented to the user, and the side that is accessed by processors and memory and software. I’ll blog more on the human-side later, but the “stuff” happens at the edge of these two coming together.

The “Semantic Web” works on a framework which enables any data to be easily connected to other data. Instead of sitting in a traditional relational database, which makes its connections based on a set of specific instructions (schemas), all the data are encoded with a bit of information identifying them to the web. In essence, each piece of data has an address, and can be pointed to much like a web site points to another. This works at various levels of granularity, so individual records can be linked very easily, allowing for applications to be written on top of these linked data. These applications can then let us analyse, manipulate, swap, and USE anything, literally, that we can link.
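Here is a small sketch, in Python, of why addresses matter. Two made-up datasets (a bank's and a calendar's) know nothing about each other's structure, but because a record in one points at an address in the other, a link can be followed across them. The `.example` addresses, field names and helper functions are all invented for illustration.

```python
from typing import Any, Dict, Optional

Dataset = Dict[str, Dict[str, Any]]

# Dataset one: a bank's records, keyed by address.
bank: Dataset = {
    "http://bank.example/txn/1": {
        "amount_gbp": 50,
        # The date is not a local value: it is the address of a record elsewhere.
        "date": "http://calendar.example/day/2009-03-09",
    },
}

# Dataset two: a calendar, published separately, keyed by its own addresses.
calendar: Dataset = {
    "http://calendar.example/day/2009-03-09": {"label": "9 March 2009"},
}


def resolve(uri: str, *datasets: Dataset) -> Optional[Dict[str, Any]]:
    """Look an address up in whichever dataset happens to hold it."""
    for ds in datasets:
        if uri in ds:
            return ds[uri]
    return None


def follow(uri: str, prop: str, *datasets: Dataset) -> Optional[Dict[str, Any]]:
    """Start at one record and follow the link stored under `prop`."""
    record = resolve(uri, *datasets)
    if record is None or prop not in record:
        return None
    return resolve(record[prop], *datasets)


linked_day = follow("http://bank.example/txn/1", "date", bank, calendar)
```

On the real Web the lookup would be an HTTP request rather than a dictionary access, but the principle is the same: the link works because the address means the same thing everywhere.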

Alongside this linked data infrastructure (call it the Semantic Web, or Data Web or just the Web) is the proliferation of computing hardware. Processors and memory are being manufactured into just about anything we can buy. These are all working to take the stuff we do and “translate” it into data. Phones, cars, fridges, credit cards, clocks, scales, watches… we’re surrounded by little processors or bits of memory recording and crunching what we do. What makes this situation frustrating/exciting is that they currently don’t share their information, and aren’t “aware” of the potential of other computing.

So, what am I getting at? Well, as we’re saying over on Nodalities, hook it up! We’re getting data, that’s happening. We have the framework(s) and the distributed network (the Web), and we have decades of experience automating data-comparisons (which is all software ever does, if you boil it down).

The next step is to connect it.

 

Zemanta

Exterior view. Bronze tympanum, by Olin L. Warner, representing Writing above main entrance doors. Library of Congress Thomas Jefferson Building, Washington, D.C. Cropped from the Library of Congress digital version using the GIMP.

Image via Wikipedia

I’m trying out a Zemanta blog post. What it does, apparently, is suggest ideas for the article you’re currently writing. It’s a semantic blog suggestion feature, and it’s manifested in this instance as a Firefox plugin that adds a write widget to my WordPress WYSIWYG editor. It updates every 300 characters, and also has ‘semantic features’. There’s an interview over at R/WW, for more information. I’m kind of trying to see what it recommends so need to fill in the 300 characters:

Well it looks like it suggests related articles, and adds a bunch of Zemanta boxes into the blog space. It also finds images from Flickr.

I could see this tool being very handy in future, though I usually blog from a client, and I don’t think this supports ScribeFire or ecto (which is rubbish, by the way.) However, there are a few problems with it:

  1. It generates an unhelpful set of areas in the blog itself. So if you include a Zemanta suggestion, it pastes it where you’re typing, and you end up typing in an alt area in the code… annoying.
  2. It updates every 300 characters. This is annoying because it’s not necessarily that real-time. This is an awkward interface feature. It also places your cursor at the top of the post every time it updates, meaning what I just typed appeared above the opening line…

I think this kind of application, however, is prescient of the direction the Read/Write web is heading. It’s active and dynamic, and I’m sure the interface will be ironed out over time. I’m not sure what ‘semantic’ features they’re necessarily incorporating (is this just keyword-searched or is it tying in with some RDF store somewhere?) but I like the way it’s heading.

I like the fact that it suggests images (all images in this post provided by Zemanta), but I’m not sure about the inclusion of ‘Zemanta’ presence everywhere… I’m also slightly concerned that some of the images it supplies are ‘license unknown’, meaning you could use one and infringe on copyright. It does, however, have a link saying you can check it yourself, which shows they’re thinking ahead! Its implementation of images is a bit of a struggle, however, in that you end up typing in the description area without the ability to click out of it. This is balanced by the fact that it automatically adds citations. It only adds a single image, though… so you can’t add a second image to the same blog post.

Now, they just need to make it a bit smoother, and stop jumping to the top of the bloody post ;)

 
© 2010 Zach Beauvais
some rights reserved