
Posts Tagged ‘Semantic Web’

Opening Up: A quick note from Matt McAlister

Matt McAlister, over at the Guardian, wrote a fantastic piece earlier today about the way the Semantic Web has panned out, and you should definitely read it (http://www.mattmcalister.com/blog/2009/11/25/508/socially-linked-data/). I just wanted to snip out a simple quote from his post:

“Openness makes you more relevant. It creates opportunity. It’s a way into people’s hearts and minds. It’s empowering. It’s not hard to do. And once it starts happening it becomes apparent that it mustn’t and often can’t stop happening.”

What do you think?

Posted via email from beauvais’s posterous

 

Journalism Needs Data in 21st Century

This first appeared as a guest post on ReadWriteWeb and is republished with kind permission.

Journalism has always been about reporting facts and assertions and making sense of world affairs. No news there. But as we move further into the 21st century, we will have to increasingly rely on “data” to feed our stories, to the point that “data-driven reporting” becomes second nature to journalists.

The shift from facts to data is subtle and makes perfect sense. You could say that data are facts, with the difference that they can be computed, analyzed, and made use of in a more abstract way, especially by a computer.

With this mindset, finding mainstream data-driven stories doesn’t take long at all. A quick scan of the Guardian’s home page tells us that swine flu cases are up by 50%, according to “fresh figures…[that] will be released this afternoon.” The story here is that we’re in danger because swine flu is on the rise. Reporting the current figures available for swine flu alone wouldn’t be all that interesting. The news comes from comparing the current figures to last week’s, which is a very simple form of data analysis. By making use of published data and running one’s own analysis (and building on the analysis of others), we get something very newsworthy indeed. It moves the definition ever so slightly, from “saying and asserting” to “analyzing and publishing.” But it obviously works only for data that is accessible.
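The comparison behind a headline like “up by 50%” is simple arithmetic: the change between two figures, expressed as a share of the earlier one. Here is a minimal Python sketch; the weekly counts are hypothetical and are not the figures from the Guardian story.

```python
def percent_change(previous: float, current: float) -> float:
    """Return the change from `previous` to `current` as a percentage of `previous`."""
    if previous == 0:
        raise ValueError("percentage change is undefined when the previous figure is zero")
    return (current - previous) / previous * 100


# Hypothetical weekly case counts, used only for illustration.
last_week = 1000
this_week = 1500
print(f"Cases are up by {percent_change(last_week, this_week):.0f}%")
```

The same function reports a fall as a negative number, so a story about cases going down comes from exactly the same comparison.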

There is nothing new about pointing out the importance of public data being made available. Sir Tim Berners-Lee has discussed at length the importance of governments and institutions putting their data online, making it accessible and useful. His TED talk and interviews with ReadWriteWeb and Talis (disclosure: I am a blogger at Talis) all explain his belief that by publishing linked data we can begin to solve many of the problems the world faces. Innovations in medicine, science, and development could all be achieved if only currently hidden data were made available. Data-driven journalism could be the first step in realizing this dream. The best stories would then come from innovators who read about trends reported in news media and are then able to draw new conclusions and solve bigger problems. In his recent discussion with the BBC, Berners-Lee said that the next step is to go for low-hanging fruit by just getting the data out there.

Thus far, this has made a lot of sense to me, and I have been tracking the publication of linked data and increasing access to public knowledge as emerging trends over at Talis. But my perspective has shifted a bit in the past few weeks.

First, there was data.gov and President Obama’s call for more access to government data. A sitting head of state (and one of some significance) was clearly calling for public access to government data: this was news! But the idea has been discussed, praised, and debated for a while since then and may have lost some of its luster.

Then about a month ago, UK Prime Minister Gordon Brown made it part of his digital strategy to prioritize the publication of government information. He asked Sir Tim personally “to help us drive the opening up of access to Government data in the web over the coming months” and appointed Berners-Lee an official governmental adviser. By now, neither of these stories is news and comparisons between the initiatives have been made.

The Guardian newspaper recently launched its own Data Blog, with the intention of letting readers access, mash up, and reuse much of its information in the form of data, which could in turn drive stories.

What is perhaps not as explicitly recognized is the voracious appetite for data that has been apparent for months. It is less about turning good ideas into stories and more about seeing how data informs our understanding of events happening right now. Each new initiative is another piece of low-hanging fruit picked.

Access to data is important: it drives innovation and even social change. Governments that publish their data have to become more transparent. Humanitarian organizations that make their findings known could spark bigger projects and source innovative solutions from their communities. Scientific findings and raw information could be used to solve bigger problems than the result of a single experiment or trial could ever manage. Even the simple comparison of two or more facts can lead to new insight, and all of these things happen only when the walls around an institution become porous.

2009 could become known as the year of data, the year of open access, or the year of the semantic Web (see links above for how this relates), and it may also be the first year when it becomes news that data wasn’t published in a story when it should have been. That a government body isn’t being transparent or is blocking access by publishing its findings in PDF or other non-linking formats would make a very interesting story indeed. We can expect to see more and more organizations and public bodies remove their own barriers through initiatives and legislation. Examples have been set, and seeing excuses die along with barriers is not far-fetched.

Do you know of other data-driven stories? We’d love to hear about any insights that were made through publicly accessible data or where this data might come from next.

Guest author: Zach Beauvais is a Platform Evangelist for Talis and editor of Nodalities Magazine.


 

What we’ve been working on…

Talis, my employer, has been a big promoter of Linked Data and open access to information, because we see that new ideas often arise when existing ideas come together. Innovation, if you like, occurs at the join between ideas when they connect. I see this as fundamental to the way ideas and their applications (technology) advance. I tend to believe that anything “novel” is actually brought about by connecting other ideas together.

In the technological world, this seems like a strong analogy for Linked Data: information which can be connected by a web-like network of links. These Linked Data have become the foundation for what has come to be known as the “Semantic Web”, a web of connected information which breaks out of information silos and enables the discovery of new ideas from old, and innovation from existing information. We use the phrase “serendipitous reuse” for the idea that once an idea (or a piece of data) is published, it can be used and reused in novel ways and in the context of other data to produce unexpected and unforeseeable possibilities. These ideas (data, again) become increasingly useful when published in a format which allows them to be linked freely to ANY other piece of information. We’ve had the distribution method for this network for years (the good ol’ WWW itself), and the World Wide Web Consortium (W3C) has standardised RDF (the Resource Description Framework) to handle the data itself. The idea is basically to give every bit of data an address (a universal address, not one relative to a single database, like a spreadsheet cell reference), and to predicate that bit of information very much like language does. If you think of it like a language, RDF lets bits of data (nouns) act upon, or be acted upon by (verbs), other bits of data (other nouns). This triple format (subject, predicate, object) enables a near-infinite (theoretically) recombination of any data, anywhere, that has an address.
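To make the noun-verb-noun idea concrete, here is a minimal Python sketch that stores triples as plain tuples of URIs and answers simple pattern lookups. The `example.org` URIs, the `knows`/`livesIn` predicates and the `match` helper are all hypothetical; this isn't real RDF tooling (a real application would use an RDF library and SPARQL), just the shape of the data.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical URIs under example.org, used only for illustration.
EX = "http://example.org/"
ALICE, BOB = EX + "people/alice", EX + "people/bob"
KNOWS, LIVES_IN = EX + "terms/knows", EX + "terms/livesIn"
LONDON, PARIS = EX + "places/london", EX + "places/paris"

# Each statement is a (subject, predicate, object) triple: noun, verb, noun.
store: set[Triple] = {
    (ALICE, KNOWS, BOB),
    (ALICE, LIVES_IN, PARIS),
    (BOB, LIVES_IN, LONDON),
}


def match(
    triples: set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return the triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Recombination: where do the people Alice knows live?
friends = [o for _, _, o in match(store, s=ALICE, p=KNOWS)]
homes = [o for f in friends for _, _, o in match(store, s=f, p=LIVES_IN)]
print(homes)
```

Because every node is a full address rather than a row in one particular table, the second lookup can follow on from the first without caring which dataset each triple came from.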

So, what’s the problem? Well, most of the world’s data are locked away in silos (prisoners of the cells their databases confine them to). Many organisations may wish to make use of their data in a semantic environment, and many might even embrace the open-source nature of their data and make it freely available for the world to recombine and use: there are always more innovations outside an organisation than within! To lower the barriers to entering this linked data world, Talis has built a Platform with resources to host and utilise these connections, making use of semantic web standards (RDF, and SPARQL, the query language of the semantic web) and a developer-friendly environment (a RESTful API, for example).

However, this innovation is only possible when data are accessible. In order to further lower the barriers, Talis is now offering free access to the Platform to host public domain data. We are calling this initiative the Talis Connected Commons, and the offer is not limited to free hosting: the data access services, including access to a public SPARQL endpoint, are also freely available. To keep this data open, you will need to use either the Open Data Commons Public Domain Dedication and License or the recently launched Creative Commons CC0 license to publish data. Anyone will then be able to freely access the stored data using the Platform services, without API keys and without usage limits.
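Querying a public SPARQL endpoint generally follows the W3C SPARQL Protocol, in which the query text travels as the `query` parameter of an HTTP GET request. Here is a minimal Python sketch, assuming a hypothetical endpoint at `http://example.org/sparql`; it makes no claims about the specifics of the Talis Platform’s own API. It only builds the request (note that there is no API key parameter) and does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Hypothetical endpoint URL; a real store publishes its own SPARQL endpoint address.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url)

# Round-trip check: the endpoint would decode exactly the query text we sent.
sent = parse_qs(urlparse(req.full_url).query)["query"][0]
print(sent == QUERY)
```

Sending it against a real endpoint would be a matter of passing `req` to `urllib.request.urlopen` and parsing the JSON response.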

There is more information available at www.talis.com/cc, where you can find detailed technical information, FAQs and other resources.

Image: “Eggistentialism 1.5 or Three of a Perfect Pair” by bitzcelt (via flickr), CC Licensed

 

Hook me up


'my mac is cool "kkkkk"' by yaraaa

I’ve been blogging a bit over on Nodalities about “stuff being connected”. The idea, basically: everyone is constantly creating data, all the bits of information that can be used in the abstract. These tiny bits of information are constantly being generated by every process we undertake, from the obvious, like online banking, to the more obscure, like driving to work (your odometer tells you how many miles you’ve gone, your on-board computer may store info about your car’s status, your satnav knows where you’re going and where you’ve been, your mobile phone may know this too, the garage knows when your last service was… this list can go on and on). These data are more powerful when processed by software, and they become exponentially more useful when they are connected with other data. For example, the knowledge that £50 left your account isn’t particularly helpful without a connection to that little bit of data which tells you the date of the transaction.

But why are some data more obscure—why don’t we even think about using some of them?

It may be simply because they’re not immediately useful to us, yet. We can, right now, log in to our banks and have a look at our accounts. We can shuffle and access and compare and analyse because this information is being presented to us in an easily managed and understandable way. We have access to the raw data, and most of us have some basic understanding of why these data are important. I wouldn’t be surprised if readers of this blog have a spreadsheet or two with financial calculations on it, or use Quicken with their balance info. We all know how important calendar events, emails, address book contacts, and bank balances are, and we have various systems to deal with them.

But, what do we DO with all the data we don’t currently access routinely? Well, this is where those connections come in. We can connect data together using some sort of framework, or abstract construct like a database. However, this database will need to be connected to another database (or exported to an existing one) in order for these new bits and pieces to be considered in terms of others.

More simply, the tools and formats we use all the time (spreadsheets, calendars, notepads, computers, odometers, etc.) already exist, but they don’t currently take into account the further levels of data we create. We don’t have a tool to see our car’s mileage on a certain date, so we’d need to walk out to the car, look at the odometer, and guess. The bit that’s missing is the connection: the link between information we have and a tool or another bit of data. In the previous example, we need a database to collect mileage, a connection between that and date data, and a calendar to view it: tools and data.
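The missing connection in the mileage example amounts to joining odometer readings to dates. Here is a minimal Python sketch, assuming hypothetical readings logged together with the date they were taken; the `readings` data and the `mileage_on` helper are illustrative, not an existing tool.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# Hypothetical readings, kept sorted by date (e.g. logged by the car or at each service).
readings: list[tuple[date, int]] = [
    (date(2009, 1, 5), 12040),
    (date(2009, 2, 9), 12910),
    (date(2009, 3, 16), 13655),
]


def mileage_on(day: date, log: list[tuple[date, int]]) -> Optional[int]:
    """Return the most recent odometer reading taken on or before `day`."""
    dates = [d for d, _ in log]
    i = bisect_right(dates, day)  # number of readings dated on or before `day`
    if i == 0:
        return None  # no reading that early
    return log[i - 1][1]


print(mileage_on(date(2009, 2, 20), readings))
```

Returning the last recorded reading is a deliberate simplification; interpolating between two readings would give an estimate rather than a value anyone actually recorded.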

There are two sides to these software tools, though: the side presented to the user, and the side that is accessed by processors, memory and software. I’ll blog more about the human side later, but the “stuff” happens where these two sides meet.

The “Semantic Web” works on a framework which enables any data to be easily connected to other data. Instead of sitting in a traditional relational database, which makes its connections based on a set of specific instructions (schemas), all the data are encoded with a bit of information identifying them to the web. In essence, each piece of data has an address, and can be pointed to much like a web site points to another. This works at various levels of granularity, so individual records can be linked very easily, allowing for applications to be written on top of these linked data. These applications can then let us analyse, manipulate, swap, and USE anything, literally, that we can link.
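To illustrate record-level linking, here is a minimal Python sketch in which two hypothetical publishers (`bank.example` and `calendar.example`) key each record by its own URI, and a bank transaction points at a calendar day by address rather than by a row in a shared schema. `dereference` stands in for an HTTP GET by choosing the publisher from the URI’s host; the datasets, field names and helper are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Two hypothetical, independently published datasets. Every record is keyed by its
# own URI, and records refer to each other by URI.
bank = {
    "http://bank.example/tx/1001": {
        "amount_gbp": 50,
        "on": "http://calendar.example/day/2009-06-12",
    },
}
calendar = {
    "http://calendar.example/day/2009-06-12": {
        "event": "Car service at the garage",
    },
}

# Which publisher answers for which host (a stand-in for DNS plus HTTP).
PUBLISHERS = {"bank.example": bank, "calendar.example": calendar}


def dereference(uri: str) -> dict:
    """Fetch the record a URI points to from the publisher that owns its host."""
    host = urlparse(uri).netloc
    return PUBLISHERS[host][uri]


tx = dereference("http://bank.example/tx/1001")
day = dereference(tx["on"])  # follow the link from the transaction to the calendar
print(f"£{tx['amount_gbp']} left the account on the day of: {day['event']}")
```

The application on top only needs to know how to follow an address; neither dataset has to know the other’s schema in advance.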

Alongside this linked data infrastructure (call it the Semantic Web, or Data Web, or just the Web) is the proliferation of computing hardware. Processors and memory are being built into just about anything we can buy. These all work to take the stuff we do and “translate” it into data. Phones, cars, fridges, credit cards, clocks, scales, watches… we’re surrounded by little processors or bits of memory recording and crunching what we do. What makes this situation frustrating/exciting is that these devices currently don’t share their information, and aren’t “aware” of the potential of other computing.

So, what am I getting at? Well, as we’re saying over on Nodalities, hook it up! We’re getting data; that’s happening. We have the framework(s) and the distributed network (the Web), and we have decades of experience automating data comparisons (which is all software ever does, if you boil it down).

The next step is to connect it.


 

Future of Web Apps

I’m planning to attend this year’s Future of Web Apps conference in London. Their list of speakers sounds fantastic, and I’m really looking forward to meeting some folks in real life.

I’m particularly interested in this conference for its stated focus on the web community. Just have a look at the Agenda:

  • How to grow and nurture your community
  • Work/life balance or Blood, sweat and tears: Which is the startup way?
  • Colliding Worlds: Using Jabber to make awesome web sites
  • Startups live – An interview with three new European startups
  • How to survive outside of Silicon Valley
Sounds good, doesn’t it?
There are also “Networking Opportunities” there. These sound brilliant despite the rather corporatese description.
They’ve apparently got seats left, and if you book before 4th August, you save £100.
If you’re going, let me know—we can meet up. I can tell you a bit about myself and Talis.

 

Opening up Education



I just watched this talk by Richard Baraniuk about opening up access to educational text and information. One of the most amazing ideas from his talk and the project they’re working on (Connexions) is open-source textbooks.

The idea is that collaborative textbooks, published on demand, could answer more questions and provide a better, more tailored resource to students and teachers. If educational resources are produced with granularity (i.e. like ‘Lego’ blocks which can be reused at a small level), they can be used and reused in a variety of novel and unpredicted ways.
Have a look and let me know what you think!

 
© 2010 Zach Beauvais
some rights reserved