It's triples all the way down
A little late due to holidays, here's the blog entry I've been hoping to write since about last November - we've reached Last Call! The group is producing a total of three Recommendation Track documents and Last Call comments on these are welcome through to 14th September.
The Grouping of Resources document is the oldest of the three and has retained much of its original character from 2007. A critical aspect of POWDER is the ability to define groups (actually now we talk about IRI sets but it's the same beast). The Description Resources document has been through the most substantial changes since its early days although the first example in the document doesn't look so different from the first public working draft version (published nearly a year ago). It's the division between operational and formal semantics, introduced this year, that is the big change between the first and this Last Call version.
The development of the two-version approach (operational and formal semantics) led to the creation of the Formal Semantics document, which underpins the other two. This is the one that defines the semantic extension required to make POWDER work in an OWL/RDF environment and confers membership of a class on an IRI if it matches one or more regular expressions.
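As a rough illustration of that last point, here is a minimal Python sketch of class membership decided by regular expressions. The patterns, the class name `ex:ChildSafe` and the helper names are all assumptions made for this example; none of them comes from the POWDER documents, and the real semantic extension is defined in the Formal Semantics document itself.

```python
import re

# Hypothetical IRI set: these patterns are invented for illustration only.
EXAMPLE_IRI_SET = [
    r"^https?://([^/]+\.)?example\.org(/|$)",
    r"^https?://example\.com/safe/",
]

def in_iri_set(iri, patterns):
    """Return True if the IRI matches one or more of the regular expressions."""
    return any(re.search(p, iri) for p in patterns)

def classify(iri, class_name, patterns):
    """Return an (IRI, rdf:type, class) triple when the IRI is in the set, else None."""
    if in_iri_set(iri, patterns):
        return (iri, "rdf:type", class_name)
    return None
```

Calling `classify("http://example.com/safe/page", "ex:ChildSafe", EXAMPLE_IRI_SET)` would yield a type triple, while an IRI outside the set yields `None`.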
A couple of issues are worth highlighting. Firstly the Description Resources document supports the re-instatement of the HTTP Link Header as proposed by Mark Nottingham. This also impacts on the wider debate about how @rel values should be managed. Mark Nottingham's Internet Draft makes one suggestion but there are other ideas circulating and the way forward is not 100% clear. It is largely this debate that causes us to flag the recommendation of HTTP Link as a feature at risk.
In the Formal Semantics document we note a further feature at risk, namely the ability to include arbitrary RDF in a POWDER document. There are strong arguments on both sides and the group will make a resolution based on Last Call comments received.
Alongside the Recommendation Track documents, the WG is pleased to publish drafts of its Primer and Test Suite. These will continue to evolve as the group works through Last Call and Candidate Recommendation phases.
Finally, the group has announced its latest outreach event. Called POWDER: More of What You Want, When You Want It, the event takes place at Yahoo!'s Mission College Campus in Santa Clara on 16th September.
Posted at 09:33
Apologies for the late posting of this summary
The bulk of the meeting was taken up with one issue, largely as summarised in a recent e-mail thread. The group expressed a variety of views - flexibility to meet future uses against ease of implementation is never an easy trade-off to make. In the end it was resolved that support for arbitrary RDF in POWDER documents (i.e. the XML format) would be marked as a feature at risk. If removed it would mean that:
Only relatively minor changes are necessary to effect this resolution in the latest editor's drafts. POWDER-S is unaffected.
With that discussion held, the group briefly reviewed the latest (internal) versions of its documents. With not a little relief it was resolved that, subject to the changes just described being made, the latest versions of the three Rec Track documents (Grouping, Description Resources and Formal) would be the basis of our Last Call announcement. Simultaneously, First Public Working Drafts of the Primer and Test Suite would be published.
Resolution taken, the group adjourned for the summer. We will reconvene on 1st September - the Monday after the close of the Last Call period.
Posted at 09:30
After reading Bengee's interview with CrunchBase, I decided to knock up a quick interview remix as part of my usual attempt to add to the developing discourse.
CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).
CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range of negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed CrunchBase data with related data in DBpedia and Wikicompany.
CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.
CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998 as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. As of the time of the seminal unveiling of the Semantic Web in 1998 we were clear about two things, in relation to the effects of the Web and Internet data management infrastructure inflections: 1) Existing DBMS technology had reached its limits 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle for completing its technical roadmap.
CrunchBase: Can you put into layman’s terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is a query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against the Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc. DBMS using SQL. That's it in a nutshell.
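To make the Subject, Predicate, Object idea concrete, here is a minimal Python sketch that stores a few triples and answers a single pattern query. It only imitates the spirit of a SPARQL triple pattern and is not a SPARQL engine. The `ex:` identifiers and the `match` helper are invented for this illustration.

```python
# A tiny in-memory graph: each statement is a (subject, predicate, object) triple.
# All identifiers below are made up for illustration.
GRAPH = {
    ("ex:OpenLink", "ex:makes", "ex:Virtuoso"),
    ("ex:Virtuoso", "rdf:type", "ex:DBMS"),
    ("ex:OpenLink", "rdf:type", "ex:Company"),
}

def match(graph, s=None, p=None, o=None):
    """Return the sorted triples that fit the pattern; None plays the role of a variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
```

For example, `match(GRAPH, p="rdf:type")` lists every typing statement in the graph, which is roughly what a query for `?s rdf:type ?o` asks for.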
CrunchBase: On your website you wrote that “RDF and SPARQL as productivity boosters in everyday web development”. Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value has been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage: Knowledge is Power. Well, there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always been grounded in accessibility to data (albeit via compound container documents called Web Pages).
Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.
CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as a Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.
Here are some examples of the CrunchBase Linked Data Space, as projected via our CrunchBase Sponger Cartridge:
Posted at 18:16
Even with the marginal degrees of serendipitous discovery that the current document oriented Web offers, it's still possible to stumble across poignant gems such as this statement from InspireUX :

The statement above resonates with a lot of my fundamental views about the essence of the Web. It also drives right at the core of what we are trying to address with the OpenLink Data Explorer (ODE), which isn't simply about Linked Data visualization, but about the combination of visualization, user interaction, and unobtrusive exposure and exploitation of Linked Data Entities culled from the existing Web of Linked Documents. ODE consumes and processes URIs or URLs. Thus, as long as the (X)HTML container / host document keeps URIs or URLs in "agent view", ODE will give you the option to interact with the-data-behind Web information resources (e.g., Web Pages, Images, Audio, etc.).
Do remember, "mission-critical" is no longer a corporate / enterprise theme. The lines of demarcation between the individual and enterprise are blurring at warp speed.
Posted at 14:47
Ireland’s largest online community boards.ie is offering a massive amount of data for download. It contains all the data from 10 years of discussions with topics ranging from banter through politics to philosophy, and is semantically marked up with SIOC and FOAF, which amounts to more than 9 million RDF/XML documents.
Additionally DERI is starting a competition looking for the most innovative use of these data. According to John Breslin, this could be
a novel web application that makes use of the data set, a report on analyses performed on the data, a tool that allows one to visualise or browse the semantic structure, or whatever else the imagination can come up with!
During my stay at DERI over the last couple of months, I worked on exporting and preparing this data set, so I am delighted that it is now used for this competition. It starts on the 1st of September and runs for two months. The prizes for the top three submissions amount to a total of $7000.
Read about the details, sign up and download the dataset here. Damien Mulley already has a couple of ideas of what one could do with these data.
Posted at 12:14
It turns out that when you build a better mousetrap, better mice evolve.
Posted at 10:17
Kingsley Idehen has again graciously given LinqToRdf some much needed link-love. He mentioned it in a post that was primarily concerned with the issues of mapping between the ontology, relational and object domains. His assertion is that LinqToRdf, being an offshoot of an ORM related initiative, is reversing the natural order of mappings. He believes that in the world of ORM systems, the emphasis should be on mapping from the relational to the object domain.
I think that he has a point, but not for the reason he’s putting forward. I think that the natural direction of mapping stems from the relative richness of the domains being mapped. The impedance mismatch between the relational and object domains stems from (1) the implicitness of meaning in the relationships of relational systems, (2) the representation of relationships and (3) type mismatches.
If the object domain has great expressiveness and explicit meaning in relationships it has a ‘larger’ language than that expressible using relational databases. Relationships are still representable, but their meaning is implicit. For that reason you would have to confine your mappings to those that can be represented in the target (relational) domain. In that sense you get a priority inversion that forces the lowest common denominator language to control what gets mapped.
The same form of inversion occurs between the ontological and object domains, only this time it is the object domain that is the lowest common denominator. OWL is able to represent such things as restriction classes and multiple inheritance and sub-properties that are hard or impossible to represent in languages like C# or Java. When I heard of the RDF2RDB working group at the W3C, I suggested (to thunderous silence) that they direct their attentions to coming up with a general purpose mapping ontology that could be used for performing any kind of mapping.
I felt that it would have been extremely valuable to have a standard language for defining mappings. Just off the top of my head I can think of the following places where it would be useful:
You can see that most of these are perennial real-world problems that programmers are ALWAYS having to contend with. Having a standard language (and API?) would really help with all of these cases.
I think such an ontology would be a nice addition to OWL or RDF Schema, allowing a much richer definition of equivalence between classes (or groups or parts of classes). Right now one can define a one-to-one relationship using the owl:equivalentClass property. It’s easy to imagine that two ontology designers might approach a domain from such orthogonal directions that they find it hard to define any conceptual overlap between entities in their ontologies. A much more complex language is required to allow the reconciliation of widely divergent models.
I understand that by focusing their attentions on a single domain they increase their chances of success, but what the world needs from an organization like the W3C is the kind of abstract thinking that gave rise to RDF, not another mapping markup language!
Here’s a nice picture of how LinqToRdf interacts with Virtuoso (thanks to Kingsley’s blog).
Posted at 04:03
A Web language is not only a markup language (be it XML, SGML or binary). For example, JPEG is not a Web format, but a format used on the Web. A Web format has the capability to play into the Web: it has linking capabilities.
The simple fact of being able to make a link from my Web site to another Web site somewhere else on the network, without having to go through a prior agreement, is a major feature of the Web. It seems obvious now. It was not at the time it was designed. People created good Web sites and crap Web sites. Some sites have disappeared, some have changed their organization in ways that broke many links on the Web. This is part of the social process of using a technology. When a big enough community decides that this Web page over there contains good information, it will become a reference. There will be abuse, as in any human community.
The net result is that the Web became a very successful source of information.
People who are creating the Semantic Web technologies such as RDF and RDFa do not propose a perfect system. The goal is to have a system which uses the nature of the Web (links). RDFa is a proposal to make the data in your Web pages (HTML) Web friendly. People will certainly abuse it. It will break here and there, but the net effect is the creation of a network of hyperlinked data. The value of and trust in the data will be created by community usage and the social network. The technology is just the support.
Posted at 03:49
There are many challenges that have dogged attempts to mesh the DBMS & Object Technology realms for years, critical issues include:
The big deal about LINQ has been the singular focus on addressing point 1, in particular.
I've already written about the Linq2Rdf effort that meshes the best of .NET with the virtues of the "Linked Data Web".
Here is an architecture diagram that seeks to illustrate the powerful data access and manipulation options that the combination of Linq2RDF and Linked Data deliver:
What may not have been obvious to most in the past, is the fact that Mapping from Object Models to Relational Models wasn't really the solution to the problem at hand. Instead, the mapping should have been the other way around i.e., Relational to Object Model mapping. The emergence of RDF and RDBMS to RDF mapping technology is what makes this age-old headache addressable in very novel ways.
Posted at 12:36
…will be the 1st September. I sincerely apologise for the delay, which was due to technical difficulties (we needed a signup mechanism in place), my holidays during the first two weeks of August, and settling into the new job.
To enter, you should sign up for a user account at data.sioc-project.org; we will ring to confirm your details; then after your account is enabled, you will be able to access the data sets from the 1st September. We will also have an entry submission system available from that date (in case you make something really cool on the first day)! You can make as many submissions as you wish, but use of the data sets is restricted to the duration of the competition and during the demonstration period in November…
Posted at 10:02
Just recently Michael Hausenblas tagged Wordle on delicious and I was fascinated by the result it produced for our blog:
As you can see:
Great work, Mr. Feinberg!
Posted at 14:26
Here I will summarize what should be known about running benchmarks with Virtuoso.
For 8G RAM, in the [Parameters] stanza of virtuoso.ini, set:
[Parameters]
...
NumberOfBuffers = 550000
For 16G RAM, double this:
[Parameters]
...
NumberOfBuffers = 1100000
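The two figures above suggest a simple proportional rule. The Python sketch below assumes that NumberOfBuffers scales linearly with RAM. The post only states the 8G and 16G values, so any other size is an extrapolation, and the function names are hypothetical.

```python
BUFFERS_PER_8G = 550000  # value given above for 8G RAM

def number_of_buffers(ram_gb):
    """Scale the 8G figure linearly (an assumption for sizes other than 8G and 16G)."""
    return BUFFERS_PER_8G * ram_gb // 8

def parameters_stanza(ram_gb):
    """Render the matching [Parameters] fragment for virtuoso.ini."""
    return "[Parameters]\nNumberOfBuffers = {}\n".format(number_of_buffers(ram_gb))
```

`number_of_buffers(16)` reproduces the 1100000 quoted for 16G.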
For most cases, certainly all RDF cases, Read Committed should be the default transaction isolation. In the [Parameters] stanza of virtuoso.ini, set:
[Parameters]
...
DefaultIsolation = 2
If ODBC, JDBC, or similarly connected client applications are used, there must be more ServerThreads available than there will be client connections. In the [Parameters] stanza of virtuoso.ini, set:
[Parameters]
...
ServerThreads = 100
With web clients (unlike ODBC, JDBC, or similar clients), it may be justified to have fewer ServerThreads than there are concurrent clients. The MaxKeepAlives should be the maximum number of expected web clients. This can be more than the ServerThreads count. In the [HTTPServer] stanza of virtuoso.ini, set:
[HTTPServer]
...
ServerThreads = 100
MaxKeepAlives = 1000
KeepAliveTimeout = 10
Note: The [HTTPServer] ServerThreads are taken from the total pool made available by the [Parameters] ServerThreads. Thus, the [Parameters] ServerThreads should always be at least as large as (and is best set greater than) the [HTTPServer] ServerThreads, and if using the closed-source Commercial Version, should not exceed the licensed thread count.
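The thread rules above can be checked mechanically before a benchmark run. The following Python sketch is one way to do it. The function name, its arguments and the warning texts are illustrative assumptions and are not part of Virtuoso.

```python
def check_threads(params_threads, http_threads, odbc_connections=0, licensed_threads=None):
    """Return a list of warnings for the thread sizing rules described above."""
    warnings = []
    # ODBC/JDBC style clients need more server threads than connections.
    if params_threads <= odbc_connections:
        warnings.append("[Parameters] ServerThreads should exceed client connections")
    # HTTP threads are drawn from the [Parameters] pool.
    if params_threads < http_threads:
        warnings.append("[Parameters] ServerThreads is below [HTTPServer] ServerThreads")
    # Only relevant when a licensed thread count is known.
    if licensed_threads is not None and params_threads > licensed_threads:
        warnings.append("[Parameters] ServerThreads exceeds the licensed thread count")
    return warnings
```

An empty list means none of the three rules is violated. MaxKeepAlives is deliberately not checked, since it may exceed the ServerThreads count.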
The basic rule is to use one stripe (file) per distinct physical device (not per file system), using no RAID. For example, one might stripe a database over 6 files (6 physical disks), with an initial size of 60000 pages (the files will grow as needed).
For the above described example, in the [Database] stanza of virtuoso.ini, set:
[Database]
...
Striping = 1
MaxCheckpointRemap = 2000000
And in the [Striping] stanza, on one line per SegmentName, set:
[Striping]
...
Segment1 = 60000 , /virtdev/db/virt-seg1.db = q1 , /data1/db/virt-seg1-str2.db = q2 , /data2/db/virt-seg1-str3.db = q3 , /data3/db/virt-seg1-str4.db = q4 , /data4/db/virt-seg1-str5.db = q5 , /data5/db/virt-seg1-str6.db = q6
As can be seen here, each file gets a background IO thread (the = qxxx clause). It should be noted that all files on the same physical device should have the same qxxx value. This is not directly relevant to the benchmarking scenario above, because we have only one file per device, and thus only one file per IO queue.
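For larger layouts it can help to generate the segment line instead of typing it. The Python sketch below builds a line in the format shown above and gives files on the same physical device the same queue name. The function name, the device labels and the sample paths are hypothetical.

```python
def segment_line(name, pages, files):
    """Build a [Striping] segment line.

    files is a list of (path, device) pairs; files that share a device
    share an IO queue (the "= qN" clause).
    """
    queues = {}
    parts = [str(pages)]
    for path, device in files:
        # The first time a device is seen it gets the next queue number.
        queue = queues.setdefault(device, "q{}".format(len(queues) + 1))
        parts.append("{} = {}".format(path, queue))
    return "{} = ".format(name) + " , ".join(parts)
```

With one file per device, as in the benchmark example, every file simply gets its own queue.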
If queries have lots of joins but access little data, as with the Berlin SPARQL Benchmark, the SQL compiler must be told not to look for better plans if the best plan so far is quicker than the compilation time expended so far. Thus, in the [Parameters] stanza of virtuoso.ini, set:
[Parameters]
...
StopCompilerWhenXOverRunTime = 1
Posted at 14:05
Two days ago in upper Austria, the BarCamp Traunsee, subtitled “Social Media Review Camp”, took place, which I had co-organized and which was co-sponsored by our own lil’ Semantic Web Company. Andreas Blumauer (also SWC) joined me on the first day, hosting a session about and giving an introduction to Linked Data. Given the angle of the BarCamp, he gave it to an audience of Web 2.0 people (i.e. consultants, marketers, developers, communications people). And was he able to bridge the gap between 2.0 and 3.0?
Half a year ago, I had been a complete newbie to the Semantic Web and Linked Data myself, and while the concept of the Semantic Web is undoubtedly as persuasive as a technological concept possibly can be, I remember how hard it was to come to grips with it (btw, I am a Humanities/Liberal Arts person). I think that Andreas’ presentation on Friday was probably the most accessible introduction to the topic I have witnessed thus far, and it allowed me to backtrack once more where the biggest comprehension and communication issues probably are.
If Semantic Web people start explaining their concepts to ‘other species’, they very soon start juggling acronyms and technical lingo, in particular names and abbreviations from the Semantic Web Stack - understandably so, as URIs, XML and RDF form the very foundation, on the technological side. But the only concept where the web 2.0 people (in particular those who approach it from the business, PR or marketing side) might still be with them is XML - even though it might sound surprising, not everyone is able to guess without context that the term URI refers to the same kind of thing as URL. And when you say RDF, people are surprisingly often inclined to think you are talking about “RFID” (Radio Frequency Identification) - it’s got, after all, also to do with unique identification, doesn’t it?
Just as the Semantic Web interfaces are only about to become more accessible to web 2.0 people (once more, hooray for Parallax), I think a VITAL next step in promoting the Semantic Web is to find human-readable explanations of its technologies.
The generic explanations all sound very good ( “At the moment, we have a web of documents, but the Semantic Web aims for the web of data” or “The Semantic Web wants computers not only to be able to process, but also to understand data”), but what they fail to achieve is to make non-tech people interested in the (workings of the) technology.
Without addressing technology, these generic explanations are just too bland to convey what is really exciting about the semantic web - yet as soon as SemWeb people start to talk technology, the acronym shower starts - see above. Dilemma.
Back to the BarCamp: I think that Andreas took a good approach in that he
a) kept the acronym level low
b) went on to explain how Linked Data can be a better source for mashups than APIs
because APIs really are the Holy Grail of the Web 2.0 community. I saw it happen before and I saw it happen at the BarCamp Traunsee - as soon as a new tool or feature is introduced, people start asking: “Does it have an API?” - “Will it have an API?” - “Can I get access to the API?” - “Is the API documentation online?”
What seems to be pegged in people’s minds is that you have to have an API to make mashups, and that mashups are what constitutes the miracle of the web 2.0. So my simple advice for all Semantic Web evangelists would be:
If you want to develop a showcase that people understand, develop a mash-up, and more specifically one that uses data that average users would use and understand.
Develop something like DBpedia mobile (call up in emulator), and go into the details of the Semantic Web stack only after people have seen and understood that you don’t need an API (well, theoretically) and huge programming effort to obtain structured, processable data.
Btw, things got even more semantic on the second day of the BarCamp: Alexander Kirk presented his Factolex dictionary, a dictionary consisting of “short and concise explanations” which can be enhanced by tags, and which, because of their simplicity, would ideally lend themselves to conversion into triples. Alexander confirmed that he keeps semantic integration in mind while developing Factolex further.
Alexander’s presentation was followed by input from Michael Schuster (who hasn’t yet put his session online, and I seem unable to remember the names of the sites he uses and showed us). One of them was a tool that uses natural language processing to interpret user notes, and which is able to decide, for instance, whether an entry should be added to the calendar or to a to do list.
Nifty tool (and I hope I’ll be able to provide a link later), but what I mostly remember his presentation for is that he presented it as an example of a “dirty semantic web approach”, making it sound like something diametrically opposed to the (potentially anal) endeavours of those who rely on the Semantic Web stack.
But why open up this binary opposition? You can and must have both: semantic technologies like NLP, and open standards such as those defined in the Semantic Web stack.
It’s not like one is for the ‘cool kids’ (or web 2.0 kids) and the other one for the ‘geeks’ - if anything, then I’d say that the ‘cool kids’ are probably more interested in improving the service of just their site (making the industry and software market more diverse, if there are enough of them), whereas the ‘geeks’ work towards global exchange through the definition and further development of open standards (and make sure the ‘cool kids’ don’t get trapped in their data silos).
In the end, once the Semantic Web reaches maturity, it will need both of them.
Posted at 13:03
Leigh is now a colleague. How cool is that.
Posted at 00:23
I don't particularly care for the rel="profile" design, but one should choose one's battles and I'm not inclined to choose this one. I'm content for the market to choose.
Posted at 19:09
The semantic web is a GOOD THING by definition - anything that enables us to create smarter software without also having to create Byzantine application software must be a step in the right direction. The problem is - many people have trouble translating the generic term “smarter” into a concrete idea of what they would have to do to achieve that palladian dream. I think a few concrete ideas might help to firm up people’s understanding of how the semantic web can help to deliver smarter products.
Software Development as knowledge based activity
In this post I thought it might be nice to share a few ideas I had about how OWL and SWRL could help to produce smarter software development environments. If you want to use the ideas to make money, feel free to do so, just consider them as released under the creative commons attribution license. Software development is the quintessential knowledge based activity. In the process of producing a modern application a typical developer will burn through knowledge at a colossal rate. Frequently, we will not reserve headspace for a lot of the knowledge we acquire to solve a task. Frequently, we bring together the ideas, facts, standards, API skills and problem requirements needed to solve a problem then just as quickly forget it all. The unique combination is never likely to arise again.
I’m sure we could make a few comments about how it’s more important to know where the information is than to know what it is - a fact driven home to me by my Computer Science lecturer John English, who seemed to be able to remember the contents page of every copy of the Proceedings of the ACM back to the ’60s. You might also be forgiven for thinking this wasn’t true, given the current obsession with certifications. We could also comment about how some information is more lasting than others, but my point is that every project these days seems to combine a mixture of ephemera, timeless principles and those bits that lie somewhere between the two (called ‘Best Practice’ in current parlance ;).
Requires cognitive assistance
Software development, then, is a knowledge intensive activity that brings together a variety of structured and unstructured information to allow the developer to produce a system that they endeavor to show is equivalent to a set of requirements, guidelines, nuggets of wisdom and cultural mores that are defined or mandated at the beginning of the project. Doesn’t this sound to you like exactly the environment for which the semantic web technology stack was designed?
Incidentally, the following applications don’t have much to do with the web, so perhaps they demonstrate that the term ‘Web 3.0’ is limiting and misleading. It’s the synergy of the complementary standards in the semantic web stack that makes it possible to deliver smarter products and to boost your viability in an increasingly competitive marketplace.
Documentation
OK, so the extended disclaimer/apology is now out of the way and I can start to talk about how the semantic web could offer help to improve the lives of developers. The first place I’ll look is at documentation. There are many types of documentation that are used in software development. In fact, there is a different form of documentation defined for each specific stage of the software lifecycle from conception of an idea through to its realization in code (and beyond). Each of these forms of documentation is more or less formally structured with different kinds of information related to documents and other deliverables that came before and after. This kind of documentation is frequently ambiguous, verbose and often gets written for the sake of compliance and then gets filed away and never sees the light of day again. Documentation for software projects needs to be precise, terse, rich and most of all useful.
Suggestion 1.
Use ontologies (perhaps standardised by the OMG) for the production of requirements. Automated tools could be used to convert these ontologies into human-readable reports or tools could be used to answer questions about specific requirements. A reasoner might be able to deduce conflicts or contradictions from a set of requirements. It might also be able to offer suggestions about implementations that have been shown to fulfill similar requirements in other projects. Clearly, the sky’s the limit in how useful an ontology, reasoner and rules language could be. It should also help documentation to be much more precise and less verbose. There is also scope for documentation reuse, specialization and for there to be diagramming and code generation driven off of documentation.
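As a very rough illustration of the kind of contradiction a reasoner might deduce, here is a minimal Python sketch. It is not OWL or SWRL; it assumes each requirement has been reduced to an allowed interval on a named quantity, and the requirement ids, quantity names and numbers are all invented for the example.

```python
# Hypothetical requirements: (id, quantity, lowest allowed, highest allowed).
# The ids, quantities and bounds are made up purely for illustration.
requirements = [
    ("R1", "response_time_ms", 0, 200),
    ("R2", "response_time_ms", 500, 1000),        # cannot hold together with R1
    ("R3", "concurrent_users", 100, float("inf")),
    ("R4", "concurrent_users", 0, 5000),
]


def conflicts(reqs):
    """Return pairs of requirement ids whose intervals on the same quantity do not overlap."""
    found = []
    for i, (id_a, qty_a, lo_a, hi_a) in enumerate(reqs):
        for id_b, qty_b, lo_b, hi_b in reqs[i + 1:]:
            # Two closed intervals are disjoint when the larger lower bound
            # exceeds the smaller upper bound.
            if qty_a == qty_b and max(lo_a, lo_b) > min(hi_a, hi_b):
                found.append((id_a, id_b))
    return found


print(conflicts(requirements))
```

A real ontology and reasoner would of course handle far richer statements than numeric ranges; the point is only that once requirements are structured data, contradictions become something a machine can look for.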
Documentation is used heavily inside the source code used by developers to write software too. It serves to provide an explanation for the purpose of a software component, to explain how to use it, to provide change notes, to generate API documentation web-sites, and to even store to-do list items or apologies for later reference. In .NET and Java, and now many other programming languages, it is common to use formal languages (like XML markup) to provide commonly used information. An ontology might be helpful in providing a rich and extensible language for representing code documentation. The use of URIs to represent unique entities means that the documentation can be the subject of other documents and can reach out to the wider ecology of data about the system.
Suggestion 2.
Provide an extensible ontology to allow the linkage of code documentation with the rest of the documentation produced for a software system. Since all parts of the software documentation process (being documented in RDF) will have unique URIs, it should be easy to link the documentation for a component to the requirements, specifications, plans, elaborations, discussions, blog posts and other miscellanea generated. Providing semantic web URIs to individual code elements helps to integrate the code itself into other semantic systems like change management and issue tracking systems. Use of URIs and ontologies within source code helps to provide a firm, rich linkage between source code and the documentation that gave rise to it.
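To make the linkage idea a little more concrete, here is a small Python sketch that stores links as plain subject/predicate/object triples and looks up everything attached to a code element. It deliberately avoids any RDF library; the base URI, the predicate names (fulfills, discussedIn, affects) and the code, requirement and issue identifiers are all hypothetical and not taken from any published ontology.

```python
# Hypothetical base URI; every URI and predicate below is invented for illustration.
EX = "http://example.org/project/"

triples = {
    (EX + "code/OrderService.submit", EX + "vocab/fulfills", EX + "req/R12"),
    (EX + "code/OrderService.submit", EX + "vocab/discussedIn", EX + "blog/design-notes"),
    (EX + "code/OrderService.cancel", EX + "vocab/fulfills", EX + "req/R13"),
    (EX + "issue/451", EX + "vocab/affects", EX + "code/OrderService.submit"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def related(uri):
    """Return (predicate, other URI) pairs linked to or from the given URI."""
    outgoing = {(p, o) for s, p, o in triples if s == uri}
    incoming = {(p, s) for s, p, o in triples if o == uri}
    return sorted(outgoing | incoming)


for predicate, other in related(EX + "code/OrderService.submit"):
    print(predicate, other)
```

Because the issue tracker entry points at the same URI as the code documentation does, a lookup on the method finds the requirement, the blog post and the open issue without any of those systems knowing about each other.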
Suggestion 3.
Boosted with richer, extensible markup that represents the meaning of code and its wider documentation environment, traditional intellisense can be augmented with browsers that provide access to all other pertinent documentation related to a piece of code. Imagine hovering over an object reference and getting links not only to a web site generated from the code commentary but to all the requirements that the code fulfills, to automated proofs demonstrating that the code matches the requirements, to blog posts written by the dev team and to MP3s taken during the brainstorming and design sessions during which this component was conceived.
It doesn’t take much imagination to see that some simple enhancements like these can provide a ramp for the continued integration of the IDE, allowing smoother cooperation between teams and their stakeholders. Making documentation more useful to all involved would probably increase the chances that people would give up Agile in favour of something less like the emperor’s clothes.
Suggestion 4.
Here are some other suggestions about how documentation in the IDE could be enriched.
○ Guidelines on where devs should focus their attention when learning a new API
○ A SPARQL endpoint could be exposed by the code publisher
§ This could provide a means to publish documentation online
○ Automatic publishing of DOAP documents to an enterprise or online registry, making software registries possible.
Dynamic Systems
Augmenting the source code of a system with URIs that can be referenced from anywhere opens the semantic artifacts inside an application to analysis and reference from outside. Companies like Microsoft have already described their visions for the production of documentation systems that allow architects to describe how a system hangs together. This information can be used by other systems to deploy, monitor, control and scale systems in production environments.
I think that their vision barely glimpses what could be achieved through the use of automated inference systems, rich structured machine readable design documentation, and systems that are for the first time white boxes. I think that DSI-style declarative architecture documents are a good example of what might be achieved through the use of smart documentation. There is more though.
Suggestion 5.
Reflection and other analysis tools can gather information about the structure, inter-relationships and external dependencies of a software system. Such data can be fed to an inference engine to allow it to make comparisons with the runtime behavior of a production system. Rules of inference can help it to determine what the consequences of violating a rule derived from the architect's or developers' documentation would be. Perhaps it could detect when the system is misconfigured or configured in a way that will force it to struggle under load. Perhaps it can find explanations for errors and failures. Rich documentation systems should allow developers to indicate deployment guidelines (e.g. this component is thread safe, or is location independent and scalable). Such documentation can be used to predict failure modes, to direct testing regimes and to predict optimal deployment patterns for specific load profiles.
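A toy version of that check can be sketched in Python. This is only an illustration under simple assumptions: the documented guidelines are reduced to two boolean flags, the observed deployment to a thread count and a host count, and every class, field and component name here is hypothetical. A real system would express the rules in something like SWRL and hand them to an inference engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Guidelines a developer might record in the documentation (hypothetical fields)."""
    name: str
    thread_safe: bool
    location_independent: bool


@dataclass(frozen=True)
class Deployment:
    """Facts gathered from a running system (hypothetical fields)."""
    component: str
    worker_threads: int
    hosts: int


def violations(components, deployments):
    """Compare observed deployments against documented guidelines."""
    docs = {c.name: c for c in components}
    found = []
    for d in deployments:
        c = docs.get(d.component)
        if c is None:
            found.append(f"{d.component}: no documentation found")
            continue
        if not c.thread_safe and d.worker_threads > 1:
            found.append(f"{d.component}: not thread safe but runs {d.worker_threads} threads")
        if not c.location_independent and d.hosts > 1:
            found.append(f"{d.component}: not location independent but spans {d.hosts} hosts")
    return found


documented = [
    Component("ReportCache", thread_safe=False, location_independent=True),
    Component("OrderService", thread_safe=True, location_independent=True),
]
observed = [
    Deployment("ReportCache", worker_threads=8, hosts=1),
    Deployment("OrderService", worker_threads=16, hosts=4),
]

for problem in violations(documented, observed):
    print(problem)
```

The same documented flags could just as easily drive test selection or deployment planning; the check above is merely the simplest consumer of them.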
Conclusions
I wrote this post because I know I’ll never have time to pursue these ideas, but I would dearly love to see them come to pass. Why don’t you get a copy of LinqToRdf, crack open a copy of Coco/R and see whether you can implement some of these suggestions? And if you find a way to get rich doing it, then please remember me in your will.
Posted at 11:06
I’m proud and excited to announce that from the 1st September Leigh Dodds will be joining Talis as our Platform Programme Manager. His background, experience and skills make him an ideal candidate to develop and advance our Platform ideas. Over the years Leigh has made many contributions to the development of the Semantic Web with a particular emphasis on the Web and REST. He has written extensively on these subjects for O’Reilly Media and on his blog. In fact Leigh’s writings on REST were very influential in the design of our Platform APIs. I first collaborated with him back in 2000 as part of the RSS 1.0 working group when RDF was barely a year old. Shortly after that he developed the FOAF-a-matic which has probably done more to advance adoption of FOAF and RDF than any other application. Most people’s first introduction to FOAF has been via software created by Leigh. He has also been a regular face at XTech and other conferences presenting on topics such as SPARQL and connecting social content. We’re all eagerly anticipating learning from and working with Leigh. Welcome aboard!
Posted at 10:54
Posted at 05:08
Here is a pictorial of DBpedia's Linked Data Deployment & Data Management architecture:
Key points:
SPASQL (SPARQL extension for SQL) enables the intelligent resource representation request handling and URI dereferencing that underlie "Linked Data" (i.e., Hyperdata Linking) to occur in-process.
Posted at 02:50