Ready for Web 3.0/Semantic Web?

September 30th, 2008, by Raymond Velez

When the mainstream media start talking about the Semantic Web, one can infer that it is no longer just a buzzword within research labs. Recently, The Economist and BBC Online both covered the topic, and earlier this month Thomson Reuters announced a service to help with semantic markup.

Semantic Web Primer

The term Semantic Web was coined by Sir Tim Berners-Lee, the inventor of the World Wide Web, who envisioned a web in which the "… day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines". The most significant aspect of the Semantic Web is the ability of machines to understand web content and derive meaning from it. The term Web 3.0 was introduced in 2006 to describe a next-generation web with an emphasis on Semantic Web technologies. Though the exact meaning and functionality of Web 3.0 remain vague, most experts agree that we can expect Web 3.0 in some form starting around 2010.

There are two approaches to extracting semantic knowledge from web content. The first relies on extensive natural language processing of the content; the second places the burden on content publishers to annotate, or mark up, their content. The marked-up content can then be processed by search engines, browsers, or intelligent agents. Markup overcomes the shortcomings of natural language processing, which tends to be non-deterministic. Furthermore, meaning depends not only on the written text but also on context that the text does not capture: an identical statement from Jay Leno and from Treasury Secretary Hank Paulson may carry totally different meanings.

The ultimate goal of Web 3.0, intelligent agents that can understand web content, is still a few years away. Meanwhile, we can start capturing information and building constructs into our web pages that help search engines and browsers extract context and data from content. There are several ways to semantically mark up web content so that browsers and search engines can understand it.

Semantic Search Engines

On September 22, 2008, Yahoo announced that it would begin extracting RDFa data from web pages. This is a major step toward improving the quality of search results. Powerset (recently acquired by Microsoft) initially supports semantic searches over content from wikipedia.org, which is fairly structured. Hakia takes a different approach: it processes unstructured web content to gather semantic knowledge, using language analysis that depends on grammar.

Semantic Markup: RDFa and Microformats

The W3C has authored specifications for annotation using RDF, a standard data model (commonly serialized as XML) that expresses all relationships between entities as triples. A triple consists of a subject, a predicate, and an object; for example, in "Paris is the capital of France," the subject is Paris, the predicate is "is the capital of," and the object is France. RDFa is an extension to XHTML that supports semantic markup, allowing RDF triples to be extracted from web content.
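To make the triple idea concrete, here is a minimal Python sketch, using only the standard library's `html.parser`, that pulls subject/predicate/object triples out of an RDFa-style snippet. It understands only a tiny subset of RDFa: `about` sets the subject, and `property` names a predicate whose object is the `content` attribute or, failing that, the element's text. The `ex:` prefix and the example.org URI are hypothetical, prefixes are not resolved to full URIs, and the input is assumed to be well-formed XHTML with explicit end tags; a real RDFa processor does considerably more.

```python
from html.parser import HTMLParser


class MiniRDFaParser(HTMLParser):
    """Collect (subject, predicate, object) triples from a tiny RDFa subset.

    Only `about` and `property` (plus an optional `content`) are understood;
    CURIE prefixes are left unresolved and explicit end tags are assumed.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subjects = []  # subject in effect for each open element
        self._pending = []   # per open element: (subject, predicate, text parts) or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inherited = self._subjects[-1] if self._subjects else None
        subject = attrs.get("about", inherited)  # `about` starts a new subject
        self._subjects.append(subject)
        predicate = attrs.get("property")
        if predicate and attrs.get("content") is not None:
            # A `content` attribute supplies the object instead of the visible text.
            self.triples.append((subject, predicate, attrs["content"]))
            self._pending.append(None)
        elif predicate:
            self._pending.append((subject, predicate, []))
        else:
            self._pending.append(None)

    def handle_data(self, data):
        # Text belongs to every open element that is still waiting for its object.
        for entry in self._pending:
            if entry is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        self._subjects.pop()
        entry = self._pending.pop()
        if entry is not None:
            subject, predicate, parts = entry
            self.triples.append((subject, predicate, "".join(parts).strip()))


def extract_triples(xhtml):
    parser = MiniRDFaParser()
    parser.feed(xhtml)
    parser.close()
    return parser.triples


snippet = """
<div about="http://example.org/Paris">
  <span property="ex:name">Paris</span> is the capital of
  <span property="ex:capitalOf">France</span>.
</div>
"""

for triple in extract_triples(snippet):
    print(triple)
```

Running it prints one tuple per triple; the second one is the "Paris is the capital of France" statement, with the hypothetical `ex:capitalOf` as its predicate.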

Microformats are simpler markup conventions that reuse ordinary (X)HTML attributes such as class and rel, so they can be easily embedded in existing web content. Many popular sites have already started using them: Flickr uses geo to tag photo locations and hCard and XFN for user profiles, while LinkedIn uses hCard, hResume, and XFN for user contacts.

Microformat hCard example in HTML, and the resulting output in the browser:

<div id="hcard-Atul-Kedar" class="vcard">
  <span class="fn n">
    <span class="given-name">Atul</span>
    <span class="family-name">Kedar</span>
  </span>
  <div class="org">Avenue A | Razorfish</div>
  <div class="adr">
    <div class="street-address">1440 Broadway</div>
    <span class="locality">New York</span>,
    <span class="region">NY</span>
    <span class="country-name">USA</span>
  </div>
</div>

Atul Kedar
Avenue A | Razorfish
1440 Broadway
New York, NY USA
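Because hCard reuses ordinary class names, a tool can read the card back out with very little code. The Python sketch below is a simplified, hypothetical reader (not any particular browser add-on's implementation): it collects the whitespace-normalized text of a handful of hCard property classes found inside class="vcard". It assumes a single card and explicit end tags, and it ignores most of the hCard vocabulary.

```python
from html.parser import HTMLParser

# A handful of hCard property class names; the hCard spec defines many more.
HCARD_PROPERTIES = {
    "fn", "given-name", "family-name", "org",
    "street-address", "locality", "region", "country-name",
}


class HCardParser(HTMLParser):
    """Collect whitespace-normalized text for hCard properties inside class="vcard".

    Simplified: one card per document, explicit end tags, first value wins.
    """

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []    # per open element: (inside a vcard?, properties it opened)
        self._collect = []  # (property name, text parts) for properties still open

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        inside = "vcard" in classes or (bool(self._stack) and self._stack[-1][0])
        opened = [c for c in classes if c in HCARD_PROPERTIES] if inside else []
        for name in opened:
            self._collect.append((name, []))
        self._stack.append((inside, opened))

    def handle_data(self, data):
        # Text counts toward every property element that is still open.
        for _, parts in self._collect:
            parts.append(data)

    def handle_endtag(self, tag):
        _, opened = self._stack.pop()
        for _ in opened:
            name, parts = self._collect.pop()
            self.card.setdefault(name, " ".join("".join(parts).split()))


def parse_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.card


markup = """
<div id="hcard-Atul-Kedar" class="vcard">
  <span class="fn n">
    <span class="given-name">Atul</span>
    <span class="family-name">Kedar</span>
  </span>
  <div class="org">Avenue A | Razorfish</div>
  <div class="adr">
    <div class="street-address">1440 Broadway</div>
    <span class="locality">New York</span>,
    <span class="region">NY</span>
    <span class="country-name">USA</span>
  </div>
</div>
"""

print(parse_hcard(markup))
```

Text outside an element marked class="vcard" is ignored, which is what lets a page mix microformatted and ordinary content freely.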

Microformat hCalendar entry example:

<div id="hcalendar-Web-3.0" class="vevent">
  <a href="http://www.web3event.com/conference.php" class="url">
    <abbr title="2008-10-16" class="dtstart">October 16th</abbr> :
    <abbr title="2008-09-18" class="dtend">September 18th, 2008</abbr>
    <span class="summary">Web 3.0</span> at
    <span class="location">Sunnyvale, CA</span>
  </a>
  <div class="tags">Tags:
    <a href="http://eventful.com/events/tags/web%203.0" rel="tag">web 3.0</a>
    <a href="http://eventful.com/events/tags/SemanticWeb" rel="tag">SemanticWeb</a>
  </div>
</div>

As the examples above show, microformats can be added to existing content without changing how browsers display it. Many more entities, such as places, people, and organizations, can be semantically tagged. Some browser add-ons for Firefox recognize these microformats and let you add them directly to your calendar or contacts with a single click.
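The "add to calendar" step amounts to translating hCalendar markup into the iCalendar format that calendar programs import. Here is a rough Python sketch of that translation, assuming date-only dtstart/dtend values published with the abbr title pattern used above; the event, its URL, and the PRODID string are made up for illustration. The output omits properties such as UID and DTSTAMP that current iCalendar (RFC 5545) requires, so treat it as an outline rather than a production converter.

```python
from html.parser import HTMLParser

DATE_PROPS = {"dtstart", "dtend"}     # values come from the abbr title attribute
TEXT_PROPS = {"summary", "location"}  # values come from the element's text


class HCalendarParser(HTMLParser):
    """Read a single hCalendar vevent into a dict (simplified: explicit end tags)."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self._stack = []    # per open element: (inside a vevent?, text properties it opened)
        self._collect = []  # (property name, text parts) for text properties still open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        inside = "vevent" in classes or (bool(self._stack) and self._stack[-1][0])
        opened = []
        if inside:
            for c in classes:
                if c in DATE_PROPS and attrs.get("title"):
                    self.event.setdefault(c, attrs["title"])
                elif c == "url" and attrs.get("href"):
                    self.event.setdefault("url", attrs["href"])
                elif c in TEXT_PROPS:
                    self._collect.append((c, []))
                    opened.append(c)
        self._stack.append((inside, opened))

    def handle_data(self, data):
        for _, parts in self._collect:
            parts.append(data)

    def handle_endtag(self, tag):
        _, opened = self._stack.pop()
        for _ in opened:
            name, parts = self._collect.pop()
            self.event.setdefault(name, " ".join("".join(parts).split()))


def parse_hcalendar(html):
    parser = HCalendarParser()
    parser.feed(html)
    parser.close()
    return parser.event


def ical_escape(text):
    # iCalendar TEXT values escape backslash, semicolon, comma and newline.
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def to_icalendar(event):
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//hcalendar-sketch//EN", "BEGIN:VEVENT"]
    if "summary" in event:
        lines.append("SUMMARY:" + ical_escape(event["summary"]))
    for prop in ("dtstart", "dtend"):
        if prop in event:
            # Assumes date-only values such as 2008-10-16 -> 20081016.
            lines.append(prop.upper() + ";VALUE=DATE:" + event[prop].replace("-", ""))
    if "location" in event:
        lines.append("LOCATION:" + ical_escape(event["location"]))
    if "url" in event:
        lines.append("URL:" + event["url"])
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end with CRLF


markup = """
<div class="vevent">
  <a class="url" href="http://example.org/conference">
    <abbr class="dtstart" title="2008-10-16">October 16th</abbr> to
    <abbr class="dtend" title="2008-10-18">18th, 2008</abbr>:
    <span class="summary">Example Conference</span> at
    <span class="location">Sunnyvale, CA</span>
  </a>
</div>
"""

print(to_icalendar(parse_hcalendar(markup)))
```

The machine-readable dates travel in the title attributes, so the human-readable text ("October 16th") can be phrased however the page author likes.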

Automated Semantic Markup Services and Tools

Another interesting development is automatic entity extraction: applications and web services that annotate content are being developed. Thomson Reuters now offers OpenCalais, a professional service for annotating content, and Powerset is working toward similar offerings. These services reduce the need for content authors to painstakingly go through their content and manually tag every relationship. Unfortunately, they are not perfect, and their output still needs manual cross-checking and editing. Other annotation services and tools include Zemanta, SemanticHacker, and TextWise.
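For intuition about what automatic entity extraction does, and why its output still needs review, here is a deliberately naive Python sketch that tags entities by looking names up in a small, hand-built list (a gazetteer). This is not how OpenCalais or the other services work internally; they use much richer linguistic analysis. The gazetteer entries and the sample sentence are hypothetical.

```python
import re

# Hypothetical gazetteer: known names mapped to entity types.
GAZETTEER = {
    "Thomson Reuters": "Company",
    "Hank Paulson": "Person",
    "New York": "City",
    "Washington": "City",
}


def annotate(text, gazetteer=GAZETTEER):
    """Return (name, type, start offset) for every gazetteer name found in text.

    Longer names are listed first in the pattern so that a longer entry wins
    over a shorter one starting at the same position; matches must sit on
    word boundaries.
    """
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return [(m.group(0), gazetteer[m.group(0)], m.start())
            for m in pattern.finditer(text)]


sentence = "Hank Paulson returned to Washington, then read Washington Irving in New York."
for entity in annotate(sentence):
    print(entity)
```

The second "Washington" belongs to the author Washington Irving, yet the lookup tags it as a city: a small example of the kind of error that makes manual cross-checking necessary.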

Next Steps

As Web 3.0 starts to take shape, it will first affect front-end designers working on the presentation layer, as organizations demand more semantic markup within their content. In due course, CMS architects will have to redesign data-entry forms and entity records so that they facilitate semantic markup and eliminate duplicated entity data and entity relationships. Entity data such as author information, people information, addresses, event details, location data, and media licensing details are perfect candidates for new granular storage schemes and data-entry forms.
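As an illustration of the granular storage described above, the sketch below keeps a person and an address as separate structured records and renders them into hCard markup from a template. The Person and Address classes and the render_hcard function are hypothetical names, not part of any CMS. The point is that each value is entered once, in its own field, so the template can wrap it in the right microformat class, and other records (a colleague at the same office, say) can reference the same Address instead of duplicating it.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Address:
    street: str
    locality: str
    region: str
    country: str


@dataclass(frozen=True)
class Person:
    given_name: str
    family_name: str
    org: str
    address: Address  # shared record, referenced rather than copied


def render_hcard(person):
    """Render one stored Person record as hCard markup, escaping every value."""
    a = person.address
    return (
        '<div class="vcard">\n'
        f'  <span class="fn n"><span class="given-name">{escape(person.given_name)}</span> '
        f'<span class="family-name">{escape(person.family_name)}</span></span>\n'
        f'  <div class="org">{escape(person.org)}</div>\n'
        '  <div class="adr">\n'
        f'    <div class="street-address">{escape(a.street)}</div>\n'
        f'    <span class="locality">{escape(a.locality)}</span>,\n'
        f'    <span class="region">{escape(a.region)}</span>\n'
        f'    <span class="country-name">{escape(a.country)}</span>\n'
        '  </div>\n'
        '</div>'
    )


office = Address("1440 Broadway", "New York", "NY", "USA")
author = Person("Atul", "Kedar", "Avenue A | Razorfish", office)
print(render_hcard(author))
```

Because the name is stored as separate given and family fields, the template never has to guess where to split "Atul Kedar", which is exactly the kind of ambiguity free-text CMS fields create.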



4 Responses to “Ready for Web 3.0/Semantic Web?”

By Merle Tenney on Oct 1, 2008

    Great intro to the Semantic Web. It is a shame, though, that we do not know the name of the author. Even if it is a team blog, I believe that the people who write individual posts deserve a byline.

    Merle

By Atul Kedar on Oct 2, 2008

    Hi Merle,

    Thanks for noticing. I just came across a great article on the missing Semantic Web sites: http://www.readwriteweb.com/archives/rdf_semantic_web_apps.php#112632. We need to spread the word to increase the adoption of Web 3.0 without all the geeky jargon of triple stores and OWL.

    rgd
    Atul Kedar

By billy on Oct 20, 2008

    I believe the author is depicted in the business card example above:
    Atul Kedar
    Avenue A | Razorfish
    1440 Broadway
    New York, NY USA

By theneemies on Nov 26, 2008

    At a recent Bar Camp Sydney I saw a fantastic application of microformatted addresses - it really brought home the practical application of microformats - Mapanui (http://www.mapanui.com) uses a bookmarklet to integrate maps with any annotated address.
