Sitecore with React and SEO

On a recent Sitecore build-out, the architecture included, among other things, a significant amount of functionality provided by an API accessed client-side. We chose React to connect to that API and provide its UI. Since we were already using React for part of the experience, we decided to standardize on it and use it for all of the UI.

The prevailing approach found around the web was to write the entire page as a single React component. The rare articles or guides that spoke of using smaller, standalone components all suggested supplying those components with their data in the form of JSON.

Let’s take a look at the example of a simple, standalone component on React’s homepage:
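In rough form, it looked something like the following (a sketch rather than the exact original snippet; the mount node id is an assumption for illustration):

```javascript
// A sketch of the classic HelloMessage example, adapted to the "subject" prop
// discussed below. The mount node id is assumed for illustration.
var HelloMessage = React.createClass({
  render: function() {
    return <div>Hello {this.props.subject}</div>;
  }
});

React.render(<HelloMessage subject="World" />, document.getElementById('container'));
```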

On the last line of the example above, we see that the property value, subject="World", is being supplied to the component. But let’s assume that "World" is text managed by the CMS. How would we get that text from the CMS to React? Popular thinking suggests outputting all the data you need as JSON and passing it directly to React.render inside a script tag, much as you see in the sketch below.
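A sketch of that approach, with the container id and the JSON payload assumed for illustration (it presumes HelloMessage is defined as above):

```html
<div id="hello-container"></div>
<script>
  // All CMS-managed values are emitted by the server as JSON...
  var props = { subject: "World" };

  // ...and handed straight to React.render.
  React.render(
    React.createElement(HelloMessage, props),
    document.getElementById('hello-container')
  );
</script>
```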

The problem is, that approach doesn’t provide very good SEO value.

So how can we provide the data to our components in an SEO-friendly way? Interestingly, the answer comes from one of the technologies introduced to help search engines improve the semantic value of the data they crawl: schema.org.

Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond.

Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model.

Schema.org markup includes the concepts of itemscope and itemprop. We adapted this approach to provide a markup structure and attributes that would readily convert to the JSON-full-of-component-props that we need to render React components.
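For reference, plain schema.org microdata looks something like this (the Person example is illustrative, not taken from the project):

```html
<!-- itemscope marks an entity; itemprop marks the properties that belong to it. -->
<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Professor</span>
</div>
```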

A simplified version of the final result:
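It looked roughly like this (the wrapper tags and the second component’s values are assumptions used for the sketch):

```html
<div data-react="HelloMessage">
  <span data-prop="greeting">Hello</span>
</div>

<div data-react="HelloMessage">
  <span data-prop="greeting">Bonjour</span>
</div>
```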

As you can see, the data-react="%ReactComponentName%" attribute-value pair identifies a markup structure as a React component. Inside this structure, a data-prop="%propertyName%" attribute signifies that the contents of the given HTML tag represent the value of that property. For instance, data-react="HelloMessage" identifies a markup structure as the place in the page where a HelloMessage component should render, and the same structure contains data-prop attributes that provide the props data for the component. The first HelloMessage has data-prop="greeting" with text contents of "Hello". This is converted to { greeting: "Hello" } before being passed in when rendering the component.

Consider the following markup:
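Along these lines (the tags and prop values are again illustrative):

```html
<div data-react="HelloMessage">
  <h1 data-prop="greeting">Hello</h1>
  <span data-prop="subject">World</span>
</div>
```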

The above markup gets converted to the JavaScript object below:
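For the sketch above, that object would be:

```javascript
{
  greeting: "Hello",
  subject: "World"
}
```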

As an added convenience, the React components are not rendered into some other DOM Node, as in the previous examples. Instead, quite naturally, they are rendered in-place right where they are defined in the markup, using the props defined inside the same markup.
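A minimal sketch of that conversion, assuming a simple registry object that maps component names to component classes:

```javascript
// Illustrative only: find each placeholder, build its props from the data-prop
// children, and render the React component in place.
var placeholders = document.querySelectorAll('[data-react]');

Array.prototype.forEach.call(placeholders, function (node) {
  var props = {};
  var propNodes = node.querySelectorAll('[data-prop]');

  Array.prototype.forEach.call(propNodes, function (propNode) {
    props[propNode.getAttribute('data-prop')] = propNode.textContent;
  });

  // "components" is an assumed registry, e.g. { HelloMessage: HelloMessage }.
  var component = components[node.getAttribute('data-react')];
  React.render(React.createElement(component, props), node);
});
```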

And there we have it, Sitecore with React and SEO.

written by: Dennis Hall (Presentation Layer Architect, Technology)

What the Rise of Cloud Computing Means for Infrastructure

Infrastructure setup and application programming are merging into simultaneous processes. With this critical change, we need to take a fresh look at how we design solutions. If we don’t, our projects risk failure.

Building and installing system infrastructure (think servers and networks) was once an arduous process. Everything had to be planned out and procured, often at high cost and with long lead times. Oftentimes, server specifications were created before the application that would run on them (and the technologies involved) had been fully fleshed out. The actual application programming was a whole separate step with little overlap.

That’s no longer the case due to the rise of cloud computing. Infrastructure is now software, and the convenience of that leads to new challenges.

Merging Designs

With cloud computing, infrastructure is far more fluid thanks to its programmable elements. As a result, upfront planning is less critical, since cost and especially lead time are no longer the constraints they once were. Compute, storage and network capacity is immediately accessible and can be changed dynamically to suit any need.

With these changes, the days of separate tracks for application and infrastructure development are over. The once separate design processes for each need to merge as well. This is largely driven by three factors:

  1. Historically, the separation of application and infrastructure development didn’t work well, but it was accepted as a given.
  2. Cloud architectures take on a bigger role than traditional infrastructure.
  3. New architectures create new demands.

The Historical Challenge

Performance, availability and scalability have always been a challenge. Before cloud architectures became standard, vendors tried to address these requirements with complex caching architectures and similar mechanisms. The reality is that none of those products really delivered on that promise out of the box. One core challenge was that companies were trying to deliver dynamic experiences on fixed infrastructure.

Even within that fixed infrastructure, any deployment required exhaustive performance-tuning cycles and vendor support to overcome the problem of infrastructure designed independently from the application, and with only moderate success.

The Changing Infrastructure Role

Cloud architectures also start to play a bigger role in the overall systems stack. Let’s look at a hypothetical basic Java application with an API built on Amazon Web Services, the most popular cloud computing service, to see what the merger of system infrastructure and application programming looks like.

The application can be developed like any other Java application, but when it comes to how security is addressed, what is specified where?

On the application side, there could be internal security mechanisms that define what access to services is available. Internal application roles can determine what access the service request has to different data elements. From an infrastructure perspective, Amazon Web Services can also provide security measures (access to ports, another layer of permissions, etc.) that affect how the application API can be accessed by clients. In addition, AWS policies can define which requests reach the application, or which data elements are available once a request is being serviced.
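As a hypothetical illustration of that infrastructure-side layer, an IAM-style policy (shown here as a JavaScript object literal; the table name and account id are made up) can restrict which data the application’s role can reach, independent of any checks in the application code:

```javascript
var applicationRolePolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      // The application role may only read from one table; everything else is
      // denied by default, no matter what the application code attempts.
      Effect: 'Allow',
      Action: ['dynamodb:GetItem', 'dynamodb:Query'],
      Resource: 'arn:aws:dynamodb:us-east-1:123456789012:table/Orders'
    }
  ]
};
```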

As this example shows, the application and infrastructure views need to be merged in order to fully understand the security mechanisms available. Just focusing on one side or the other paints an unclear picture.

New Architectures

A number of new architectures have emerged now that infrastructure is programmable. Reactive architectures and code executors such as Google Cloud Functions and AWS Lambda, both examples of serverless computing services, are among them. Once we start using fully dynamic infrastructures for auto-scaling and microservices, the need for an integrated view of both the application and the systems becomes even more important.

Finding New Solutions

Handling infrastructure and application development in an integrated manner is difficult.

One of the challenges is that the design tools to visualize this are lacking. Tools like Cloudcraft help in this regard, but a fully integrated view is still missing, especially once you start using new architectures like AWS Lambda. Ideally, there’d be a way to visually layer the different perspectives of an architecture, much like layers in a Photoshop image. Being able to easily look at an architecture from the perspective of security, services, data flows, and so on would be incredibly useful.

From a process perspective, infrastructure and application have to be handled in the same way. This includes code management, defect tracking and deployment. This of course has implications for the skills and technology needed to successfully complete a project, and not all organizations are ready for that yet.

Conclusion

These days, infrastructure and application are intertwined, and an application solution that doesn’t address the infrastructure element is incomplete. Focusing on one without the other cannot address the critical requirements around security, performance, scalability, availability and others. It is important to invest in the tools, processes and people to deliver on this.

written by: Martin Jacobs (GVP, Technology)

A Reactive Serverless CMS for the Technology Blog

Background

At Razorfish, we are big proponents of reactive architectures. Additionally, we believe architectures using cloud functions such as AWS Lambda are part of the future of application development. Our relaunch of the blog was a good occasion to test this out.

Historically, the blog had been hosted on WordPress. WordPress is a solid platform, but we had run into some performance issues. Although there are many good ways to address performance challenges in WordPress, this was a good time to explore a new architecture for the blog, as we weren’t utilizing any WordPress-specific features.

We had used static site generators for a while for other initiatives, and looked at these types of solutions to create the new site. We wanted to avoid any running servers, either locally or in the cloud.

Our technology stack ended up as follows:

  • GitHub – Contains two repositories: a content repository with the Hugo-based themes, layouts and content, and a code repository with all the CMS code.

  • AWS Lambda – Runs the CMS functions, including site generation, Git integration and authentication.

  • Hugo – Static site generator written in the Go programming language.

  • AWS S3 – Source and generated sites are stored on S3.

  • AWS CloudFront – CDN for delivery of the site.

Why Hugo?

There are a large number of site generators available, ranging from Jekyll to Middleman. We explored many of them and decided on Hugo for a few reasons:

  • Speed – Hugo generation happens in seconds.

  • Simplicity – Hugo is very easy to install and run. It is a single executable, without any dependencies.

  • Structure – Hugo has a flexible structure, allowing you to go beyond blogs.

Architecture

The architecture is outlined below. A number of Lambda functions are responsible for the different capabilities of the CMS. Our use of Hugo was partly derived from http://bezdelev.com/post/hugo-aws-lambda-static-website/. The authentication function was loosely derived from https://github.com/danilop/LambdAuth.

The solution uses AWS Lambda’s ability to run executables. This is used for invoking Hugo, but also for incorporating libgit2, which allows us to execute Git commands and integrate with GitHub.
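A minimal sketch of what such a function might look like (the paths and the S3 sync step are assumptions, not the actual implementation):

```javascript
// Lambda handler (Node.js) that invokes the Hugo binary bundled with the
// deployment package against a site that has been checked out to /tmp.
var exec = require('child_process').exec;

exports.handler = function (event, context) {
  exec('./hugo --source=/tmp/site --destination=/tmp/public', function (error, stdout, stderr) {
    if (error) {
      return context.fail(error);
    }
    // From here, the generated files in /tmp/public would be synced to S3.
    context.succeed(stdout);
  });
};
```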

CMS

As part of the solution, a CMS UI was developed to manage content. It allows the author to create new content, upload new assets, and make other changes.

Content is expected to be in Markdown format, but this is simplified for authors with the help of the hallojs editor.

Preview is supported with different breakpoints, including a mobile view.

As it was developed as a reactive architecture, other ways to update content are available:

  • Through a commit on GitHub, potentially using GitHub’s Markdown editor.

  • By uploading or editing Markdown files directly on S3.

Challenges

As the solution was architected, a few interesting challenges had to be addressed.

  1. At the time of development, only Node 0.10 was supported on AWS Lambda. To utilize solutions like libgit2, a more recent version of Node was needed. To work around this, a newer Node executable was packaged as part of the deployment, and the Node 0.10 runtime spawned it as a child process (a sketch follows this list).

  2. Only the production site should be publicly accessible. CloudFront signed cookies provided a mechanism to keep preview and other environments from being directly accessible.

  3. Hugo and libgit2 need to be compiled for the AWS Lambda Linux environment, which can be a challenge when all other development occurs on Windows or Macs.
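A sketch of the approach from challenge 1, with the file names assumed for illustration:

```javascript
// The handler runs on the old Lambda runtime and spawns a newer Node binary
// (./bin/node) that was packaged inside the deployment zip.
var spawn = require('child_process').spawn;

exports.handler = function (event, context) {
  var child = spawn('./bin/node', ['worker.js', JSON.stringify(event)]);

  child.stdout.on('data', function (data) {
    console.log(data.toString());
  });

  child.on('close', function (code) {
    if (code !== 0) {
      return context.fail(new Error('worker exited with code ' + code));
    }
    context.succeed('done');
  });
};
```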

Architecture benefits

The reactive architecture approach makes it easy to enhance and extend the solution, whether with other ways of integrating content or with additional experience features.

For example, as alternatives to the content editing solutions described above, other options can be identified:

  • A headless CMS like Contentful could be added for a richer authoring UI experience.

  • By using SES and Lambda to receive and process email, an email-based content creation flow could be set up.

  • A converter like pandoc running on AWS Lambda could be incorporated into the authoring flow, for example to convert source documents to the target Markdown format. It could be invoked from the CMS UI or from the email processor.

From an end-user experience perspective, Disqus or other third-party providers are obvious options for incorporating comments. However, the Lambda architecture also makes it easy to add commenting functionality directly.

Conclusion

Although more and more tools are becoming available, AWS Lambda development and code management can still be a challenge, especially in our scenario with OS-specific executables. From an architecture perspective, however, the solution is working very well. It has proven predictable and stable, and allows for a fully hands-off approach to management.

written by: Martin Jacobs (GVP, Technology)

How Machine Learning Can Transform Content Management

In previous posts, I explored the opportunities for machine learning in digital asset management, and, as a proof-of-concept, integrated a DAM solution (Adobe AEM DAM) with a set of machine learning APIs.

But the scope of machine learning extends much further. Machine learning can also have a profoundly positive impact on content management.

In this context, machine learning is usually associated with content delivery. The technology can be used to deliver personalized or targeted content to website visitors and other content consumers. Although this is important, I believe there is another opportunity that stems from incorporating machine learning into the content creation process.

Questions Machine Learning Can Answer

During the content creation process, content can be analyzed by machine learning algorithms to help address some key questions:

  • How does content come across to readers? Which tones resonate the most? Which writer is successful with which tone? Tone analysis can help answer these questions.
  • What topics are covered most frequently? How much duplication exists? Text clustering can help you analyze your overall content repository.
  • What is the article about? Summarization can extract relevant points and topics from a piece of text, potentially helping you create headlines.
  • What are the key topics covered in this article? You can use automatic topic extraction and text classification to create metadata for the article to make it more linkable and findable (both within the content management tool internally and via search engines externally).

Now that we know how machine learning can transform the content creation process, let’s take a look at a specific example of the technology in action.

An Implementation with AEM 6.2 Content Fragments and IBM Watson

Adobe just released a new version of Experience Manager, which I discussed in a previous post. One of the more important features of AEM 6.2 is the concept of “Content Fragments.”

In the past, content was often tied to pages. But Content Fragments allow you to create channel-neutral content with (possibly channel-specific) variations. You can then use these fragments when creating pages.

Content Fragments are treated as assets, which makes them great candidates for applying analysis and classification. Using machine learning, we’re able to analyze the tone of each piece of content. Tones can then be associated with specific pieces of content.

In the implementation, we used IBM Bluemix APIs to perform tone analysis. The Tone Analyzer computes emotional tone (joy, fear, sadness, disgust or anger), social tone (openness, conscientiousness, extraversion, agreeableness or emotional range) and language tone (analytical, confident or tentative).

The Tone Analyzer also provides insight on how the content is coming across to readers. For each sub-tone, it provides a score between 0 and 1. In our implementation, we associated a sub-tone with the metadata only if the score was 0.75 or higher.
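As a sketch of that rule (the response field names reflect the 2016-era Tone Analyzer payload and should be treated as assumptions here):

```javascript
// Collect metadata tags from a Tone Analyzer document-level result, keeping only
// sub-tones that score at or above the threshold.
function extractToneTags(documentTone, threshold) {
  threshold = threshold || 0.75;
  var tags = [];

  documentTone.tone_categories.forEach(function (category) {
    category.tones.forEach(function (tone) {
      if (tone.score >= threshold) {
        tags.push(tone.tone_name);
      }
    });
  });

  return tags;
}
```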

The Results of Our Implementation

If you want to take a look, you’ll find the source code and setup instructions for the integration of Content Fragments with IBM Watson Bluemix over on GitHub.

We ran our implementation against the text of Steve Jobs’ 2005 Stanford commencement speech, and the results are shown below. For every sub-tone with a score of 0.75 or higher, a metadata tag was added.

Results

The Takeaway from Our Implementation

Machine learning provides a lot of opportunities during the content creation process. But it also requires authoring and editing tools that seamlessly integrate with these capabilities to really drive adoption and make the use of those machine-learning insights a common practice.

Personally, I can’t wait to analyze my own posts from the last few months using this technology. That analysis, combined with LinkedIn analytics, will enable me to explore how I can improve my own writing and make it more effective.

Isn’t that what every content creator wants?

written by: Martin Jacobs (GVP, Technology)