Machine Learning Archive

How Machine Learning Can Transform Content Management- Part II

In a previous post, I highlighted how machine learning can be applied to content management. In that article, I described how content analysis could identify social, emotional and language tone of an article. This informs a content author further on his writing style, and helps the author understand which style resonates most with his readers.

At the last AWS re:Invent conference, Amazon announced a new set of Artificial Intelligence Services, including natural language understanding (NLU), automatic speech recognition (ASR), visual search, image recognition and text-to-speech (TTS).

In previous posts, I discussed how the Amazon visual search and image recognition services could benefit digital asset management. This post will highlight the use of some of the other services in a content management scenario.

Text to Speech

From a content perspective, text to speech is a potential interesting addition to a content management system. One reason could be to assist visitors with vision problems. Screen readers can already perform many of these tasks, but leveraging the text to speech service provides publishers more control over the quality and tone of the speech.

Another use case is around story telling. One of the current trends is to create rich immersive stories, with extensive photography and visuals. This article from the New York Times describing the climb of El Capitan in Yosemite is a good example. With the new text to speech functionality, the article can easily be transformed into a self playing presentation with voice without any additional manual efforts, such as recruiting voice talent and the recording process.

Amazon Polly

Amazon Polly (the Amazon AI text to speech service) turns text into lifelike speech. It uses advanced deep learning technologies to synthesize speech that sounds like a human voice. Polly includes 47 lifelike voices that spread across 24 languages. Polly is service that can be used in real-time scenarios. It can used to retrieve a standard audio file (such as MP3) that can be stored and used at a later point. The lack of restrictions on storage and reuse of voice output makes it a great option for use in a content management system.

Polly and Adobe AEM Content Fragments

In our proof of concept, the Amazon AI text to speech service was applied to Content Fragments in Adobe AEM.

Content Fragment

As shown above, Content Fragments allow you to create channel-neutral content with (possibly channel-specific) variations. You can then use these fragments when creating pages. When creating a page, the content fragment can be broken up into separate paragraphs, and additional assets can be added at each paragraph break.

After creating a Content Fragment, the solution passes the content to Amazon Polly for retrieving the audio fragments with spoken content. It will generate a few files, one for the complete content fragment (main.mp3), and a set of files broken up by paragraph (main_1.mp3, main_2.mp3, main_3.mp3). The audio files are stored as associated content for the master Content Fragment.

Results

When authoring a page that uses this Content Fragment, the audio fragments are visible in the sidebar, and can be added to the page if needed. With this capability in place, developing a custom story telling AEM component to support scenarios like the New York Times article becomes relatively simple.

If you want to explore the code, it is posted on Github as part of the https://github.com/razorfish/contentintelligence repository.

Conclusion

This post highlighted how an old problem can now be addressed in a new way. The text to speech service will make creating more immersive experiences easier and more cost effective. Amazon Polly’s support for 24 languages opens up new possibilities as well. Besides the 2 examples mentioned in this post, it could also support scenarios such as interactive kiosks, museum tour guides and other IOT, multichannel or mobile experiences.

With these turnkey machine-learning services in place, creative and innovative thinking is critical to successfully solve challenges in new ways.

written by: Martin Jacobs (GVP, Technology)

How Machine Learning Can Transform Digital Asset Management - SmartCrop

In previous articles I discussed the opportunities for machine learning in digital asset management (DAM), and, as a proof of concept, integrated a DAM solution (Adobe AEM DAM with with various AI/ML solutions from Amazon, IBM, Google and Microsoft.

The primary use case for that proof of concept was around auto-tagging assets in a digital asset management solution. Better metadata makes it easier for authors, editors, and other users of the DAM to search for assets, and in some scenarios, the DAM can providing asset recommendations to content authors based on metadata. For example, it’s often important to have a diverse mix of people portrayed on your site. With gender, age, and other metadata attributes as part of the image, diversity can be enforced using asset recommendation or asset usage reports within a DAM or content management system.

Besides object recognition, the various vendors also provide API’s for facial analysis. Amazon AI for example provides a face analysis API, and this post will show how we can tackle a different use case with that service.

SmartCrop

One common use case is the need for one image to be re-used at different sizes. A good example is the need for a small sized profile picture of the CEO for a company overview page as well as a larger version of that picture in the detailed bio page.

Challenge screenshot

Often, cropping at the center of an image often works fine, but it can also result in the wrong area being cropped. Resizing often distorts a picture, and ends up incorporating many irrelevant areas. A number of solutions are out there to deal with this problem, ranging from open source to proprietary vendors. All of them are leveraging different detection algorithms for identifying the area of interest in an image.

Results

Leveraging the Amazon Rekognition Face Analysis API, we can now solve this problem in a very simple way. Using the API, a bounding box for the face can be retrieved, indicating the boundaries of a face in that picture. And with that bounding box, the right area for cropping can be identified. After cropping, any additional resizing can be done with the most relevant area of the image to ensure the image is at the requested size.

Solution screenshot

The result is shown in the image above. The image to the right is the result of leveraging the SmartCrop functionality based on the Face Analysis API. As you can see, it is a significant improvement over the other options. Improvements to this SmartCrop could be done by adding additional margin, or incorporating some of the additional elements retrieved by the Face Analysis API.

The code for this proof of concept is posted on the Razorfish Github account, as part of the https://github.com/razorfish/contentintelligence repository. Obviously, in a real production scenario, additional optimizations should be performed to this proof of concept for overall performance reasons. The Amazon Rekognition API call only needs to take place once per image, and can potentially be done as part of the same auto-tagging workflow highlighted in previous posts, with the bounding box stored as an attribute with the image for later retrieval by the SmartCrop functionality. In addition, the output from the cropping can be cached at a CDN or webserver in front of Adobe AEM.

Conclusion

As this post highlights, old problems can be addressed now in new ways. In this case, it turns a task that often was performed manually in something that can be automated. The availability of many turnkey machine-learning services can provide a start to solve existing problems in a new and very simple manner. It will be interesting to see the developments in the coming year on this front.

written by: Martin Jacobs (GVP, Technology)

How Machine Learning Can Transform Digital Asset Management - Part III

In previous articles I discussed the opportunities for machine learning in digital asset management (DAM), and, as a proof of concept, integrated a DAM solution (Adobe AEM DAM with Google Cloud Vision. I followed up with a post on potential alternatives to Google’s Cloud Vision, including IBM Watson and Microsoft Cognitive Intelligence, and integrated those with Adobe DAM as well.

Of course, Amazon AWS couldn’t stay behind, and at the last AWS re:Invent conference, Amazon announced their set of Artificial Intelligence Services, including natural language understanding (NLU), automatic speech recognition (ASR), visual search and image recognition, text-to-speech (TTS). Obviously, it was now time to perform the integration with the AWS AI services in our proof of concept.

Amazon Rekognition

The first candidate for integration was Amazon Rekognition. Rekognition is a service that can detect objects, scenes, and faces in images and makes it easy to add image analysis to your applications. At this point, it offers 3 core services:

  • Object and scene detection - automatically labels objects, concepts and scenes
  • Facial analysis - analysis of facial attributes (e.g. emotion, gender, glasses, face bounding box)
  • Face comparison - compare faces to see how closely they match

Integration Approach

Google’s API was integrated using a SDK, the IBM and Microsoft API’s were integrated leveraging their standard REST interface. For the Amazon Rekognition integration, the SDK route was taken again, leveraging the AWS Java SDK. Once the SDK has been added to the project, the actual implementation becomes fairly straightforward.

Functionality

From a digital asset management perspective, the previous posts focused on auto-tagging assets to support a content migration process or improve manual efforts performed by DAM users.

The object & scene detection for auto-tagging functioned well with Amazon Rekognition. However, the labels returned are generalized. For example, a picture of the Eiffel tower will be labeled “Tower” instead of recognizing the specific object.

The facial analysis API returns a broad set of attributes, including the location of facial landmarks such as mouth and nose. But it also includes attributes such as emotions and gender, which can be used as tags. These can then be beneficial in digital asset management scenarios such as search and targeting.

Many of the attributes and labels returned by the Rekognition API included a confidence score, indicating the confidence around a certain object detection.

Results screenshot

In the proof of concept, a 75% cut off was used. From the example above, you can see that Female, Smile and Happy have been detected as facial attributes with a higher than 75% confidence.

Summary

The source code and setup instructions for the integration with AWS, as well as Google, Microsoft, and IBM’s solutions can be found on Github in the Razorfish repository.

One thing all the different vendors have in common is that the services are very developer focused, and integrating these services with an application is very straightforward. This makes adoption easy. Hopefully, the objects being recognized will become more detailed and advanced over time, which will improve their applicability even more.

written by: Martin Jacobs (GVP, Technology)

How Machine Learning Can Transform Content Management

In previous posts, I explored the opportunities for machine learning in digital asset management, and, as a proof-of-concept, integrated a DAM solution (Adobe AEM DAM) with a set of machine learning APIs.

But the scope of machine learning extends much further. Machine learning can also have a profoundly positive impact on content management.

In this context, machine learning is usually associated with content delivery. The technology can be used to deliver personalized or targeted content to website visitors and other content consumers. Although this is important, I believe there is another opportunity that stems from incorporating machine learning into the content creation process.

Questions Machine Learning Can Answer

During the content creation process, content can be analyzed by machine learning algorithms to help address some key questions:

  • How does content come across to readers? Which tones resonate the most? What writer is successful with which tone? Tone analysis can help answer that.
  • What topics are covered most frequently? How much duplication exists? Text clustering can help you analyze your overall content repository.
  • What is the article about? Summarization can extract relevant points and topics from a piece of text, potentially helping you create headlines.
  • What are the key topics covered in this article? You can use automatic topic extraction and text classification to create metadata for the article to make it more linkable and findable (both within the content management tool internally and via search engines externally).

Now that we know how machine learning can transform the content creation process, let’s take a look at a specific example of the technology in action.

An Implementation with AEM 6.2 Content Fragments and IBM Watson

Adobe just released a new version of Experience Manager, which I discussed in a previous post. One of the more important features of AEM 6.2 is the concept of “Content Fragments.”

In the past, content was often tied to pages. But Content Fragments allow you to create channel-neutral content with (possibly channel-specific) variations. You can then use these fragments when creating pages.

Content Fragments are treated as assets, which makes them great candidates for applying analysis and classification. Using machine learning, we’re able to analyze the tone of each piece of content. Tones can then be associated with specific pieces of content.

In the implementation, we used IBM Bluemix APIs to perform tone analysis. The Tone Analyzer computes emotional tone (joy, fear, sadness, disgust or anger), social tone (openness, conscientiousness, extraversion, agreeableness or emotional range) and language tone (analytical, confident or tentative).

The Tone Analyzer also provides insight on how the content is coming across to readers. For each sub-tone, it provides a score between 0 and 1. In our implementation, we associated a sub-tone with the metadata only if the score was 0.75 or higher.

The Results of Our Implementation

If you want to take a look, you’ll find the source code and setup instructions for the integration of Content Fragments with IBM Watson Bluemix over on GitHub.

We ran our implementation against the text of Steve Jobs’ 2005 Stanford commencement speech, and the results are shown below. For every sub-tone with a score of 0.75 or higher, a metadata tag was added.

Results

The Takeaway from Our Implementation

Machine learning provides a lot of opportunities during the content creation process. But it also requires authoring and editing tools that seamlessly integrate with these capabilities to really drive adoption and make the use of those machine-learning insights a common practice.

Personally, I can’t wait to analyze my own posts from the last few months using this technology. That analysis, combined with LinkedIn analytics, will enable me to explore how I can improve my own writing and make it more effective.

Isn’t that what every content creator wants?

written by: Martin Jacobs (GVP, Technology)

How Machine Learning Can Transform Digital Asset Management - Part II

A few weeks ago, I discussed the opportunities for machine learning in digital asset management (DAM), and, as a proof of concept, integrated a DAM solution (Adobe AEM DAM) with Google Cloud Vision, a newly released set of APIs for image recognition and classification.

Now, let’s explore some alternatives to Cloud Vision.

IBM Watson

To follow up, we integrated IBM’s offering. As part of BlueMix, IBM actually has two sets of APIs: the AlchemyAPI (acquired in March 2015) and the Visual Recognition API. The ability to train your own custom classifier in the Visual Recognition API is the key difference between the two.

There are a number of APIs within the AlchemyAPI, including a Face Detection/Recognition API and an Image Tagging API. The Face API includes celebrity detection and disambiguation of a particular celebrity (e.g., which Jason Alexander?).

Result sets can provide an age range for the identified people in the image. In a DAM scenario, getting a range instead of a number can be particularly helpful. The ability to create your own customer classifier could be very valuable with respect to creating accurate results for your specific domain. For example, you could create a trained model for your products, and organize your brand assets automatically against this. It would enable to further analyze usage and impact of these assets across new and different dimensions.

Leveraging the APIs was fairly straightforward. Similar to Google, adding the API to an application is simple; just build upon the sample API provided by IBM. It’s worth noting, however, that IBM’s API has a 1MB image size restrictions, somewhat lower than Google and Microsoft’s 4MB limit.

Microsoft Cognitive Services

Microsoft is interesting, especially considering they won the most recent ImageNet Large Scale Visual Recognition Challenge. As part of its Cognitive Services offering, Microsoft released a set of applicable Vision APIs (though they’re still in preview mode). For our purposes, the most relevant APIs are:

  • Computer Vision: This API incorporates an ability to analyze images and derive the appropriate tags with their confidence score. It can detect adult and racy content, and similar to Google’s Cloud Vision API, it has an Optical Character Recognition (OCR) capability that reads text in images. Besides tags, the API can provide English language descriptions of an image — written in complete sentences. It also supports the concepts of models. The first model is celebrity recognition, although we couldn’t get that one to work for straightforward celebrities like Barack Obama and Lionel Messi (it also doesn’t seem to work on the landing page).
  • Emotion: This API uses a facial expression in an image as an input. It returns the confidence level across a set of emotions for each face in relevant images.
  • Face: This API is particularly interesting, as it allows you to perform face recognition within a self-defined group. In a DAM scenario, this could be very relevant. For example, when all product images are shot with a small set of models, it can easily and more accurately classify each image with respect to various models. If an organization has contracts with a small set of celebrities for advertising prints, classification becomes that much more accurate.

The Microsoft APIs are dependent on each other in certain scenarios. For example, the Emotion API leverages the Face API to first identify faces within an image. Similarly, the Computer Vision API and Face API both identify gender and other attributes of people within an image.

Although Microsoft didn’t provide a sample Java API, the REST API is easy to incorporate. The source code and setup instructions for the integration with Google, Microsoft, and IBM’s solution can be found on Github.

Adobe Smart Tags

At the recent Adobe Summit conference, Adobe also announced the use of machine intelligence for smart tagging of assets as a beta capability of their new AEM 6.2 release. According to Adobe, it can automatically tag images with keywords based on:

  • Photo type (macro, portrait, etc.)
  • Popular activities (running, skiing, hiking, etc.)
  • Certain emotions (smiling, crying, etc.)
  • Popular objects (cars, roads, people, etc.)
  • Animals (dogs, cats, bears, etc.)
  • Popular locations (New York City, Paris, San Francisco, etc.)
  • Primary colors (red, blue, green)

There are even more categories for automatic classification, too.

Automatic Tagging Use Cases

In the previous post, I highlighted a couple of key use cases for tagging using machine intelligence in DAM. In particular, I highlighted how tagging can support the content migration process or improve manual efforts performed by DAM users.

Better metadata makes it easier for authors, editors, and other users of the DAM to find content during the creation process. It can also help in providing asset recommendations to content authors. For example, it’s often important to have a diverse mix of people portrayed on your site. With gender, age, and other metadata attributes as part of the image, diversity can be enforced using asset recommendation or asset usage reports within a DAM or content management system.

What’s more, this metadata can also help improve targeting and effectiveness of the actual end-user experience by:

  1. Allowing the image to be selected as targeted content
  2. Using the metadata in an image to ensure relevant ads, content, and assets are presented in context within an asset
  3. Informing site analytics by incorporating image metadata in click tracking and other measurement tools

In addition to these use cases, new scenarios are being created. Microsoft automatically generates captions your photos. Facebook is using machine intelligence to automatically assign alt text to photos uploaded to Facebook, and, in doing so, improve overall accessibility for Facebook users. Obviously, this type of functionality also will also enable Facebook and Microsoft to provide more targeted content and ads to users interacting with specific photos, a win-win. As metadata is used for end-user consumption in these cases, the unique challenge of really needing to support multilingual tagging and descriptions arises, with its own set of challenges.

With companies like Adobe, IBM, Google, and Microsoft pouring a ton of resources into machine learning, expect a lot of changes and improvements in the coming years. Relatively soon, computers will outperform humans in classification and analysis.

As it relates to Digital Asset Management, it remains to be seen precisely what the exact improvements will be. But one thing is certain: Machine learning technology promises a lot of exciting possibilities.

written by: Martin Jacobs (GVP, Technology)

How Machine Learning Can Transform Digital Asset Management

As the use and need for digital assets increase, so too does the cost and complexity of Digital asset management (DAM) — especially in a world where people are adopting devices with screens of all sizes (e.g., desktop, mobile, tablet, etc.).

DAM, however, is a challenge for many organizations. It still involves frequent manual labor, but machine learning is starting to change that.

Machine learning has already given us self-driving cars, speech recognition, effective web searches, and many other benefits over the past decade. But the technology can also play a role in classifying, categorizing, and managing assets in the years to come.

Machine learning can support DAM in areas such as face recognition, image classification, text detection, people recognition, and color analysis, among others. Google PlaNet, for example, can figure out where a photo was taken based on details embedded in it. Google Photos is using it to improve the search experience. Machine learning has already taken a role in image spam detection. Taken together, this all points to the need for DAM tools to start incorporating advanced machine-learning capabilities.

A Practical Test

Recently, Google released its Cloud Vision API. The Google Cloud Vision API enables developers to understand the content of an image by encapsulating powerful machine-learning models in an easy-to-use REST API. It quickly classifies images into thousands of categories (e.g., “sailboat”, “lion”, “Eiffel Tower”, etc.). It detects individual objects and faces within imagines. And it finds and reads printed words contained within images.

For Razorfish, this was a good reason to explore using the Vision API together with a DAM solution, Adobe AEM DAM. The result of the integration can be found on github.

Results screenshot

We leveraged text-detection capabilities, automation classification techniques, and the landmark detection functionality within Google’s API to automatically tag and assign other metadata to assets.

Benefits and Setbacks

Integrating the Vision API provided immediate benefits:

  • Automated text detection can help in extracting text from images, making them easily accessible through search.
  • Automated landmark detection helps in ensuring that the appropriate tags are set on digital assets.
  • Auto-classification can support browse scenarios for finding the right assets.

But there were also some shortcomings. For example, an image of a businesswoman in a white dress was identified as a bride. In other instances, the labels were vague or irrelevant. Though inconvenient, we expect these shortcomings to improve over time as the API improves.

Even with these drawbacks unaddressed, automated detection is still very valuable — particularly in a DAM scenario. Assigning metadata and tags to assets is usually a challenge, and automated tagging can address that. And since tags are used primarily in the authoring environment, false classifications can be manually ignored while appropriate classifications can help surface assets much broader.

The Evolution of DAM Systems

One frequent point in implementing DAM systems is asset migration. I have seen many clients with gigabytes of assets wonder whether to go through the tremendous effort of manually assigning metadata to them.

There’s a quick fix: Auto-classification techniques using machine learning will improve and speed up this process tremendously.

With the benefits around management and migration, machine learning and other intelligence tools will therefore start becoming a key component of DAM systems — similar to how machine learning is already impacting other areas.

Lastly, incorporating machine learning capabilities in DAM solutions will also have architectural implications. Machine intelligence functionality often uses a services-based architecture (similar to the APIs provided by Google) as it requires a significant or complex set of compute resources. As DAM systems start to incorporate them at its core, it will be more difficult for those solutions to support a classic on-premises approach — causing more and more solutions to migrate to a hosted software as-a-service (SaaS) model.

Bottom line? Consider incorporating machine learning into your DAM strategy now, and look at how it can be applied to your digital asset management process.

written by: Martin Jacobs (GVP, Technology)