How Machine Learning Can Transform Digital Asset Management - SmartCrop

In previous articles I discussed the opportunities for machine learning in digital asset management (DAM) and, as a proof of concept, integrated a DAM solution (Adobe AEM DAM) with various AI/ML solutions from Amazon, IBM, Google, and Microsoft.

The primary use case for that proof of concept was auto-tagging assets in a digital asset management solution. Better metadata makes it easier for authors, editors, and other users of the DAM to search for assets, and in some scenarios the DAM can provide asset recommendations to content authors based on that metadata. For example, it’s often important to have a diverse mix of people portrayed on your site. With gender, age, and other attributes captured as part of the image metadata, diversity can be enforced through asset recommendations or asset usage reports within a DAM or content management system.

Besides object recognition, the various vendors also provide APIs for facial analysis. Amazon AI, for example, provides a face analysis API, and this post will show how we can tackle a different use case with that service.

SmartCrop

One common use case is the need for an image to be reused at different sizes. A good example is a small profile picture of the CEO on a company overview page alongside a larger version of the same picture on the detailed bio page.

Challenge screenshot

Cropping at the center of an image often works fine, but it can also result in the wrong area being cropped, while plain resizing distorts the picture and ends up keeping many irrelevant areas. A number of solutions exist to deal with this problem, ranging from open source tools to proprietary vendors, and all of them leverage some form of detection algorithm to identify the area of interest in an image.
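To illustrate the baseline: a naive center crop simply takes the largest region of the target aspect ratio from the middle of the image, with no knowledge of where the subject actually is. A minimal sketch in Java (the helper name centerCrop is illustrative and not part of the proof of concept):

```java
import java.awt.image.BufferedImage;

public class CenterCrop {

    // Naive baseline: crop the largest centered region matching the target aspect ratio.
    public static BufferedImage centerCrop(BufferedImage src, int targetWidth, int targetHeight) {
        double targetRatio = (double) targetWidth / targetHeight;
        double srcRatio = (double) src.getWidth() / src.getHeight();

        int cropWidth = src.getWidth();
        int cropHeight = src.getHeight();
        if (srcRatio > targetRatio) {
            cropWidth = (int) (src.getHeight() * targetRatio);   // source is too wide
        } else {
            cropHeight = (int) (src.getWidth() / targetRatio);   // source is too tall
        }

        int x = (src.getWidth() - cropWidth) / 2;
        int y = (src.getHeight() - cropHeight) / 2;
        return src.getSubimage(x, y, cropWidth, cropHeight);
    }
}
```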

Results

Leveraging the Amazon Rekognition face analysis API, we can solve this problem in a very simple way. The API returns a bounding box indicating where the face sits in the picture, and with that bounding box the right area for cropping can be identified. After cropping, any additional resizing operates on the most relevant area of the image, ensuring the output is at the requested size.
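Below is a minimal sketch of that flow using the AWS SDK for Java: it reads an image from disk, asks Rekognition for face details, converts the bounding box ratios into pixel coordinates, and crops. The class name, file handling, and output format are illustrative assumptions rather than the exact code of the proof of concept.

```java
import com.amazonaws.services.rekognition.AmazonRekognition;
import com.amazonaws.services.rekognition.AmazonRekognitionClientBuilder;
import com.amazonaws.services.rekognition.model.BoundingBox;
import com.amazonaws.services.rekognition.model.DetectFacesRequest;
import com.amazonaws.services.rekognition.model.FaceDetail;
import com.amazonaws.services.rekognition.model.Image;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.util.List;

public class SmartCrop {

    public static void main(String[] args) throws Exception {
        File source = new File(args[0]);

        // Ask Rekognition for the faces in the image.
        AmazonRekognition rekognition = AmazonRekognitionClientBuilder.defaultClient();
        DetectFacesRequest request = new DetectFacesRequest()
                .withImage(new Image().withBytes(ByteBuffer.wrap(Files.readAllBytes(source.toPath()))));
        List<FaceDetail> faces = rekognition.detectFaces(request).getFaceDetails();
        if (faces.isEmpty()) {
            return; // fall back to a regular crop when no face is found
        }

        // The bounding box is expressed as ratios of the overall image dimensions.
        BoundingBox box = faces.get(0).getBoundingBox();
        BufferedImage image = ImageIO.read(source);
        int x = Math.round(box.getLeft() * image.getWidth());
        int y = Math.round(box.getTop() * image.getHeight());
        int w = Math.round(box.getWidth() * image.getWidth());
        int h = Math.round(box.getHeight() * image.getHeight());

        // Crop to the face region; further resizing can then target the requested output size.
        BufferedImage cropped = image.getSubimage(x, y, w, h);
        ImageIO.write(cropped, "jpg", new File("cropped.jpg"));
    }
}
```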

Solution screenshot

The result is shown in the image above: the image on the right was produced by the SmartCrop functionality based on the face analysis API, and it is a significant improvement over the other options. SmartCrop could be improved further by adding extra margin around the face, or by incorporating some of the additional elements returned by the face analysis API.

The code for this proof of concept is posted on the Razorfish GitHub account, as part of the https://github.com/razorfish/contentintelligence repository. Obviously, in a real production scenario, additional optimizations should be applied to this proof of concept for performance reasons. The Amazon Rekognition API call only needs to take place once per image, and can potentially be done as part of the same auto-tagging workflow highlighted in previous posts, with the bounding box stored as an attribute on the image for later retrieval by the SmartCrop functionality. In addition, the output from the cropping can be cached at a CDN or webserver in front of Adobe AEM.
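As an illustration, the detection could be wrapped in a small cache keyed by asset path, so the Rekognition call happens at most once per image. The sketch below uses an in-memory map purely for illustration; in AEM the bounding box would more realistically be persisted as asset metadata, and the class and method names here are hypothetical.

```java
import com.amazonaws.services.rekognition.model.BoundingBox;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class BoundingBoxCache {

    private final Map<String, BoundingBox> cache = new ConcurrentHashMap<>();

    // detector performs the actual Rekognition DetectFaces call (see the SmartCrop sketch above).
    public BoundingBox forAsset(String assetPath, Function<String, BoundingBox> detector) {
        // computeIfAbsent ensures the expensive API call runs at most once per asset path.
        return cache.computeIfAbsent(assetPath, detector);
    }
}
```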

Conclusion

As this post highlights, old problems can now be addressed in new ways. In this case, a task that was often performed manually is turned into something that can be automated. The availability of many turnkey machine-learning services provides a starting point for solving existing problems in a new and very simple manner. It will be interesting to see the developments on this front in the coming year.

written by: Martin Jacobs (GVP, Technology)

How Machine Learning Can Transform Digital Asset Management - Part III

In previous articles I discussed the opportunities for machine learning in digital asset management (DAM) and, as a proof of concept, integrated a DAM solution (Adobe AEM DAM) with Google Cloud Vision. I followed up with a post on potential alternatives to Google’s Cloud Vision, including IBM Watson and Microsoft Cognitive Intelligence, and integrated those with Adobe DAM as well.

Of course, Amazon AWS couldn’t stay behind, and at the last AWS re:Invent conference, Amazon announced its set of Artificial Intelligence Services, including natural language understanding (NLU), automatic speech recognition (ASR), visual search and image recognition, and text-to-speech (TTS). Obviously, it was now time to integrate the AWS AI services into our proof of concept.

Amazon Rekognition

The first candidate for integration was Amazon Rekognition, a service that detects objects, scenes, and faces in images and makes it easy to add image analysis to your applications. At this point, it offers three core services:

  • Object and scene detection - automatically labels objects, concepts and scenes
  • Facial analysis - analysis of facial attributes (e.g. emotion, gender, glasses, face bounding box)
  • Face comparison - compare faces to see how closely they match

Integration Approach

Google’s API was integrated using an SDK, while the IBM and Microsoft APIs were integrated through their standard REST interfaces. For the Amazon Rekognition integration, the SDK route was taken again, leveraging the AWS Java SDK. Once the SDK has been added to the project, the actual implementation is fairly straightforward.
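As a rough sketch of what the object and scene detection call looks like with the AWS Java SDK (the file handling, label limit, and confidence threshold here are illustrative choices, not necessarily those of the actual implementation):

```java
import com.amazonaws.services.rekognition.AmazonRekognition;
import com.amazonaws.services.rekognition.AmazonRekognitionClientBuilder;
import com.amazonaws.services.rekognition.model.DetectLabelsRequest;
import com.amazonaws.services.rekognition.model.Image;
import com.amazonaws.services.rekognition.model.Label;

import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class RekognitionTagger {

    public static void main(String[] args) throws Exception {
        AmazonRekognition rekognition = AmazonRekognitionClientBuilder.defaultClient();

        // Object and scene detection: ask for up to 10 labels with at least 75% confidence.
        DetectLabelsRequest request = new DetectLabelsRequest()
                .withImage(new Image().withBytes(ByteBuffer.wrap(Files.readAllBytes(Paths.get(args[0])))))
                .withMaxLabels(10)
                .withMinConfidence(75F);

        List<Label> labels = rekognition.detectLabels(request).getLabels();
        for (Label label : labels) {
            // Each returned label becomes a candidate tag on the asset.
            System.out.println(label.getName() + " (" + label.getConfidence() + "%)");
        }
    }
}
```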

Functionality

From a digital asset management perspective, the previous posts focused on auto-tagging assets to support a content migration process or improve manual efforts performed by DAM users.

The object and scene detection for auto-tagging worked well with Amazon Rekognition. However, the labels returned are generalized: a picture of the Eiffel Tower, for example, will be labeled “Tower” rather than recognized as the specific landmark.

The facial analysis API returns a broad set of attributes, including the location of facial landmarks such as mouth and nose. But it also includes attributes such as emotions and gender, which can be used as tags. These can then be beneficial in digital asset management scenarios such as search and targeting.

Many of the attributes and labels returned by the Rekognition API include a confidence score, indicating how certain the service is about a particular detection.

Results screenshot

In the proof of concept, a 75% cutoff was used. In the example above, you can see that Female, Smile, and Happy have been detected as facial attributes with higher than 75% confidence.
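A minimal sketch of how such a cutoff could be applied when turning facial attributes into tags is shown below. It assumes DetectFaces was called with Attribute.ALL so that gender, smile, and emotion attributes are present; the helper class itself is illustrative rather than the code from the proof of concept.

```java
import com.amazonaws.services.rekognition.model.Emotion;
import com.amazonaws.services.rekognition.model.FaceDetail;

import java.util.ArrayList;
import java.util.List;

public class FaceAttributeTags {

    private static final float MIN_CONFIDENCE = 75F;

    // Turn facial attributes (gender, smile, emotions) into tags, keeping only confident ones.
    // Requires a FaceDetail obtained from a DetectFaces call made with Attribute.ALL.
    public static List<String> toTags(FaceDetail face) {
        List<String> tags = new ArrayList<>();

        if (face.getGender() != null && face.getGender().getConfidence() >= MIN_CONFIDENCE) {
            tags.add(face.getGender().getValue()); // e.g. "Female"
        }
        if (face.getSmile() != null && face.getSmile().getValue()
                && face.getSmile().getConfidence() >= MIN_CONFIDENCE) {
            tags.add("Smile");
        }
        if (face.getEmotions() != null) {
            for (Emotion emotion : face.getEmotions()) {
                if (emotion.getConfidence() >= MIN_CONFIDENCE) {
                    tags.add(emotion.getType()); // e.g. "HAPPY"
                }
            }
        }
        return tags;
    }
}
```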

Summary

The source code and setup instructions for the integration with AWS, as well as with Google, Microsoft, and IBM’s solutions, can be found on GitHub in the Razorfish repository.

One thing all the different vendors have in common is that their services are very developer-focused, and integrating them into an application is very straightforward. This makes adoption easy. Hopefully, the objects being recognized will become more detailed and advanced over time, which will improve their applicability even further.

written by: Martin Jacobs (GVP, Technology)