News delivery on the Internet is optimized for browsing rather than for exploring topics in depth. Because search engines are built around ad delivery, they tend to send users across many sites to maximize their exposure to advertising. As a consequence, customers of a traditional news organization that produces hundreds or even thousands of unique in-depth articles every day never see much of the content that would interest them, because it is never brought to their attention.
The product is a content delivery service that provides a customized experience that learns from user behavior. Its content is drawn from news articles, social media, long-form content, and reputable knowledge bases, and it aims to give users a satisfying experience of reading in-depth articles targeted to their interests.
Assessment
The assessment was carried out in close collaboration with the client's key stakeholders. Its findings were as follows:
The client's data sets were drawn largely from public sources, augmented with feeds from domain-specific data providers, and the product was poorly differentiated from competing products.
There was limited in-house expertise in data pipeline design and AI modeling. Critical pieces had been built by external contractors without a cohesive data architecture. No deep signal extraction exercise had been done, and the potential for behavior-driven personalization had not been investigated.
The assessment results are summarized below.
AI Business Problem and Value
The client is a publisher that has built its reputation on in-depth coverage of various topics and plans to launch a product that delivers its in-depth articles to users based on their browsing patterns.
The product will support customized content delivery both for browsing and for reading in-depth content. The general solution will be applicable in many focus areas such as Healthcare, Sports, Hobbies, and Entertainment.
Market research has demonstrated demand for a one-stop information portal that serves end users relevant, detailed, and trusted long-form content.
AI Use Cases
The use cases selected were as follows:
Initial Experience
The user sees a set of articles of general interest and is guided through a configuration process.
Configuration
- The user chooses topic(s) of interest from a curated list of topics.
- The user can search for keywords, see recommended keywords, and sample articles associated with those keywords in order to guide the personalization.
- The user can bookmark interesting articles, which also guides the personalization engine.
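To illustrate how these configuration choices could seed personalization before any browsing history exists, here is a minimal Python sketch. It assumes that each chosen topic, searched keyword, and bookmarked article can be reduced to keywords. The function name `build_initial_profile` and the relative weights (bookmarks counting most, topics least) are hypothetical and do not come from the client's implementation.

```python
from collections import Counter

# Hypothetical weights: how strongly each configuration action signals interest.
TOPIC_WEIGHT = 1.0
KEYWORD_WEIGHT = 2.0
BOOKMARK_WEIGHT = 3.0


def build_initial_profile(topics, keywords, bookmarked_article_keywords):
    """Return a keyword -> weight map seeded from the configuration step.

    topics: topic names picked from the curated list (treated as keywords here).
    keywords: keywords the user searched for or accepted as recommendations.
    bookmarked_article_keywords: one list of keywords per bookmarked article.
    """
    profile = Counter()
    for topic in topics:
        profile[topic.lower()] += TOPIC_WEIGHT
    for kw in keywords:
        profile[kw.lower()] += KEYWORD_WEIGHT
    for article_keywords in bookmarked_article_keywords:
        for kw in article_keywords:
            profile[kw.lower()] += BOOKMARK_WEIGHT
    # Normalize so the weights sum to 1 and profiles are comparable across users.
    total = sum(profile.values())
    return {kw: w / total for kw, w in profile.items()} if total else {}
```

For example, `build_initial_profile(["Healthcare"], ["diabetes"], [["diabetes", "nutrition"]])` gives "diabetes" the largest share, because the keyword was both searched for and found in a bookmarked article.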
Product Experience
On visiting their home page, the user sees relevant news items and social media posts.
While browsing news items, users can star articles to mark them as important. The recommendation engine also learns from the topics of the browsed articles and from the time spent on each article. In-depth articles carry more weight in guiding the recommendation engine than news articles.
All of the use cases above assume that users can also adjust settings in their profile to help tune the recommendation engine. However, most users are expected to have a satisfying experience simply by engaging with the product, without further tuning.
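The behavioral signals described under Product Experience (starring, the topics of browsed articles, dwell time, and extra weight for in-depth articles) could be combined roughly as follows. This is a sketch under stated assumptions. The star bonus, the in-depth multiplier, the dwell-time cap, the logarithmic dwell scaling, and the decay factor are all hypothetical choices. The names `Interaction`, `interaction_weight`, and `update_profile` are illustrative only.

```python
import math
from dataclasses import dataclass

# Hypothetical weights for the behavioral signals; not taken from the engagement.
STAR_BONUS = 2.0
IN_DEPTH_MULTIPLIER = 2.0  # in-depth articles count more than news items
NEWS_MULTIPLIER = 1.0
DWELL_CAP_SECONDS = 600    # cap so a tab left open does not dominate


@dataclass
class Interaction:
    keywords: list        # topic keywords of the browsed article
    dwell_seconds: float  # time spent on the article
    starred: bool
    in_depth: bool        # True for long-form content, False for news items


def interaction_weight(event: Interaction) -> float:
    """Score one browsing event; a larger score means a stronger interest signal."""
    dwell = min(event.dwell_seconds, DWELL_CAP_SECONDS)
    # log1p gives diminishing returns: the first minute matters more than the tenth.
    weight = math.log1p(dwell / 60.0)
    if event.starred:
        weight += STAR_BONUS
    return weight * (IN_DEPTH_MULTIPLIER if event.in_depth else NEWS_MULTIPLIER)


def update_profile(profile: dict, events, decay: float = 0.9) -> dict:
    """Decay old interests, spread each event's weight over its keywords, renormalize."""
    updated = {kw: w * decay for kw, w in profile.items()}
    for event in events:
        w = interaction_weight(event)
        for kw in event.keywords:
            updated[kw] = updated.get(kw, 0.0) + w / len(event.keywords)
    total = sum(updated.values())
    return {kw: w / total for kw, w in updated.items()} if total else {}
```

The decay step lets the profile drift toward recent behavior, which fits the goal of a satisfying experience without manual tuning. The decay rate itself would need to be tuned on real usage data.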
Prediction Variables
After identifying the key use cases, we identified the prediction variables that a data-driven ML system would need to generate.
The prediction variables were as follows:
Articles
Given a set of keywords, predict which articles most closely match them.
Similar articles
Given a set of articles, predict which other articles most closely match them.
Topics
Given an article, predict which keywords best represent the topics implicit in it.
Keywords
Given a keyword, predict which other keywords match it most closely.
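One common way to produce all four prediction variables from text alone is TF-IDF weighting combined with cosine similarity. The sketch below uses only the Python standard library to illustrate that technique. It is not necessarily the approach the engagement selected. The class `TfidfIndex`, its method names, and the tiny stopword list are hypothetical. Related keywords are found by comparing the articles each term appears in and the weight it carries there.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a production system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def tokenize(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TfidfIndex:
    """Hypothetical in-memory TF-IDF index over a small article collection."""

    def __init__(self, articles):
        # articles: dict mapping article_id -> article text
        counts = {aid: Counter(tokenize(text)) for aid, text in articles.items()}
        n = len(counts)
        df = Counter(term for c in counts.values() for term in c)
        # Smoothed IDF: terms found in fewer articles get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = {aid: self._weigh(c) for aid, c in counts.items()}

    def _weigh(self, counts):
        # Terms outside the vocabulary are dropped.
        return {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}

    def _rank(self, query, exclude=(), k=3):
        scores = {aid: cosine(query, v)
                  for aid, v in self.vectors.items() if aid not in exclude}
        return sorted(scores, key=scores.get, reverse=True)[:k]

    def articles_for_keywords(self, keywords, k=3):
        """Articles: rank articles against a set of keywords."""
        return self._rank(self._weigh(Counter(tokenize(" ".join(keywords)))), k=k)

    def similar_articles(self, article_ids, k=3):
        """Similar articles: rank other articles against the summed vectors of the set."""
        centroid = Counter()
        for aid in article_ids:
            centroid.update(self.vectors[aid])
        return self._rank(centroid, exclude=set(article_ids), k=k)

    def topic_keywords(self, article_id, k=5):
        """Topics: the article's highest-weighted terms."""
        vec = self.vectors[article_id]
        return sorted(vec, key=vec.get, reverse=True)[:k]

    def related_keywords(self, keyword, k=5):
        """Keywords: terms that occur in the same articles with similar weight."""
        def occurrences(term):
            return {aid: v[term] for aid, v in self.vectors.items() if term in v}

        target = occurrences(keyword)
        if not target:
            return []
        scores = {t: cosine(target, occurrences(t)) for t in self.idf if t != keyword}
        return sorted(scores, key=scores.get, reverse=True)[:k]
```

TF-IDF appears here only because it runs without dependencies. The assessment does not say which text affinity approaches were finally chosen for the custom models.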
Modeling Approach
In collaboration with the client's technical team, we ran a comprehensive set of experiments on data sets collected from the client's digital press prototypes. A literature review covering the latest trends in natural language processing (NLP) identified a set of promising techniques. We then tested various NLP text affinity approaches as candidates for a custom recommendation engine.
We iterated on the available data sets to compare the candidate approaches and identify the algorithms that performed best and whose results tested well with target users. In the evaluation phase we also assessed cloud APIs from various vendors. We settled on building custom predictive models because they performed considerably better in our tests.
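The source does not state which offline metric was used to compare approaches. As a hedged illustration, precision@k against articles that test users judged relevant is one simple way to rank candidate approaches. The function names and the data layout (one pair of recommended list and relevant set per test user) are assumptions.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that the user judged relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / k


def mean_precision_at_k(runs, k=5):
    """Average precision@k over (recommended_list, relevant_set) pairs, one per test user."""
    if not runs:
        return 0.0
    return sum(precision_at_k(rec, rel, k) for rec, rel in runs) / len(runs)


def pick_best(approaches, k=5):
    """Return the approach name with the highest mean precision@k, plus all scores.

    approaches: dict mapping approach name -> list of (recommended, relevant) pairs.
    """
    scores = {name: mean_precision_at_k(runs, k) for name, runs in approaches.items()}
    return max(scores, key=scores.get), scores
```

A single offline metric like this only complements user testing. The text above reports that the winning algorithms also had to test well with target users.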
Data Strategy
We quantified the available data sets in terms of their daily volume and diversity of sources.
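Daily volume and source diversity could be quantified as in the sketch below. It assumes each ingested item is logged as a (day, source) pair and uses normalized Shannon entropy as the diversity measure. The record format and the choice of entropy are assumptions, not the measures used in the assessment.

```python
import math
from collections import Counter, defaultdict


def daily_volume_and_diversity(records):
    """Per day: item count, distinct sources, and normalized Shannon entropy of sources.

    records: iterable of (day, source) pairs.
    Diversity is 0 when one source supplies everything and 1 when all
    sources contribute equally.
    """
    by_day = defaultdict(Counter)
    for day, source in records:
        by_day[day][source] += 1
    report = {}
    for day, sources in sorted(by_day.items()):
        total = sum(sources.values())
        entropy = -sum((c / total) * math.log(c / total) for c in sources.values())
        diversity = entropy / math.log(len(sources)) if len(sources) > 1 else 0.0
        report[day] = {"volume": total, "sources": len(sources), "diversity": diversity}
    return report
```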
We also surveyed available third-party datasets that could increase the product's appeal to target users. The candidates included syndicated feeds from leading media providers, content scraped from public-facing pages of representative news outlets, selected categories from Wikimedia, and knowledge bases such as WebMD.
We also examined the value of storing users' clickstream data to guide their experience. We recommended an architecture for securely managing personal data, which would be kept in a separate datastore.
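Here is a minimal sketch of the separation idea. It assumes the clickstream store keys events by a keyed-hash (HMAC-SHA256) pseudonym rather than the raw user ID. Without the secret key held alongside the personal-data store, behavioral data cannot be linked back to a person. The in-memory dict and list stand in for two separate datastores, and the class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets


class SeparatedStores:
    """Hypothetical sketch: identities and clickstream events live in separate stores.

    The clickstream store only ever sees a keyed-hash pseudonym of the user ID.
    In a real deployment the key and the personal store would sit in a separate,
    more tightly controlled service.
    """

    def __init__(self):
        self._key = secrets.token_bytes(32)  # held with the personal-data store only
        self.personal_store = {}             # pseudonym -> personal details
        self.clickstream_store = []          # (pseudonym, article_id, dwell_seconds)

    def pseudonym(self, user_id: str) -> str:
        return hmac.new(self._key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

    def register(self, user_id: str, email: str) -> None:
        self.personal_store[self.pseudonym(user_id)] = {"user_id": user_id, "email": email}

    def log_click(self, user_id: str, article_id: str, dwell_seconds: float) -> None:
        self.clickstream_store.append((self.pseudonym(user_id), article_id, dwell_seconds))

    def erase_user(self, user_id: str) -> None:
        """Delete a user's personal details and their behavioral history."""
        p = self.pseudonym(user_id)
        self.personal_store.pop(p, None)
        self.clickstream_store = [e for e in self.clickstream_store if e[0] != p]
```

Keying both stores by the same pseudonym keeps account deletion simple. The recommendation pipeline can still read the clickstream without ever handling contact details.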
Results
The client accepted the findings and rolled out the ML server in a product that was released as a private beta and is now available to select clients for full-release testing.
Key results were:
Leadership team and AI readiness
We identified gaps in the team's expertise and presented a plan combining internal training and external consultancy.
Data pipeline, algorithms and models
Experiments on the signal data established how data-intensive the approach is and how much value the available signals provide. A data architecture was prototyped and presented to the client.
AI architecture and deployment
We built an AI MVP to validate with beta customers and to estimate the cloud resources needed to deliver the desired experience.
Algorithms and Models
We presented an in-depth survey of available approaches to the client. We then ran a number of signal extraction experiments on the most promising approaches.
Product, Category, Value Prop and Competitive
While the value proposition is extremely strong, the current search market is already well served. Further work is needed to define a new market category for the product to succeed.