#Domain
This document outlines the evolution of modeling recommendations using Neo4j.

Neumann harvests data records generated by Tapirus into the Neo4j data store to process recommendations. For that purpose, only certain events and fields are imported.
##Modelling
###First Iteration: Working Design
The first attempt at doing recommendations with Neo4j was based on a simple data model:

    (user :`User`)-[:{Event}]->(item :`Item`)

Events: VIEW, BUY

Events in the database have the following properties: timestamp, session.
With this model, we were able to:
- Find out which products are 'liked' (viewed/bought) by people based on other products they view/buy
- Group other liked products by session
- Find the most purchased product
But it also has limitations:
- Grouping events by session is not efficient, because the session is stored as a relationship property
- It's super expensive to sort events by timestamp
###Second Iteration: Faster Traversal
This attempt was made to address the limitations of the first, namely the inefficient grouping of events.

    (user :`User`)-[:HAS]->(session :`Session` {timestamp})
    (session :`Session`)-[:{Event}]->(item :`Item`)
    (session :`Session`)-[:IN]->(agent :`Agent`)

With this model, we are able to:
- Find out which products are 'liked' (viewed/bought) by people based on other products they view/buy
- Find out which products people view or purchase together, more efficiently (same session)
- Find the most purchased product
- Sort sessions by timestamp a little more efficiently
- Record the web browser id (Agent)
But, it still has limitations:
- Although sessions can be sorted, events within them can't, so tracking the latest product views is still super expensive
Instead of remodeling the database to address the limitation of finding the latest items, we opted for a different data store that processes analytics and tracks the latest events. This leaves Neo4j free to serve recommendation queries over implicit collaborative filtering data.
##Recommending
There are 2 primary modes of finding items to recommend, based on the current model.
I. Same session search

When we search for items purchased within the same session, we're essentially looking at one shopping basket or cart.

    MATCH (s :`TenantId` :`Session`)-[:{Event}]->(i :`TenantId` :`Item` {id : {itemId} })
    WITH s, i
    MATCH (s)-[:{Event}]->(x :`Item` :`TenantId`)
    WHERE x <> i
    RETURN x.id AS item, COUNT(x) AS n
    ORDER BY n DESC
    LIMIT {limit}

This query searches for all other items viewed/purchased in each session where our target item was viewed/purchased, sorts them by their count, and returns the top `limit` items.

For items with many connections, this can run a little slower than for those with fewer, though.
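
To make the counting logic concrete, here is a minimal in-memory sketch in Python of what the same-session query computes. It does not talk to Neo4j. The `sessions` mapping, the function name `same_session_recommendations`, and the sample data are hypothetical, and the sketch assumes each item appears at most once per session.

```python
from collections import Counter


def same_session_recommendations(sessions, item_id, limit):
    """Count items that share a session with item_id; return the top `limit`.

    sessions: dict mapping a session id to the list of item ids it touched,
    a hypothetical stand-in for (Session)-[:{Event}]->(Item) relationships.
    Assumes each item appears at most once per session.
    """
    counts = Counter()
    for items in sessions.values():
        if item_id not in items:
            continue  # MATCH: keep only sessions connected to the target item
        for other in items:
            if other != item_id:  # WHERE x <> i
                counts[other] += 1  # COUNT(x)
    # ORDER BY n DESC LIMIT {limit}
    return counts.most_common(limit)


# Hypothetical sample data: three sessions and the items seen in each.
sessions = {
    "s1": ["a", "b", "c"],
    "s2": ["a", "b"],
    "s3": ["b", "d"],
}
top = same_session_recommendations(sessions, "a", 2)
```

With this sample data, item `b` shares two sessions with `a` and item `c` shares one, so `b` ranks first.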
II. Different session

Looking at the same session gives us products bundled together in a purchase, or viewed in the same session. Users, however, can make different purchases at different times. When we're interested in these purchases, we need to widen our net to other sessions. Since our first method already covers items in the same session, we look exclusively at other sessions, excluding every session where the item was actually viewed or bought.

    MATCH (i :`TenantId` :`Item` { id: {itemId} })<-[r1 :{Event}]-(s1 :`TenantId` :`Session`)
        -[:`BY`]->(u :`TenantId` :`User`)<-[:`BY`]-(s2 :`TenantId` :`Session`)-[:{ACTION}]
        ->(x :`TenantId` :`Item`)
    WHERE i <> x AND s1 <> s2
    WITH x
    LIMIT {volume}
    RETURN x.id AS item, COUNT(x) AS n
    ORDER BY n DESC
    LIMIT {limit}

In this query, we cap the number of items we search using the parameter `volume`, and return the top `limit` items.
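
The effect of capping rows before counting can be sketched in plain Python. This is an in-memory illustration that mirrors the query as written, not the actual implementation. The tuple layout of `sessions`, the function name `other_session_recommendations`, and the sample data are hypothetical.

```python
from collections import Counter
from itertools import islice


def other_session_recommendations(sessions, item_id, volume, limit):
    """Mirror the different-session query on in-memory data.

    sessions: list of (session_id, user_id, item_ids) tuples, a hypothetical
    stand-in for the Session, User and Item nodes of the model.
    """
    def candidate_rows():
        for s1_id, u1, items1 in sessions:
            if item_id not in items1:
                continue  # s1 must be connected to the target item i
            for s2_id, u2, items2 in sessions:
                if u2 != u1 or s2_id == s1_id:
                    continue  # same user, and WHERE s1 <> s2
                for x in items2:
                    if x != item_id:  # WHERE i <> x
                        yield x

    # WITH x LIMIT {volume}: cap the rows before aggregating. Which rows
    # survive the cap depends on iteration order in this sketch.
    capped_rows = islice(candidate_rows(), volume)
    # RETURN x.id, COUNT(x) ORDER BY n DESC LIMIT {limit}
    return Counter(capped_rows).most_common(limit)


# Hypothetical sample data: two users, five sessions.
sessions = [
    ("s1", "u1", ["a", "b"]),
    ("s2", "u1", ["c", "d"]),
    ("s3", "u1", ["c"]),
    ("s4", "u2", ["a"]),
    ("s5", "u2", ["c", "e"]),
]
uncapped = other_session_recommendations(sessions, "a", volume=100, limit=2)
capped = other_session_recommendations(sessions, "a", volume=2, limit=2)
```

A small `volume` keeps the search cheap, but the counts are then computed over only the first `volume` rows, so the ranking can differ from the uncapped result.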
##Sorting & Filtering
Results can also be filtered based on parameter values. For example, we can limit the items of interest to those with a price above 45, or between 10 and 60, e.g.:

    WHERE i <> x AND x.price > 10 AND x.price < 60

Note: This isn't implemented yet.
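
Since filtering isn't implemented, the following Python sketch shows only one possible shape for it: a helper that builds the `WHERE` clause and a matching parameter dictionary from an optional price range. The function name `price_filter` and the parameter names `minPrice` and `maxPrice` are hypothetical. The `{name}` placeholder style follows the queries above.

```python
def price_filter(min_price=None, max_price=None):
    """Return (where_clause, params) for an optional price range on x.

    Bounds are passed as query parameters rather than formatted into the
    query text. The names minPrice/maxPrice are hypothetical.
    """
    conditions = ["i <> x"]  # always exclude the target item itself
    params = {}
    if min_price is not None:
        conditions.append("x.price > {minPrice}")
        params["minPrice"] = min_price
    if max_price is not None:
        conditions.append("x.price < {maxPrice}")
        params["maxPrice"] = max_price
    return "WHERE " + " AND ".join(conditions), params


# Items priced between 10 and 60, as in the example above.
clause, params = price_filter(min_price=10, max_price=60)
```

Calling `price_filter()` with no bounds leaves only the `i <> x` condition, so the same helper covers the unfiltered case.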