9

I want to add a recommendation feature to a document management system. It is a server on which most company documents are stored. Employees browse the web interface and click to download (or read online) the documents they want.
Each employee has access to only a subset of all documents.

My goal: Recommend to an employee the documents recently opened by their teammates, or the spreadsheet that serves as an annex to the document they just opened, or anything they might want to read.

There are many recommendation engines for publicly available data, but the situation here is special: each employee has permission to access only a fraction of all documents, whereas on Netflix any user can see all movies.

Example: Employee1 can read DocumentA but not DocumentB. Employee2 can read both and Employee3 cannot read any.

Of course, I must not recommend to an employee documents to which they do not have access. Furthermore, I guess I should consider the popularity of a document only in the context of the employees who have access to it. To make things even more complicated, employees sometimes move from one project to another, which changes which documents they can access.

  • Is there a name for this kind of problem?
  • Can it be reduced without precision/efficiency loss to a more common kind of problem?
  • If not, what approach would work well for this kind of problem?

Note: A Netflix-like recommendation engine is not good enough. A document with 50 views should be prominent if only 10 employees (including me) have access to it, but not prominent if 100000 employees have access to it.

In case it is needed, here are a few data specifics: the average company has 1000 employees and about 10000 documents, and each employee clicks on about 5 documents per day. Each project has an average of 10 employees with access to it and contains about 100 documents. Each employee works on an average of 5 projects in parallel.

Nicolas Raoul

3 Answers

1

I feel that you need to address two things separately.

First, you need access control for the users in your system. You can attach access tokens to each user and file, then filter the file database so that only documents the browsing user can access are processed.

Second, for ranking the documents, I would suggest combining a document weight with a user weight that is relative to the currently browsing user.

For example, I can think of the document weight and user weight as follows, but they can be much more complex depending on your system:

DocumentWeight = NumberOfViews / NumberOfUsersWithAccess
UserWeight = a weight relative to the browsing user; users on the same projects get higher weights

DocumentScore = sum over all users who viewed the document of (DocumentWeight x UserWeight)

You can then rank the documents by this score, which should statistically pull up the documents you need. I hope this is of some help.
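As a rough illustration of the two steps above (filter by access, then score), here is a minimal Python sketch. The data layout (`access`, `views`, `projects`) and the particular `user_weight` rule (1.0 for a viewer who shares a project with the browsing user, 0.1 otherwise) are assumptions made up for this example, not part of any real system.

```python
# Hypothetical sample data: who may read each document, who viewed it
# (one list entry per view), and which projects each employee is on.
access = {"A": {"e1", "e2"}, "B": {"e2"}, "C": {"e1", "e2", "e3", "e4"}}
views = {"A": ["e2"], "C": ["e3", "e4", "e3"]}
projects = {"e1": {"p1"}, "e2": {"p1"}, "e3": {"p2"}, "e4": {"p2"}}


def user_weight(browsing_user, viewer):
    """Assumed rule: viewers sharing a project with the browsing user count more."""
    shared = projects.get(browsing_user, set()) & projects.get(viewer, set())
    return 1.0 if shared else 0.1


def score_documents(browsing_user, views, access, user_weight):
    """Return the documents the browsing user may read, best score first."""
    scores = {}
    for doc, allowed in access.items():
        if browsing_user not in allowed:
            continue  # step 1: access control, never score a hidden document
        viewers = views.get(doc, [])
        # DocumentWeight = number of views / number of users with access
        doc_weight = len(viewers) / len(allowed)
        # DocumentScore = sum over viewers of DocumentWeight x UserWeight
        scores[doc] = sum(doc_weight * user_weight(browsing_user, v) for v in viewers)
    return sorted(scores, key=scores.get, reverse=True)


ranking_for_e1 = score_documents("e1", views, access, user_weight)
```

With this sample data, document B never appears for `e1` because `e1` has no access to it, and a single view by a teammate on a two-person document outranks three views by strangers on a four-person document.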

Anwar Shaikh
0

From your description, I would suggest looking into collaborative filtering methods. Basically, you could treat any view or download of a document as positive feedback for that item, and then recommend such items to users who look at similar documents.

Filtering out hidden results should be done on a per-user basis: find all possible suggestions, but output only those the user has the rights to see.
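To make the idea concrete, here is a small sketch of one simple form of collaborative filtering (item-to-item co-occurrence counts), followed by the per-user permission filter. The names `history`, `can_read`, `build_cooccurrence`, and `recommend` are hypothetical, and a real system would likely use a more refined similarity measure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: documents each employee has viewed,
# and documents each employee is allowed to read.
history = {"e1": {"A"}, "e2": {"A", "B", "C"}, "e3": {"A", "C"}}
can_read = {"e1": {"A", "C"}, "e2": {"A", "B", "C"}, "e3": {"A", "C"}}


def build_cooccurrence(history):
    """Count, for each document, how often other documents were viewed by the same user."""
    co = {}
    for docs in history.values():
        for a, b in combinations(sorted(docs), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(user, history, can_read, top_n=5):
    """Suggest documents co-viewed with the user's own views, filtered by access."""
    co = build_cooccurrence(history)
    seen = history.get(user, set())
    candidates = Counter()
    for doc in seen:
        candidates.update(co.get(doc, Counter()))
    allowed = can_read.get(user, set())
    # All possible suggestions are computed first; hidden ones are dropped last.
    ranked = [d for d, _ in candidates.most_common() if d not in seen and d in allowed]
    return ranked[:top_n]


suggestions_for_e1 = recommend("e1", history, can_read)
```

Here document B is co-viewed with A, but it is dropped for `e1` because `e1` cannot read it.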

chewpakabra
0

Take a look at Mining of Massive Datasets, p. 328, which will eventually lead you to SVD, a technique commonly used in recommender systems.
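For a feel of what SVD does with view data, here is a toy sketch in plain Python. It is not a full SVD: it uses power iteration to find only the leading singular value and vectors, then builds the rank-1 approximation of an employee-by-document view matrix. The function names and the sample matrix are assumptions for illustration; in practice you would call a library SVD routine and keep several components.

```python
import math


def top_singular_triplet(matrix, iterations=100):
    """Approximate the leading singular value and vectors (sigma, u, v) by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # non-negative start vector
    for _ in range(iterations):
        av = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]  # A v
        w = [sum(matrix[i][j] * av[i] for i in range(n_rows)) for j in range(n_cols)]  # A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero matrix: nothing to extract
        v = [x / norm for x in w]
    av = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av] if sigma > 0.0 else [0.0] * n_rows
    return sigma, u, v


def rank1_scores(matrix):
    """Rank-1 approximation sigma * u * v^T, usable as smoothed preference scores."""
    sigma, u, v = top_singular_triplet(matrix)
    return [[sigma * ui * vj for vj in v] for ui in u]


# Hypothetical view matrix: rows are employees, columns are documents,
# 1.0 means the employee viewed the document.
view_matrix = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
]
predicted = rank1_scores(view_matrix)
```

Entries of `predicted` that are high where `view_matrix` is 0 are candidate recommendations; they would still need to be filtered by the employee's access rights before being shown.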

Drey