@pdurbin
Last active August 29, 2015 14:09
Solr JOIN

Solr JOINs are a way to enforce document security, as explained by Yonik Seeley at http://lucene.472066.n3.nabble.com/document-level-security-filter-solution-for-Solr-tp4126992p4126994.html

This repository contains an example of a working Solr JOIN based on data in before.json. Permissions per user are embedded in the primary documents like this:

{
    "id": "dataset_3", 
    "perms_ss": [
        "alice", 
        "bob"
    ]
}, 
{
    "id": "dataset_4", 
    "perms_ss": [
        "alice", 
        "bob", 
        "public"
    ]
}, 
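The access rule these embedded permissions imply can be sketched in plain shell, without Solr. This is only an illustration: the helper name `can_see` is hypothetical, and the rule that a user sees a dataset when `perms_ss` contains their own name or `public` is inferred from the JOIN and the expected output in this repo.

```shell
#!/bin/sh
# Return success when USER may see a dataset whose perms_ss values follow.
# Usage: can_see USER PERM [PERM...]
# Assumption: a match on the user's own name or on "public" grants access.
can_see() {
    user=$1
    shift
    for perm in "$@"; do
        if [ "$perm" = "$user" ] || [ "$perm" = "public" ]; then
            return 0
        fi
    done
    return 1
}

# dataset_3 lists alice and bob; dataset_4 also lists public.
can_see alice alice bob && echo "alice sees dataset_3"
can_see charlie alice bob || echo "charlie does not see dataset_3"
can_see charlie alice bob public && echo "charlie sees dataset_4"
```

The Solr JOIN applies the same test inside the index, so the filtering happens at query time rather than in client code.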

User documents have been created to do the JOIN on:

{
    "id": "alice",
    "groups_s": "alice" 
}, 

The JOIN looks like this:

{!join+from=groups_s+to=perms_ss}id:public+OR+{!join+from=groups_s+to=perms_ss}id:alice
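Since only the user name changes between queries, the filter can be generated. A minimal sketch, assuming a hypothetical helper named `before_fq` and keeping spaces written as `+` to match the URL form above:

```shell
#!/bin/sh
# Build the "before" filter query for one user (hypothetical helper name).
# Each join clause starts from a user document (matched by id) and maps its
# groups_s value onto the perms_ss field of the dataset documents.
before_fq() {
    user=$1
    join='{!join+from=groups_s+to=perms_ss}'
    printf '%sid:public+OR+%sid:%s\n' "$join" "$join" "$user"
}

before_fq alice
```

When the result is passed to curl as the `fq` parameter, `--globoff` keeps curl from treating the braces as URL globbing, which is what the test scripts in this repo do.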

Because indexing the primary documents (datasets) takes a while, I'd like to avoid reindexing them whenever permissions change. One idea is to introduce a third type of document that holds the permission information. The file after.json is an example, with documents that look like this:

{
    "id": "dataset_3" 
}, 
{
    "id": "dataset_4" 
}, 
{
    "id": "public",
    "groups_s": "public" 
}, 
{
    "id": "alice",
    "groups_s": "alice" 
}, 
{
    "id": "bob",
    "groups_s": "bob" 
}, 
{
    "id": "charlie",
    "groups_s": "charlie" 
},
{
    "id": "dataset_1_perms",
    "definition_point_s": "dataset_1",
    "role_assignee_ss": [
        "alice" 
    ]
},
{
    "id": "dataset_2_perms",
    "definition_point_s": "dataset_2",
    "role_assignee_ss": [
        "bob"
    ]
},
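To make the mapping from the old `perms_ss` field to the new document type concrete, here is a rough sketch that prints the permission document for one dataset. The helper name `perms_doc` is hypothetical, and the sketch assumes ids and user names contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Print the permission document for one dataset on a single line.
# Usage: perms_doc DATASET_ID ASSIGNEE [ASSIGNEE...]
# Assumption: no argument needs JSON escaping.
perms_doc() {
    dataset=$1
    shift
    assignees=
    for who in "$@"; do
        # Append each assignee as a quoted JSON string, comma-separated.
        assignees="${assignees:+$assignees, }\"$who\""
    done
    printf '{"id": "%s_perms", "definition_point_s": "%s", "role_assignee_ss": [%s]}\n' \
        "$dataset" "$dataset" "$assignees"
}

perms_doc dataset_3 alice bob
```

With this layout a permission change only touches the small `*_perms` document, while the dataset document itself stays as indexed.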

The question is whether it's possible to construct a Solr JOIN such that the same permissions are enforced and the same documents are returned per user. This repo contains expected output and test runners for anyone who can figure out the syntax of the JOIN.
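The test runners for the after.json layout in this repo use a filter of the following shape. The sketch below only builds that string for a given user; the helper name `after_fq` is hypothetical, and whether the filter returns the expected documents still has to be checked against a running Solr.

```shell
#!/bin/sh
# Build the "after" filter query for one user (hypothetical helper name).
# The inner query selects permission documents whose role_assignee_ss
# contains "public" or the user; the join then maps their
# definition_point_s values onto the id field of the dataset documents.
after_fq() {
    user=$1
    printf '{!join+from=definition_point_s+to=id}role_assignee_ss:(public+%s)\n' "$user"
}

after_fq alice
```

Note that `(public+alice)` has no explicit operator, so this form assumes the default query operator is OR.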

after.json
[
    {
        "id": "dataset_1"
    },
    {
        "id": "dataset_2"
    },
    {
        "id": "dataset_3"
    },
    {
        "id": "dataset_4"
    },
    {
        "id": "dataset_1_perms",
        "definition_point_s": "dataset_1",
        "role_assignee_ss": [
            "alice"
        ]
    },
    {
        "id": "dataset_2_perms",
        "definition_point_s": "dataset_2",
        "role_assignee_ss": [
            "bob"
        ]
    },
    {
        "id": "dataset_3_perms",
        "definition_point_s": "dataset_3",
        "role_assignee_ss": [
            "alice",
            "bob"
        ]
    },
    {
        "id": "dataset_4_perms",
        "definition_point_s": "dataset_4",
        "role_assignee_ss": [
            "alice",
            "bob",
            "public"
        ]
    }
]

alice.expected
{
  "id": "dataset_1"
}
{
  "id": "dataset_3"
}
{
  "id": "dataset_4"
}

before.json
[
    {
        "id": "dataset_1",
        "perms_ss": [
            "alice"
        ]
    },
    {
        "id": "dataset_2",
        "perms_ss": [
            "bob"
        ]
    },
    {
        "id": "dataset_3",
        "perms_ss": [
            "alice",
            "bob"
        ]
    },
    {
        "id": "dataset_4",
        "perms_ss": [
            "alice",
            "bob",
            "public"
        ]
    },
    {
        "id": "public",
        "groups_s": "public"
    },
    {
        "id": "alice",
        "groups_s": "alice"
    },
    {
        "id": "bob",
        "groups_s": "bob"
    },
    {
        "id": "charlie",
        "groups_s": "charlie"
    }
]

bob.expected
{
  "id": "dataset_2"
}
{
  "id": "dataset_3"
}
{
  "id": "dataset_4"
}

charlie.expected
{
  "id": "dataset_4"
}
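#!/bin/sh
# Delete every document from the Solr index and commit.
# The URL is quoted so the shell does not treat "?" as a glob character.
curl 'http://localhost:8983/solr/update/json?commit=true' -H 'Content-type: application/json' -X POST -d '{"delete": {"query": "*:*"}}'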
#!/bin/sh
# Load after.json (datasets plus separate permission documents) into Solr and commit.
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @after.json -H 'Content-type:application/json'
#!/bin/sh
# Load before.json (permissions embedded in the datasets) into Solr and commit.
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @before.json -H 'Content-type:application/json'
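#!/bin/bash
# test.after.alice: query the after.json layout as alice and diff the returned ids against alice.expected.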
diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq={!join+from=definition_point_s+to=id}role_assignee_ss:(public+alice)' | jq '.response.docs[] | {id}') alice.expected
#!/bin/bash
# Run the after.json tests for all three users.
./test.after.alice
./test.after.bob
./test.after.charlie
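#!/bin/bash
# test.after.bob: query the after.json layout as bob and diff the returned ids against bob.expected.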
diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq={!join+from=definition_point_s+to=id}role_assignee_ss:(public+bob)' | jq '.response.docs[] | {id}') bob.expected
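#!/bin/bash
# test.after.charlie: query the after.json layout as charlie and diff the returned ids against charlie.expected.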
diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq={!join+from=definition_point_s+to=id}role_assignee_ss:(public+charlie)' | jq '.response.docs[] | {id}') charlie.expected
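#!/bin/bash
# test.before.alice: query the before.json layout as alice and diff the returned ids against alice.expected.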
diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq=({!join+from=groups_s+to=perms_ss}id:public+OR+{!join+from=groups_s+to=perms_ss}id:alice)' | jq '.response.docs[] | {id}') alice.expected
#!/bin/bash
# Run the before.json tests for all three users.
./test.before.alice
./test.before.bob
./test.before.charlie
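#!/bin/bash
# test.before.bob: query the before.json layout as bob and diff the returned ids against bob.expected.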
diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq=({!join+from=groups_s+to=perms_ss}id:public+OR+{!join+from=groups_s+to=perms_ss}id:bob)' | jq '.response.docs[] | {id}') bob.expected
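#!/bin/bash
# test.before.charlie: query the before.json layout as charlie and diff the returned ids against charlie.expected.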
diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq=({!join+from=groups_s+to=perms_ss}id:public+OR+{!join+from=groups_s+to=perms_ss}id:charlie)' | jq '.response.docs[] | {id}') charlie.expected