Skip to content

Instantly share code, notes, and snippets.

@jexp
Last active October 14, 2015 05:37
Show Gist options
  • Save jexp/d8f251a948f5df83473a to your computer and use it in GitHub Desktop.
Save jexp/d8f251a948f5df83473a to your computer and use it in GitHub Desktop.

How To: Neo4j Data Import - Minimal Example

We want to import data into Neo4j, there are too many resources with a lot of information which makes it confusing. Here is the minimal thing you need to know.

Imagine the data coming from the export of a relational or legacy system, just plain CSV files without headers (this time).

people.csv
1,"John"
10,"Jane"
234,"Fred"
4893,"Mark"
234943,"Anne"
friendships.csv
1,234
10,4893
234,1
4893,234943
234943,234
234943,1

Graph Model

Our graph Model would be very simple:

import data model
(p1:Person {userId:10, name:"Anne"})-[:KNOWS]->(p2:Person {userId:123,name:"John"})

Import with Neo4j Server & Cypher

  1. Download, install and start Neo4j Server.

  2. Open http://localhost:7474

  3. Run the following statements one by one:

I used http-urls here to run this as an interactive, live Graph Gist.

CREATE CONSTRAINT ON (p:Person) ASSERT p.userId IS UNIQUE;
LOAD CSV FROM "https://gist.githubusercontent.com/jexp/d8f251a948f5df83473a/raw/people.csv" AS row
CREATE (:Person {userId: toInt(row[0]), name:row[1]});
USING PERIODIC COMMIT
LOAD CSV FROM "https://gist.githubusercontent.com/jexp/d8f251a948f5df83473a/raw/friendships.csv" AS row
MATCH (p1:Person {userId: toInt(row[0])}), (p2:Person {userId: toInt(row[1])})
CREATE (p1)-[:KNOWS]->(p2);
Note
You can also use file-urls. Best with absolute paths like file:/path/to/data.csv, on Windows use: file:c:/path/to/data.csv

If you want to find your people not only by id but also by name quickly, also run:

CREATE INDEX ON :Person(name);

For instance all second degree friends of "Anne" and on how many ways they can be reached.

MATCH (:Person {name:"Anne"})-[:KNOWS*2..2]-(p2)
RETURN p2.name, count(*) as freq
ORDER BY freq DESC;

Bulk Data Import

For tens of millions up to billions of rows.

Shutdown the server first!!

Create two additional header files:

people_header.csv
userId:ID,name
friendships_header.csv
:START_ID,:END_ID

Execute from the terminal:

path/to/neo/bin/neo4j-import --into path/to/neo/data/graph.db  \
--nodes:Person people_header.csv,people.csv --relationships:KNOWS friendships_header.csv,friendships.csv

After starting your database again, run:

CREATE CONSTRAINT ON (p:Person) ASSERT p.userId IS UNIQUE;
1 234
10 4893
234 1
4893 234943
234943 234
234943 1
1 John
10 Jane
234 Fred
4893 Mark
234943 Anne
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment