jexp/csv_import_neo4j.adoc

How To: Neo4j Data Import - Minimal Example

We want to import data into Neo4j, there are too many resources with a lot of information which makes it confusing. Here is the minimal thing you need to know.

Imagine the data coming from the export of a relational or legacy system, just plain CSV files without headers (this time).

people.csv

1,"John"
10,"Jane"
234,"Fred"
4893,"Mark"
234943,"Anne"

friendships.csv

1,234
10,4893
234,1
4893,234943
234943,234
234943,1

Graph Model

Our graph Model would be very simple:

(p1:Person {userId:10, name:"Anne"})-[:KNOWS]->(p2:Person {userId:123,name:"John"})

Import with Neo4j Server & Cypher

Download, install and start Neo4j Server.
Open http://localhost:7474
Run the following statements one by one:

I used http-urls here to run this as an interactive, live Graph Gist.

CREATE CONSTRAINT ON (p:Person) ASSERT p.userId IS UNIQUE;

LOAD CSV FROM "https://gist.githubusercontent.com/jexp/d8f251a948f5df83473a/raw/people.csv" AS row
CREATE (:Person {userId: toInt(row[0]), name:row[1]});

USING PERIODIC COMMIT
LOAD CSV FROM "https://gist.githubusercontent.com/jexp/d8f251a948f5df83473a/raw/friendships.csv" AS row
MATCH (p1:Person {userId: toInt(row[0])}), (p2:Person {userId: toInt(row[1])})
CREATE (p1)-[:KNOWS]->(p2);

Note	You can also use file-urls. Best with absolute paths like `file:/path/to/data.csv`, on Windows use: `file:c:/path/to/data.csv`

If you want to find your people not only by id but also by name quickly, also run:

CREATE INDEX ON :Person(name);

For instance all second degree friends of "Anne" and on how many ways they can be reached.

MATCH (:Person {name:"Anne"})-[:KNOWS*2..2]-(p2)
RETURN p2.name, count(*) as freq
ORDER BY freq DESC;

Bulk Data Import

For tens of millions up to billions of rows.

Shutdown the server first!!

Create two additional header files:

people_header.csv

userId:ID,name

friendships_header.csv

:START_ID,:END_ID

Execute from the terminal:

path/to/neo/bin/neo4j-import --into path/to/neo/data/graph.db  \
--nodes:Person people_header.csv,people.csv --relationships:KNOWS friendships_header.csv,friendships.csv

After starting your database again, run:

CREATE CONSTRAINT ON (p:Person) ASSERT p.userId IS UNIQUE;

Resources

	1	234
	10	4893
	234	1
	4893	234943
	234943	234
	234943	1

	1	John
	10	Jane
	234	Fred
	4893	Mark
	234943	Anne