@stefanschmidt
Created May 15, 2023 17:56
Reading a CSV file with Python

For files under roughly 50 MB, the standard-library csv module is sufficient:

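import csv

# newline="" lets the csv module handle line endings inside quoted fields
with open("data.csv", newline="") as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    csv_data = list(reader)  # list of rows, each row a list of strings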
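
Because csv.reader is a lazy iterator, it is building a list from it that pulls the whole file into memory. A minimal sketch of processing rows one at a time instead, assuming the headerless four-column layout written by the random-data generator below (UUID, float, float, integer); the function name sum_last_column is hypothetical:

```python
import csv


def sum_last_column(path):
    """Stream a headerless CSV and sum its last column as integers.

    Assumes every row ends in an integer field, as in the random-data
    generator (UUID, float, float, int). Only one row is held at a time.
    """
    total = 0
    rows = 0
    with open(path, newline="") as fp:
        for row in csv.reader(fp):
            total += int(row[-1])
            rows += 1
    return rows, total
```

Calling sum_last_column("data.csv") returns a (row count, sum) pair while keeping memory use roughly constant regardless of file size.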

For larger files, use pandas:

import pandas as pd

df = pd.read_csv('data.csv')
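
One caveat: pd.read_csv treats the first line as column names by default, while the file produced by the generator script below has no header row. A sketch that reads it with explicit, hypothetical column names (id, x, y, n) and dtypes, plus a chunked variant using the chunksize parameter for files that do not fit comfortably in memory:

```python
import pandas as pd

# Hypothetical column names for the generator's layout: UUID, float, float, int
COLUMNS = ["id", "x", "y", "n"]
DTYPES = {"id": str, "x": "float64", "y": "float64", "n": "int64"}


def load(path):
    # header=None: the first line is data, not column names
    return pd.read_csv(path, header=None, names=COLUMNS, dtype=DTYPES)


def sum_n_chunked(path, chunksize=1_000_000):
    # With chunksize set, read_csv yields DataFrames of at most chunksize rows
    total = 0
    for chunk in pd.read_csv(path, header=None, names=COLUMNS,
                             dtype=DTYPES, chunksize=chunksize):
        total += int(chunk["n"].sum())
    return total
```

Passing dtype explicitly also spares pandas from inferring column types, and the chunked loop trades some speed for bounded memory use.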

Runtime comparison

10 MB

$ time python python_readcsv.py

real	0m0.457s
user	0m0.303s
sys	0m0.130s

$ time python pandas_readcsv.py

real	0m0.866s
user	0m0.811s
sys	0m0.226s

500 MB

$ time python python_readcsv.py

real	0m20.084s
user	0m18.146s
sys	0m1.682s

$ time python pandas_readcsv.py

real	0m9.598s
user	0m8.255s
sys	0m1.084s
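
The shell's time command measures the whole process, so these figures also include interpreter start-up and, for pandas, the cost of importing the library. A minimal sketch for timing only the read itself with time.perf_counter; the helper names are hypothetical, and pandas is timed only if it is installed:

```python
import csv
import time


def time_csv_module(path):
    """Return (seconds, row_count) for reading path fully with csv.reader."""
    start = time.perf_counter()
    with open(path, newline="") as fp:
        rows = list(csv.reader(fp))
    return time.perf_counter() - start, len(rows)


def time_pandas(path):
    """Return (seconds, row_count) for pd.read_csv, or None if pandas is missing."""
    try:
        import pandas as pd
    except ImportError:
        return None
    # Timer starts after the import, so import overhead is excluded
    start = time.perf_counter()
    df = pd.read_csv(path, header=None)
    return time.perf_counter() - start, len(df)
```

Running each helper a few times on the same file gives steadier numbers, since the first read may be slowed by a cold OS file cache.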

Create CSV file with random data

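import random
import uuid

outfile = 'data.csv'
outsize = 1024 * 1024 * 500  # target size in bytes (500 MB)

with open(outfile, 'w', newline='') as csvfile:
    size = 0
    while size < outsize:
        # one row: UUID, two floats in [0, 50), one int in [0, 1000)
        txt = '%s,%.6f,%.6f,%i\n' % (uuid.uuid4(), random.random()*50, random.random()*50, random.randrange(1000))
        size += len(txt)  # all characters are ASCII, so len() equals bytes written
        csvfile.write(txt)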
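
For reproducible benchmarks, a variant of the generator above can seed its own random.Random instance and write rows with csv.writer. The function name write_random_csv and its seed parameter are hypothetical; because uuid4() draws from the OS random source and cannot be seeded, this sketch builds IDs from 128 seeded random bits with uuid.UUID(int=..., version=4):

```python
import csv
import random
import uuid


def write_random_csv(path, target_bytes, seed=0):
    """Write headerless rows (UUID, float, float, int) until target_bytes is reached.

    Returns the number of characters written (equal to bytes, since all output is ASCII).
    """
    rng = random.Random(seed)
    size = 0
    with open(path, "w", newline="") as fp:
        # lineterminator="\n" matches the original script; csv's default is "\r\n"
        writer = csv.writer(fp, lineterminator="\n")
        while size < target_bytes:
            row = [
                uuid.UUID(int=rng.getrandbits(128), version=4),
                f"{rng.random() * 50:.6f}",
                f"{rng.random() * 50:.6f}",
                rng.randrange(1000),
            ]
            # writerow returns the return value of fp.write: characters written
            size += writer.writerow(row)
    return size
```

For example, write_random_csv("data.csv", 1024 * 1024 * 10) produces a file of roughly 10 MB, and the same seed always yields the same file.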