We played around with the most popular JavaScript CSV parsers to compare their ease of use, speed of development, and performance, to help you make a better-informed decision.
We find performance more important than ease of use: it directly affects the users of your application, and there’s nothing you can do about it. Not everyone needs quoted CSV support, as non-quoted files are simpler to work with and are used in data exchange more often. That’s why we decided to run separate tests for these scenarios, varying the row and column counts.
The benchmarks start with relatively small files with ten columns and 10k rows, which are a little over 1 MB in size, and move on to a sizeable 140 MB file with 100 columns and 100k rows.
We’re increasing rows and columns separately to see what affects performance the most.
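As a rough illustration of how such test files could be produced, here is a minimal sketch of a generator. The function name `generateCsv` and the cell contents are hypothetical; the actual benchmark generator is not shown in this article and may differ.

```javascript
// Hypothetical generator: builds a CSV string with `rows` x `cols` cells.
// Cell values are made up for illustration only.
function generateCsv(rows, cols, quoted = false) {
  const lines = [];
  for (let r = 0; r < rows; r++) {
    const cells = [];
    for (let c = 0; c < cols; c++) {
      const value = `r${r}c${c}`;
      // Quoted cells embed a comma, so they only parse correctly
      // with a parser that understands quotes.
      cells.push(quoted ? `"${value}, quoted"` : value);
    }
    lines.push(cells.join(","));
  }
  return lines.join("\n");
}

// Rows and columns can be scaled independently, as in the benchmarks.
const moreRows = generateCsv(1000, 10);
const moreColumns = generateCsv(100, 100);
console.log(moreRows.length, moreColumns.length);
```

Passing `true` as the third argument produces the quoted variant of the same table.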
The correctness of parsing is verified with a simple
The source code of our tests is available on GitHub.
Non-Quoted CSV Parser Benchmarks
We’ve decided to add a baseline String.split test here.
It just loads the whole file in memory, splits it into lines, and then breaks every line into separate column values.
It’s one of the fastest ways to parse non-quoted CSV files. Only scanning the data once while manually splitting everything can beat that.
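The baseline described above can be sketched in a few lines. This is a minimal version that assumes the file content has already been read into a string; the function name `parseWithSplit` is ours, not taken from the benchmark source.

```javascript
// Baseline for non-quoted CSV: split the text into lines,
// then split every line on the delimiter.
function parseWithSplit(text, delimiter = ",") {
  return text
    .split(/\r?\n/) // handles both LF and CRLF line endings
    .filter((line) => line.length > 0) // skip the empty line after a trailing newline
    .map((line) => line.split(delimiter));
}

const table = parseWithSplit("id,name\n1,Alice\n2,Bob\n");
console.log(table);

// Quoted data breaks this approach: the comma inside the quotes
// is treated as a delimiter, so one cell becomes two.
const broken = parseWithSplit('"Smith, John",42');
console.log(broken[0].length); // 3 cells instead of 2
```

The second call shows why this approach is only suitable when you are sure the data contains no quoted fields.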
PapaParse was running in fast mode for these tests. You should use it if you’re sure your file doesn’t have quoted data. The parser can switch to this mode automatically depending on the data, but we enabled it explicitly to save it some effort.
Dekkai crashed on the 100k-row tests, bringing the whole Node process down with some WASM errors. Although the idea is interesting, it’s still in development, and we don’t recommend it for production use.
PapaParse was the fastest, even beating String.split. Dekkai comes second but fails to pass all tests.
CSV-parser and csv-parse share third place. And frankly, fast-csv is the slowest.
Quoted CSV Parser Benchmarks
All parsers were slower on the quoted tests. For most of them the slowdown is a mere 20%, but PapaParse was 2x slower with fast mode disabled. Still, it remains the fastest of all we’ve tested.
Dekkai couldn’t handle the 100k-row tests again, so we had to disable them.
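To give a feel for why quoted data costs more, here is a minimal sketch of a quote-aware parser. It assumes RFC 4180-style quoting, where a doubled quote inside a quoted field stands for a literal quote. It is an illustration only and not how any of the benchmarked libraries are implemented; the name `parseQuotedCsv` is hypothetical.

```javascript
// Character-by-character parser that tracks whether it is inside quotes.
// Delimiters and line breaks inside a quoted field are kept as data.
function parseQuotedCsv(text, delimiter = ",") {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // doubled quote means a literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }

  // Flush the last record when the text has no trailing line break.
  if (field.length > 0 || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

console.log(parseQuotedCsv('name,note\n"Smith, John","said ""hi"""\n'));
```

Unlike a plain split, every character has to pass through the quote-state check, which is one plausible reason quoted parsing is slower.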
- Extremely easy to use, just
- Can parse strings, local files, and even download remote files
- Fast mode for non-quoted data
- Streaming support for parsing large files, including NodeJS Readable stream
- Async parsing in a worker thread
- Automatic type conversion
- Has sync, stream, and callback APIs
- Has callbacks to transform column headers and values
- Crazy fast
- Insanely popular
- Fully RFC 4180 compliant and correctly handles line-breaks and quotations
- Can also format CSV from arrays
- Bundle size is only 6.8k gzipped with no dependencies
CSV-Parse is part of the CSV module. It’s even more popular than PapaParse, with 1.4M weekly downloads. The package was first released in 2010 and is robust enough to be used against big datasets by a large community. It’s slower than PapaParse, though.
- Implements Node.js stream.Transform API
- Has a simple callback API: parse(input, options, callback)
- Streaming support for large datasets
- Extensive test coverage
- Has sync, stream, callback, and async iterators APIs
- Only 6.3k gzipped with no dependencies
- Has companion CSV modules to transform and write CSV streams
CSV-Parser is a quite fast RFC 4180 compliant parser, immensely popular with 400k weekly downloads. It’s pretty minimalistic, with only a streaming API and a crazy small bundle size. Although it aims for maximum speed, its performance is on par with csv-parse and slower than PapaParse.
- Streaming support
- Has callback functions to modify column headers and values
- Async API with neat-csv wrapper
- The lightest parser with only 1.5k gzipped
Fast-CSV combines packages to format and parse CSV files. The name suggests it must be quite fast, but it’s the slowest among all parsers in this comparison. Still, it’s pretty popular with 640k weekly downloads.
- Streaming support
- Has callbacks to modify the headers and transform rows
- Has validation support
Dekkai is a crazy-fast multithreaded CSV parser built on WebAssembly. It’s still in pre-release, and we were running into some issues with parsing large files, probably due to the memory consumption when loading everything in RAM. WASM is quite heavy, and you pay for it with a 24k gzipped bundle size. Unlike all the other parsers, which return an array of arrays or an array of objects, it produces its own custom objects for tables and rows. That’s inconvenient, and you should probably convert them to make your logic independent of the parser code.
- Automatic type conversion in binary mode
- Iterative mode that only iterates over data and invokes a callback for each row
- Should be quite fast for parallel parsing and big files
Nowadays, most web applications communicate in JSON. It’s crazy fast, thanks to built-in browser support and heavy optimizations.
While it might work well even for tabular data, CSV is still widespread and easier to load into spreadsheets for manual analysis. Raw tabular JSON is also considerably larger, and while gzip compression can eliminate that factor, compressing and decompressing heavy data streams is still slower.
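The size difference is easy to see on a toy example. The sketch below serializes the same made-up table both ways and compares the lengths; the field names and values are hypothetical, and real ratios depend on your data.

```javascript
// A small made-up table as an array of objects.
const rows = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  name: `user${i}`,
  score: i % 100,
}));

// JSON repeats every key name in every row.
const json = JSON.stringify(rows);

// CSV states the column names once, in the header line.
const header = Object.keys(rows[0]);
const csv = [
  header.join(","),
  ...rows.map((row) => header.map((key) => row[key]).join(",")),
].join("\n");

console.log(`JSON: ${json.length} chars, CSV: ${csv.length} chars`);
```

The JSON output is larger mainly because the keys are repeated for each row, which is exactly the redundancy that gzip removes at the cost of extra CPU time.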
With various open-source libraries available on npm, parsing CSV files is not a big deal. We suggest PapaParse for most applications. It’s easy to use, powerful, and incredibly fast at the same time. You won’t run into trouble choosing it as your CSV parser.
Finally, for specific use cases, you might still roll your own CSV parser. For example, in one of the BI analytics applications we’ve developed, we started with PapaParse and later switched to a custom one-pass, crazy-fast implementation tuned for our use cases, because parsing huge CSVs was a bottleneck.
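For a rough idea of what a one-pass parser can look like, here is a minimal sketch for non-quoted data. It is not the implementation from our application, just an illustration of the technique: the text is scanned once, and fields are sliced out as delimiters and line breaks are encountered. The name `parseOnePass` is hypothetical, and the delimiter is assumed to be a single character.

```javascript
// One-pass parser for non-quoted CSV: a single loop over the text,
// no intermediate array of lines.
function parseOnePass(text, delimiter = ",") {
  const delim = delimiter.charCodeAt(0);
  const rows = [];
  let row = [];
  let start = 0; // index where the current field begins

  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code === delim) {
      row.push(text.slice(start, i));
      start = i + 1;
    } else if (code === 10) { // "\n"
      // Drop a preceding "\r" so that CRLF files work too.
      const end = i > start && text.charCodeAt(i - 1) === 13 ? i - 1 : i;
      row.push(text.slice(start, end));
      rows.push(row);
      row = [];
      start = i + 1;
    }
  }

  // Flush the last record when the text has no trailing line break.
  if (start < text.length || row.length > 0) {
    row.push(text.slice(start));
    rows.push(row);
  }
  return rows;
}

console.log(parseOnePass("id,name\r\n1,Alice\r\n2,Bob"));
```

Whether this beats a library on your data is something to measure rather than assume; tuning such a parser to a known file layout is where most of the gains would come from.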