CSV vs JSON: Choosing the Right Data Format
CSV is compact and tabular; JSON is structured and nested. Learn the trade-offs in size, types, tooling, and streaming so you can pick the right format for exports, APIs, and data pipelines.
CSV and JSON are the two formats you reach for most often when moving structured data between systems. They solve overlapping problems but make opposite trade-offs. CSV optimizes for compact, tabular, human-editable data. JSON optimizes for nested structure, explicit types, and direct mapping to programming-language objects. Choosing well saves you from awkward conversions and data-loss bugs later.
CSV: Rows, Columns, and Nothing Else
Comma-Separated Values is a flat, two-dimensional format: a header row of column names followed by data rows. It is informally standardized by RFC 4180, which defines how to quote fields containing commas, how to escape double quotes (by doubling them), and how to handle line breaks inside quoted fields. Because it is plain text with minimal syntax, CSV opens in any spreadsheet and is trivial to generate.
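To see those quoting rules in action, here is a minimal sketch using Python's standard-library csv module. This is an illustrative choice, and any RFC 4180-aware writer should behave similarly. The helper names write_csv and read_csv are hypothetical. The writer quotes only the fields that need it and doubles embedded quotes. The reader then reassembles the quoted field that spans two physical lines.

```python
import csv
import io

def write_csv(rows):
    """Serialize rows with the default 'excel' dialect (hypothetical helper)."""
    buf = io.StringIO(newline="")
    # QUOTE_MINIMAL (the default) quotes only fields containing the delimiter,
    # the quote character, or a line break; the default line terminator is
    # "\r\n", the CRLF ending RFC 4180 specifies.
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf.getvalue()

def read_csv(text):
    """Parse CSV text back into a list of rows (hypothetical helper)."""
    # newline="" lets the csv reader see embedded line breaks inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))

rows = [
    ["name", "note"],
    ["Ada", 'said "hello", then left'],  # comma and quotes: field gets quoted, quotes doubled
    ["Grace", "line one\nline two"],     # line break: field gets quoted
]
text = write_csv(rows)
print(repr(text))
print(read_csv(text) == rows)
```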
The cost of that simplicity is that CSV has no concept of data types: every value is just text. The cell 007 might be a string identifier or a number, and a reader that treats it as a number silently drops the leading zeros. There is no native way to distinguish null from an empty string, and no way to represent nested objects or arrays. Spreadsheet software frequently mangles CSV by auto-converting values that look like dates or numbers, a notorious source of corrupted gene names and broken phone numbers.
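A short Python sketch makes the missing types concrete. It uses the standard-library csv and json modules, and the sample data is made up for illustration. The CSV reader hands back only strings, and an empty cell comes back as "" with no way to mark it as null. JSON, by contrast, keeps a string ID and an explicit null.

```python
import csv
import io
import json

csv_text = "id,name,nickname\r\n007,Bond,\r\n"
row = next(csv.DictReader(io.StringIO(csv_text, newline="")))
# Every CSV value comes back as a string; the empty cell is "" rather than a null.
print(row)  # {'id': '007', 'name': 'Bond', 'nickname': ''}

# Treating the ID as a number is the reader's choice, and it loses the leading zeros.
print(int(row["id"]))  # 7

# JSON states the types explicitly: a string ID and a real null (None in Python).
record = json.loads('{"id": "007", "name": "Bond", "nickname": null}')
print(record)  # {'id': '007', 'name': 'Bond', 'nickname': None}
```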
JSON: Structure and Types Built In
JavaScript Object Notation represents data as nested objects and arrays built from four scalar types: string, number, boolean, and null. This maps cleanly to the data structures of nearly every programming language, which is why JSON dominates web APIs. A single JSON document can express a customer with an address object, a list of orders, and each order containing a list of line items. CSV can only approximate that structure with conventions such as dotted column names (address.city).
```json
{
  "id": 42,
  "name": "Ada Lovelace",
  "active": true,
  "roles": ["admin", "editor"],
  "address": { "city": "London", "zip": "W1" }
}
```

The price of that expressiveness is verbosity and size. When records are stored as an array of objects, JSON repeats every key in every record, so a 10,000-row dataset repeats each column name 10,000 times. The same data as CSV stores each column name exactly once, in the header. For large, purely tabular datasets, CSV is often several times smaller before compression. The exact ratio depends on how long the keys are relative to the values.
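The size difference is easy to measure. This sketch serializes the same flat records both ways with Python's standard library. The record shape and the helper names make_records, csv_size, and json_size are hypothetical, chosen only for illustration. The ratio you observe will shift with key and value lengths.

```python
import csv
import io
import json

def make_records(n):
    # Hypothetical flat records, used only to compare serialized sizes.
    return [{"customer_id": i, "country_code": "GB", "is_active": True} for i in range(n)]

def csv_size(records):
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()        # column names are written once
    writer.writerows(records)
    return len(buf.getvalue().encode("utf-8"))

def json_size(records):
    # An array of objects repeats every key in every record.
    return len(json.dumps(records).encode("utf-8"))

records = make_records(10_000)
print(csv_size(records), json_size(records))
```

With short values and comparatively long key names like these, the JSON output comes out several times larger. Records dominated by long text fields narrow the gap, because the repeated keys become a smaller share of each record.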
Size, Speed, and Streaming
- Size: CSV wins for flat tabular data; JSON wins when the data is genuinely nested and CSV would need denormalization.
- Parse speed: CSV's simpler grammar is usually cheaper to parse per byte, but mainstream JSON parsers are heavily optimized and the difference is small for typical payloads.
- Streaming: CSV streams naturally record by record, which suits multi-gigabyte exports (though a quoted field may span several physical lines). A standard JSON document is a single value, so most parsers load it whole, and streaming it requires an incremental parser. JSON Lines (one JSON value per line) avoids this by combining JSON's typing with CSV's line-oriented streaming.
- Tooling: business users live in spreadsheets, so CSV is the safer hand-off; developers and APIs prefer JSON.
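JSON Lines is simple enough to sketch with nothing but Python's json module. The helpers write_jsonl and read_jsonl are hypothetical names. The reader parses one line at a time, so it never needs the whole file in memory.

```python
import io
import json

def write_jsonl(records, out):
    # One compact JSON object per line, with no enclosing array.
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    for rec in records:
        out.write(json.dumps(rec, separators=(",", ":")) + "\n")

def read_jsonl(lines):
    # Lazily parse one line at a time from any iterable of text lines.
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, such as a stray trailing newline
            yield json.loads(line)

buf = io.StringIO()
write_jsonl([{"id": 1, "tags": ["a"]}, {"id": 2, "tags": []}], buf)
buf.seek(0)
for rec in read_jsonl(buf):
    print(rec["id"], rec["tags"])
```

For a real file, the same generator can consume an open file handle, for example `with open("events.jsonl", encoding="utf-8") as f:` followed by `for rec in read_jsonl(f):` (the file name is hypothetical).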
When to Use Which
Use CSV for spreadsheet exports, bulk imports into databases, analytics dumps, and any flat dataset a non-developer will open. Use JSON for API requests and responses, configuration, and any data with nested or optional structure. When you control both ends and need both streaming and typing, JSON Lines is the pragmatic middle ground used by log pipelines and machine-learning datasets.