Excel and CSV files

Update: A second part of this post is available here.

CSV, or Comma-Separated Values, it is an old file format still used because of it's simplicity. Basically, you simulate colums by using commas and rows by using newlines. The full specification can be read at the Wikipedia.

One of the reasons of being so widely used is Microsoft Excel's capabilities of quickly reading and transforming them into full spreadsheets (very useful for non-technical customers!).

If a CSV field has already commas or newlines, you can put it's contents into double quotes (") to make a CSV parser not split the content into different columns or even rows.

Here it is a small sample code in C# that "safe writes" a CSV field:

public string WriteCSVItem(object Item, bool Quote)
{
int index
;
string
results = string.Empty
;

if
(Item == null
)
return string.Empty
;
else
results = Item.ToString()
;

index = results.IndexOf("\"")
;
while (index > -1)
{
results
= results.Substring(0, index) + "\"" + results.Substring(index)
;
if
(index+2 <= results.Length)
index
= results.IndexOf("\"", index+2)
;
else
index = -1;
// End reached
}

if ((results.IndexOf("\n") > -1
) || (results.IndexOf(",") > -1) || (Quote))
results
= "\"" + results + "\""
;

return
results
;
}

So, if this is so simple, what's the problem?

Well, other more general implementations of the CSV (and almost all standard delimited text file parsers) allow additional separators apart from the comma (for example, tabs or semicolons). And MS Excel tries to be "so smart" that if you open a CSV file and you have any separator except the comma, it will reformat the CSV, converting it to rows and columns using the comma as separator.

And the problem comes that Excel will reformat separating via commas even if the column data comes inside double quotes. And that messes up any non-comma separated CSV data with a "description" field that commonly has at least one comma inside...

So, if you're planning on giving your application CSV export capabilities, you have two options:

1) Always use comma as the separator, and respect the CSV standard. I recommend placing all fields inside double quotes. This way Excel will open and format the rows and columns correctly.

2) Use any other separator, and do not forget to tell your users not to open directly the CSV files, but instead a new, blank spreadsheet and then Import the CSV:

The Import Data option launches a wizard that allows to specify the delimiter character and respects the double quotes correctly.

Note: I have used MS Excel 2003 because it is the more widespread version currently in customers. I haven't tried direct opening with Excel 2007 of a non-comma separated CSV.

Comments?

Posted by Kartones on 2008-10-18