Replacing Or Removing A New Line With Something Else But Only Between Single Or Double Quotes Using PHP On A CSV File
I have a CSV file that holds about 200,000 - 300,000 records. Most of the records can be separated and inserted into a MySQL database with a simple
$line = explode("\n", $fileData);
and then the values in each line separated with
$lineValues = explode(',', $line);
and then inserted into the database using the proper data type, e.g. int, float, string, text, etc.
However, some records have a text column whose string contains a \n, which breaks the $line = explode("\n", $fileData); approach. Each line of data that needs to be inserted into the database has approximately 216 columns. Not every line contains a \n, but whenever a \n does appear in a line, it is enclosed between a pair of single quotes (').
Each line has the following format:
id,data,data,data,text,more data
Example:
1,0,0,0,'Hello World',0
2,0,0,0,'Hello
World',0
3,0,0,0,'Hi',0
4,0,0,0,,0
As you can see from the example, most records can easily be split with the methods shown above. It's the second record in the example that causes the problem.
Newlines are only \n; the file contains no \r characters at all.
Answer
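If the CSV data is in a file, you can just use fgetcsv(), as others have pointed out; it handles embedded newlines correctly. Because the text fields here are enclosed in single quotes, pass "'" as the enclosure argument, since the default enclosure is the double quote.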
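A minimal sketch of that route, assuming the file uses single quotes as the enclosure and a backslash as the escape character (the sample data only confirms the quotes). readCsvFile() is a hypothetical helper name; it is written as a generator so that 200,000 to 300,000 rows of roughly 216 columns are yielded one at a time rather than collected into one huge array.

```php
<?php
// Hypothetical helper: yield each record of a CSV file whose text fields are
// enclosed in single quotes. fgetcsv() keeps reading past a "\n" that sits
// inside an enclosure, so a multi-line text field comes back as one field.
function readCsvFile(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Length 0 = no line-length limit; separator ',', enclosure "'",
        // escape '\\' passed explicitly (relying on its default is deprecated in PHP 8.4).
        while (($row = fgetcsv($handle, 0, ',', "'", '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() returns [null] for a blank line
            }
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}
```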
However, if your CSV data is in a string (like $fileData in your example), the following method may be useful, as str_getcsv() only parses one row at a time and cannot split a whole file into records.
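One option worth trying before hand-rolling a parser (a different technique from the one in this answer) is to write the string to a php://temp stream and read it back with fgetcsv(), which then takes care of the embedded newlines. The sketch assumes single-quote enclosures and backslash escapes; parseCsvString() is a hypothetical name.

```php
<?php
// Hypothetical helper: parse CSV text held in a string (e.g. $fileData) by
// copying it into a php://temp stream and reading it back with fgetcsv().
function parseCsvString(string $fileData): array
{
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $fileData);
    rewind($stream); // start reading from the beginning of the stream
    $rows = [];
    while (($row = fgetcsv($stream, 0, ',', "'", '\\')) !== false) {
        if ($row !== [null]) { // skip blank lines
            $rows[] = $row;
        }
    }
    fclose($stream);
    return $rows;
}
```

php://temp holds up to 2 MB in memory by default and switches to a temporary file beyond that.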
You can detect the embedded newlines by counting the quotes in each line. If there is an odd number of quotes, you have an incomplete record, so concatenate this line with the following line. Once the accumulated text has an even number of quotes, you have a complete record.
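The quote-counting step on its own could look like this sketch. reassembleRecords() is a hypothetical name, and unlike the full example it assumes quotes are never escaped inside a field (no \' sequences), so a raw substr_count() is enough.

```php
<?php
// Sketch of the quote-counting step only: glue physical lines back together
// until the buffered record holds an even number of single quotes.
// Assumption: quotes are never escaped inside a field.
function reassembleRecords(string $fileData): array
{
    $records = [];
    $buffer = null; // null = not currently inside a multi-line record
    foreach (explode("\n", $fileData) as $line) {
        $buffer = ($buffer === null) ? $line : $buffer . "\n" . $line;
        if (substr_count($buffer, "'") % 2 === 0) {
            if ($buffer !== '') { // skip blank lines, e.g. after a trailing "\n"
                $records[] = $buffer;
            }
            $buffer = null;
        }
    }
    if ($buffer !== null) {
        $records[] = $buffer; // unterminated quote at the end of the data
    }
    return $records;
}
```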
Once you have a complete record, split it at the quotes (again using explode()). Counting chunks from zero, the odd-indexed chunks were inside quotes (so any commas in them are not separators) and the even-indexed chunks were not.
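A sketch of the splitting step in isolation, under the same assumption that fields contain no escaped quotes. Instead of the @-marker encoding used in the full example, this hypothetical splitRecord() walks the quote-delimited chunks directly: an odd-indexed (quoted) chunk is appended to the current field, while each comma in an even-indexed chunk starts a new field.

```php
<?php
// Sketch of the splitting step for one complete record. After explode("'"),
// odd-indexed chunks were inside quotes (commas are literal) and
// even-indexed chunks were outside (every comma separates fields).
// Assumption: no escaped quotes inside fields.
function splitRecord(string $record): array
{
    $fields = [''];
    foreach (explode("'", $record) as $i => $chunk) {
        $last = count($fields) - 1;
        if ($i % 2 === 1) {
            $fields[$last] .= $chunk; // quoted text: keep commas as-is
            continue;
        }
        $parts = explode(',', $chunk);
        $fields[$last] .= array_shift($parts); // text before the first comma
        foreach ($parts as $part) {
            $fields[] = $part; // each comma starts a new field
        }
    }
    return $fields;
}
```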
Example:
# Split file into physical lines (records may span lines)
$lines = explode("\n", $fileData);
# Re-assemble records
$records = array ();
$record = '';
$lineSep = '';
foreach ($lines as $line) {
    # Escape @ symbol so we can use it as a marker (as it does not conflict with
    # any special CSV character.)
    $line = str_replace('@', '@a', $line);
    # Escape commas as we don't yet know which ones are separators
    $line = str_replace(',', '@c', $line);
    # Hide backslash-escaped quotes (\') and any remaining backslashes behind markers
    $line = str_replace("\\'", '@q', $line);
    $line = str_replace('\\', '@b', $line);
    $record .= $lineSep . $line;
    $lineSep = "\n";
    # Must have an even number of quotes in a complete record!
    if (substr_count($record, "'") % 2 == 0) {
        # Skip blank lines (e.g. the empty string left by a trailing "\n")
        if ($record !== '') {
            $records[] = $record;
        }
        $record = '';
        $lineSep = '';
    }
}
if (strlen($record) > 0) {
    $records[] = $record;
}
$rows = array ();
foreach ($records as $record) {
    $chunks_in = explode("'", $record);
    $chunks_out = array ();
    # Decode escaped quotes/backslashes.
    # Decode field-separating commas (unless quoted)
    foreach ($chunks_in as $i => $chunk) {
        # Unescape quotes & backslashes
        $chunk = str_replace('@q', "'", $chunk);
        $chunk = str_replace('@b', '\\', $chunk);
        if ($i % 2 == 0) {
            # Even-indexed chunks lie outside quotes: their commas are real separators
            $chunk = str_replace('@c', ',', $chunk);
        }
        $chunks_out[] = $chunk;
    }
    # Join back together, discarding unescaped quotes
    $record = join('', $chunks_out);
    $chunks_in = explode(',', $record);
    $row = array ();
    foreach ($chunks_in as $chunk) {
        # Restore commas that were inside quotes, then literal @ signs
        $chunk = str_replace('@c', ',', $chunk);
        $chunk = str_replace('@a', '@', $chunk);
        $row[] = $chunk;
    }
    $rows[] = $row;
}