Split delimited string value into rows

SQL Problem:

An external data vendor wants to give me a data field containing a pipe-delimited string value, which I find quite difficult to deal with.

Without help from an application programming language, is there a way to transform the string value into rows?

There is one difficulty, however: the field has an unknown number of delimited elements.

The database engine in question is MySQL.

For example:

Input: Tuple(1, "a|b|c")

Output:

Tuple(1, "a")
Tuple(1, "b")
Tuple(1, "c")

Solution:

It may not be as difficult as I initially thought.

This is a general approach:

  1. Count the number of occurrences of the delimiter: length(val) - length(replace(val, '|', '')). The number of elements is one more than this count.
  2. Loop that many times, each time grabbing the next delimited value and inserting it into a second table.
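
As a quick illustration of step 1, the query below counts the elements of the example value from the question. It is only a sketch: the literal 'a|b|c' stands in for the real column, and CHAR_LENGTH() is used here in place of LENGTH() as a personal choice so that the count is expressed in characters.

```sql
-- Number of delimiters = characters removed when every '|' is stripped out.
-- Number of elements   = number of delimiters + 1.
SELECT CHAR_LENGTH('a|b|c')
       - CHAR_LENGTH(REPLACE('a|b|c', '|', ''))
       + 1 AS element_count;   -- 5 - 3 + 1 = 3
```

The formula as written assumes a one-character delimiter; for a longer delimiter the difference would have to be divided by the delimiter's length.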

Use this function by Federico Cargnelutti:

 CREATE FUNCTION SPLIT_STR(
   x VARCHAR(255),
   delim VARCHAR(12),
   pos INT
 )
 RETURNS VARCHAR(255)
 RETURN REPLACE(
   SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
             LENGTH(SUBSTRING_INDEX(x, delim, pos - 1)) + 1),
   delim, '');

Usage

 SELECT SPLIT_STR(string, delimiter, position)
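
For a concrete call, here is the function applied to the example value from the question. This assumes SPLIT_STR has been created exactly as shown above; the literals are only sample input.

```sql
-- Positions are 1-based.
SELECT SPLIT_STR('a|b|c', '|', 1) AS first_element,   -- 'a'
       SPLIT_STR('a|b|c', '|', 2) AS second_element,  -- 'b'
       SPLIT_STR('a|b|c', '|', 4) AS past_the_end;    -- '' (empty string)
```

A position beyond the last element yields an empty string rather than NULL, which a calling loop can use as its stop condition.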

You will still need a loop to solve your problem, because the function returns only one element per call.

Although your issue was probably resolved long ago, I was looking for a solution to the very same problem. I solved it with the help of a procedure referenced here, with slight adaptations to handle multi-byte characters (such as German umlauts) in the string by using CHAR_LENGTH() instead of LENGTH().
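
The procedure in this answer reads from a table named Tuple and writes to a table named NewTuple, but neither definition is given. A minimal setup for trying it out might look like the following. The table and column names come from the procedure itself; the column types and the sample row are assumptions.

```sql
-- Source table holding the delimited strings (types are assumed).
CREATE TABLE Tuple (
    id    INT,
    value TEXT
);

-- Target table that receives one row per element (types are assumed).
CREATE TABLE NewTuple (
    id    INT,
    value TEXT
);

-- Sample row taken from the example in the question.
INSERT INTO Tuple (id, value) VALUES (1, 'a|b|c');
```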

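DELIMITER $$
    -- Return the pos-th element of val (1-based), or NULL if it is empty or missing.
    -- Declared DETERMINISTIC so that creation also works with binary logging enabled.
    CREATE FUNCTION SPLIT_STRING(val TEXT, delim VARCHAR(12), pos INT) RETURNS TEXT
    DETERMINISTIC
    BEGIN
        DECLARE output TEXT;
        -- CHAR_LENGTH() counts characters, like SUBSTRING(), so multi-byte text is handled.
        SET output = REPLACE(SUBSTRING(SUBSTRING_INDEX(val, delim, pos), CHAR_LENGTH(SUBSTRING_INDEX(val, delim, pos - 1)) + 1), delim, '');
        IF output = '' THEN
            SET output = NULL;
        END IF;
        RETURN output;
    END $$

    -- Copy the i-th element of every row into NewTuple, for i = 1, 2, ...
    -- and stop once a pass inserts no rows.
    CREATE PROCEDURE TRANSFER_CELL()
    BEGIN
        DECLARE i INT DEFAULT 1;
        DECLARE inserted INT DEFAULT 0;
        REPEAT
            INSERT INTO NewTuple (id, value)
            SELECT id, SPLIT_STRING(value, '|', i)
            FROM Tuple
            WHERE SPLIT_STRING(value, '|', i) IS NOT NULL;
            -- Read ROW_COUNT() right after the INSERT, before any other statement runs.
            SET inserted = ROW_COUNT();
            SET i = i + 1;
        UNTIL inserted = 0
        END REPEAT;
    END $$
DELIMITER ;

CALL TRANSFER_CELL();

DROP FUNCTION SPLIT_STRING;
DROP PROCEDURE TRANSFER_CELL;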
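
On MySQL 8.0 or later, the same result can probably be had without a stored function or procedure by using a recursive common table expression. This is a different technique from the loop used in the answers, offered only as a sketch: it assumes the Tuple table with id and value columns, and the names parts, pos, element and rest are made up for the illustration.

```sql
WITH RECURSIVE parts (id, pos, element, rest) AS (
    -- Anchor: take the first element and keep whatever follows the first '|'.
    SELECT id,
           1,
           SUBSTRING_INDEX(value, '|', 1),
           IF(LOCATE('|', value) > 0,
              SUBSTRING(value, LOCATE('|', value) + 1),
              NULL)
    FROM Tuple
    UNION ALL
    -- Recursive step: repeat on the remainder until no delimiter is left.
    SELECT id,
           pos + 1,
           SUBSTRING_INDEX(rest, '|', 1),
           IF(LOCATE('|', rest) > 0,
              SUBSTRING(rest, LOCATE('|', rest) + 1),
              NULL)
    FROM parts
    WHERE rest IS NOT NULL
)
SELECT id, element
FROM parts
ORDER BY id, pos;
```

Unlike the procedure, this version keeps empty elements (for example the middle one in 'a||c') instead of treating them as the end of the list.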
