Numeric sorting an alphanumeric column

The other day a customer came up to me and said: I have this column that holds numeric data, usually. But when I sort it it gets all messed up, because is sorts it alphanumerically. That is 10 is listed before 2 etc.

My first suggestion was: well, sort by TO_NUMBER(column) then.

Well, he replied, that can’t be done. Sometimes the column contains alphanumeric data.

Oh, and I don’t want to use PL/SQL since I once learned that switching between the SQL engine and the PL/SQL engine costs a lot of performance.

Well, let’s start by making a simple statement that shows the problem:

WITH t AS
  (          SELECT '1' numval FROM dual
   UNION ALL SELECT to_char(9) FROM dual -- 9
   UNION ALL SELECT '#' FROM dual
   UNION ALL SELECT to_char(10) FROM dual -- 10
   UNION ALL SELECT 'G' FROM dual
   UNION ALL SELECT 'Green' FROM dual
   UNION ALL SELECT 'Yel2low' FROM dual
   UNION ALL SELECT 'Pink' FROM dual
   UNION ALL SELECT 'B' FROM dual
   UNION ALL SELECT '2' FROM dual
   UNION ALL SELECT '2912B' FROM dual
   UNION ALL SELECT 'B2912' FROM dual
  )
SELECT t.*
  FROM t
 ORDER BY t.numval

The output of this query, is as expected, sorted, but alphanumerically:

NUMVAL
-------
#
B
B2912
G
Green
Pink
Yel2low
1
10
2
2912B
9

12 rows selected.

Then I came to think: I want the ordering done numerically, but I cannot use to_number on the column, because there can be alphanumeric values and that will break the query. But what if I remove everything but the numbers for the column, when sorting. That is where Regular Expressions come in. I am definitely not an expert in the field of Regular Expressions, but I thought I would give it a try. I replace all the letters [:alpha:], punctuality elements [:punct:] and blanks [:blank:] with nothing.

WITH t AS
  (          SELECT '1' numval FROM dual
   UNION ALL SELECT to_char(9) FROM dual -- 9
   UNION ALL SELECT '#' FROM dual
   UNION ALL SELECT to_char(10) FROM dual -- 10
   UNION ALL SELECT 'G' FROM dual
   UNION ALL SELECT 'Green' FROM dual
   UNION ALL SELECT 'Yel2low' FROM dual
   UNION ALL SELECT 'Pink' FROM dual
   UNION ALL SELECT 'B' FROM dual
   UNION ALL SELECT '2' FROM dual
   UNION ALL SELECT '2912B' FROM dual
   UNION ALL SELECT 'B2912' FROM dual
  )
SELECT t.*
  FROM t
 ORDER BY to_number(regexp_replace(numval, '([[:alpha:]]|[[:punct:]]|[[:blank:]])')) NULLS LAST

The output is almost what I want:

NUMVAL
-------
1
Yel2low
2
9
10
B2912
2912B
#
B
G
Green
Pink

12 rows selected.

While reading about the character classes I came across the [:digit:] class and I knew there was a NOT operator. And that is actually what I want. Replace everything that is not a digit with nothing.

WITH t AS
  (          SELECT '1' numval FROM dual
   UNION ALL SELECT to_char(9) FROM dual -- 9
   UNION ALL SELECT '#' FROM dual
   UNION ALL SELECT to_char(10) FROM dual -- 10
   UNION ALL SELECT 'G' FROM dual
   UNION ALL SELECT 'Green' FROM dual
   UNION ALL SELECT 'Yel2low' FROM dual
   UNION ALL SELECT 'Pink' FROM dual
   UNION ALL SELECT 'B' FROM dual
   UNION ALL SELECT '2' FROM dual
   UNION ALL SELECT '2912B' FROM dual
   UNION ALL SELECT 'B2912' FROM dual
  )
SELECT t.*
  FROM t
 ORDER BY to_number(regexp_replace(numval, '([^[:digit:]])')) NULLS LAST

Same output, less typing (which is always good)

NUMVAL
-------
1
Yel2low
2
9
10
B2912
2912B
#
B
G
Green
Pink

12 rows selected.

But the combined values are not sorted properly yet. If there are other characters than digits in the column, then they should just be sorted alphabetically. The current query just remove all the non digits before sorting. What I want to happen is when I can remove all the non digits from the column and the length is still the same, then sort numerically, otherwise give it a NULL value putting them at the end of the resultset. My second ordering clause is just the column, resulting in all the non-numeric values being sorted ‘normal’

WITH t AS
  (          SELECT '1' numval FROM dual
   UNION ALL SELECT to_char(9) FROM dual -- 9
   UNION ALL SELECT '#' FROM dual
   UNION ALL SELECT to_char(10) FROM dual -- 10
   UNION ALL SELECT 'G' FROM dual
   UNION ALL SELECT 'Green' FROM dual
   UNION ALL SELECT 'Yel LOW' FROM dual
   UNION ALL SELECT 'Pink' FROM dual
   UNION ALL SELECT 'B' FROM dual
   UNION ALL SELECT '2' FROM dual
   UNION ALL SELECT '2912B' FROM dual
   UNION ALL SELECT 'B2912' FROM dual
  )
SELECT t.*
  FROM t
 ORDER BY CASE
            WHEN (length(numval) = length(regexp_replace(numval, '([^[:digit:]])'))) THEN
             to_number(regexp_replace(numval, '([^[:digit:]])'))
            ELSE
             NULL
          END NULLS LAST
         ,numval

Resulting in this output:

NUMVAL
-------
1
2
9
10
#
B
B2912
G
Green
Pink
Yel2low
2912B

12 rows selected.

This is a lot more like I think the customer wants his column to be sorted. Maybe not exactly what he want it, but it’s a good start, I think.

One thought on “Numeric sorting an alphanumeric column


Leave a Reply

Your email address will not be published. Required fields are marked *