Ask The Real Tom

July 30, 2010



MySQL solution: Speed up pattern searches starting with a wildchar (like ‘%abc%’)

Filed under: Uncategorized — Tags: , , , , , , , , — admin @ 10:04 am

Searches on MySQL with expression like ” … like ‘%abc%’” can be incredibly slow on large tables.
The problem is, MySQL can not use indices for expression starting with a wildcard “%”.
Neither MySQL supports function based indexes.

I will explain a simple solution. But first lets have a look to our problem.
Table size: 2G
Row count: 33’000
Query response time: ~ 25s

mysql> desc images;
+-------------+-----------------------------------------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-----------------------------------------------------------------------------+------+-----+---------+----------------+
| imageid | int(11) | NO | PRI | NULL | auto_increment |
| filename | varchar(200) | YES | MUL | NULL | |
| description | varchar(200) | YES | | NULL | |
| data | longblob | YES | | NULL | |
| type | enum('coversmall','cover','coverback','screenshot','specialcover','banner') | YES | MUL | NULL | |
+-------------+-----------------------------------------------------------------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

The index can not be used


mysql> explain select imageid,filename,description,type from images where filename like '%call%' order by filename;
+----+-------------+--------+------+---------------+------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | images | ALL | NULL | NULL | NULL | NULL | 371398 | Using where; Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+--------+-----------------------------+
1 row in set (0.00 sec)

Index can be use if the expression does not start with a wild card


mysql> explain select imageid,filename,description,type from images where filename like 'call%' order by filename;
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
| 1 | SIMPLE | images | range | idx_images_filename | idx_images_filename | 603 | NULL | 66 | Using where |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select imageid,filename,description,type from images where filename like 'call%abc' order by filename;
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
| 1 | SIMPLE | images | range | idx_images_filename | idx_images_filename | 603 | NULL | 66 | Using where |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+

Since we are searching for the filename, we a new column for the filename. Add a character in front of the string and index the new column.


mysql> alter table images add vfilename varchar(201);

Query OK, 33064 rows affected (3 min 4.73 sec)
Records: 33064 Duplicates: 0 Warnings: 0

update images set vfilename=concat('a',filename);

Query OK, 33064 rows affected (52.40 sec)
Rows matched: 33064 Changed: 33064 Warnings: 0

create index idx_images_vfilename on images (vfilename);

Now we change our application so it add always the same character “a” in the pattern.
So we can use the index.


mysql> explain select imageid,filename,description,type from images where vfilename like 'a%abc%' order by filename;
+----+-------------+--------+-------+----------------------+----------------------+---------+------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+----------------------+----------------------+---------+------+-------+-----------------------------+
| 1 | SIMPLE | images | range | idx_images_vfilename | idx_images_vfilename | 603 | NULL | 75840 | Using where; Using filesort |
+----+-------------+--------+-------+----------------------+----------------------+---------+------+-------+-----------------------------+
1 row in set (0.00 sec)

The perfomance benefit is, that we can do now a index scan (full/range scan) which read much less data than a full table scan.
We reduced i/o, saved our cache and speed up the query by magnitudes!

This is a real ask the real tom solution!

2 Comments »

  1. I don’t understand, will this really help with the problem? Since every single row will have the “a” in the condition the a%abc% will still look at every single leaf of the tree to see what is in each row would it not? The reason why abc% can use the index is because it can use the index to find all of the values that begin with abc% so if I were to search for abc%dfg% I would still only find all the rows for abc% using the index, it would then do a manual check of each entry to figure out in that set which ones have dfg in the value. So while I could see this being useful in cases where the row size is large but the field size is small, I would imagine you could easily make yourself slower using something like above than faster. For example, if my table were something like:

    Table A:
    Column1 Char(1)
    Column2 Number
    ColumnDesc varchar(3000)

    In this case I would only be saving a few bytes per row in my index, but I would add a cost of traversing an entire binary tree just to find each value of ColumnDesc would I not?

    Comment by James — April 19, 2011 @ 7:53 pm

  2. In your example you are right.
    But you see my example has large images on the lob column.
    Actually you change a full table scan against a full index scan. If your table is much bigger than your index you save a lot of IO’s.

    Comment by admin — August 3, 2011 @ 8:15 am

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress