MySQL solution: Speed up pattern searches starting with a wildchar (like ‘%abc%’)
Searches on MySQL with expression like ” … like ‘%abc%’” can be incredibly slow on large tables.
The problem is, MySQL can not use indices for expression starting with a wildcard “%”.
Neither MySQL supports function based indexes.
I will explain a simple solution. But first lets have a look to our problem.
Table size: 2G
Row count: 33′000
Query response time: ~ 25s
mysql> desc images;
+-------------+-----------------------------------------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-----------------------------------------------------------------------------+------+-----+---------+----------------+
| imageid | int(11) | NO | PRI | NULL | auto_increment |
| filename | varchar(200) | YES | MUL | NULL | |
| description | varchar(200) | YES | | NULL | |
| data | longblob | YES | | NULL | |
| type | enum('coversmall','cover','coverback','screenshot','specialcover','banner') | YES | MUL | NULL | |
+-------------+-----------------------------------------------------------------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
The index can not be used
mysql> explain select imageid,filename,description,type from images where filename like '%call%' order by filename;
+----+-------------+--------+------+---------------+------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | images | ALL | NULL | NULL | NULL | NULL | 371398 | Using where; Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+--------+-----------------------------+
1 row in set (0.00 sec)
Index can be use if the expression does not start with a wild card
mysql> explain select imageid,filename,description,type from images where filename like 'call%' order by filename;
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
| 1 | SIMPLE | images | range | idx_images_filename | idx_images_filename | 603 | NULL | 66 | Using where |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain select imageid,filename,description,type from images where filename like 'call%abc' order by filename;
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
| 1 | SIMPLE | images | range | idx_images_filename | idx_images_filename | 603 | NULL | 66 | Using where |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
Since we are searching for the filename, we a new column for the filename. Add a character in front of the string and index the new column.
mysql> alter table images add vfilename varchar(201);
Query OK, 33064 rows affected (3 min 4.73 sec)
Records: 33064 Duplicates: 0 Warnings: 0
update images set vfilename=concat('a',filename);
Query OK, 33064 rows affected (52.40 sec)
Rows matched: 33064 Changed: 33064 Warnings: 0
create index idx_images_vfilename on images (vfilename);
Now we change our application so it add always the same character “a” in the pattern.
So we can use the index.
mysql> explain select imageid,filename,description,type from images where vfilename like 'a%abc%' order by filename;
+----+-------------+--------+-------+----------------------+----------------------+---------+------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+----------------------+----------------------+---------+------+-------+-----------------------------+
| 1 | SIMPLE | images | range | idx_images_vfilename | idx_images_vfilename | 603 | NULL | 75840 | Using where; Using filesort |
+----+-------------+--------+-------+----------------------+----------------------+---------+------+-------+-----------------------------+
1 row in set (0.00 sec)
The perfomance benefit is, that we can do now a index scan (full/range scan) which read much less data than a full table scan.
We reduced i/o, saved our cache and speed up the query by magnitudes!
This is a real ask the real tom solution!
I don’t understand, will this really help with the problem? Since every single row will have the “a” in the condition the a%abc% will still look at every single leaf of the tree to see what is in each row would it not? The reason why abc% can use the index is because it can use the index to find all of the values that begin with abc% so if I were to search for abc%dfg% I would still only find all the rows for abc% using the index, it would then do a manual check of each entry to figure out in that set which ones have dfg in the value. So while I could see this being useful in cases where the row size is large but the field size is small, I would imagine you could easily make yourself slower using something like above than faster. For example, if my table were something like:
Table A:
Column1 Char(1)
Column2 Number
ColumnDesc varchar(3000)
In this case I would only be saving a few bytes per row in my index, but I would add a cost of traversing an entire binary tree just to find each value of ColumnDesc would I not?
Comment by James — April 19, 2011 @ 7:53 pm
In your example you are right.
But you see my example has large images on the lob column.
Actually you change a full table scan against a full index scan. If your table is much bigger than your index you save a lot of IO’s.
Comment by admin — August 3, 2011 @ 8:15 am