Mysql Improve Search Performance with wildcards (%%) – This article will take you through the common SQL errors that you might encounter while working with mysql, sql, indexing. The wrong arrangement of keywords will certainly cause an error, but wrongly arranged commands may also be an issue. SQL keyword errors occur when one of the words that the SQL query language reserves for its commands and clauses is misspelled. If the user wants to resolve all these reported errors, without finding the original one, what started as a simple typo, becomes a much bigger problem.
SQL Problem :
Below is a query I use for searching a person by email
SELECT * FROM phppos_customers JOIN phppos_people ON phppos_customers.person_id = phppos_people.person_id WHERE deleted = 0 AND email LIKE '%f%' ORDER BY email ASC
Will adding an index on “email” speed up the query?
No, because MySQL will not be able to utilize the index when you have a leading wildcard. If you changed your LIKE to ‘f%’, then it would be able to use the index.
Mysql will not use the index because
LIKE argument (
%f%) starts with the wildcard character
If it starts with a constant, index will be used.
More info: 7.5.3. How MySQL Uses Indexes
Wildcarding the left side of a
LIKE operation ensures that an index, if one exists on the
Full Text Search (FTS) is preferred syntax for finding strings within text via SQL. MySQL has native FTS functionality, using the MATCH/AGAINST syntax (Requires the table to use the MyISAM engine for v.5.5 and below. InnoDB FTS supported on v.5.6+):
SELECT c.*, p.* FROM PHPPOS_CUSTOMERS c JOIN PHPPOS_PEOPLE p ON p.person_id = c..person_id WHERE deleted = 0 AND MATCH(email) AGAINST('f') ORDER BY email
But there are third party FTS technology, such as Sphinx.
In my post here I describe, in detail, a technique that allows you to use the index with
LIKE for fast
%infix% search, at the cost of some extra storage:
As long as the strings are relatively small, the storage requirement is generally acceptable.
According to Google, the average e-mail address is
25 characters long. This increases your required storage by a factor
12.5 on average, and gives you fast indexed search in return. (See my post for the calculations.)
From my perspective, if you are storing 10’000 e-mail addresses, you should be fine storing (the equivalent of) about 100’000 e-mail addresses, too. If this is what it takes to allow you to use an index, that seems like an acceptable trade-off. Often, disk space is cheap, while non-indexed searches are unaffordable.
If you choose to take this approach, I suggest that you limit the input length of e-mail addresses to
64 characters. Those rare (or attacker) e-mail addresses of such length will require up to
32 times the usual storage. This gives you:
- Protection against an attacker trying to flood your database, since these still aren’t very impressive amounts of data.
- The expectation that most e-mail addresses are not of this length anyway.
If you consider
64 characters too harsh a requirement, use
255 instead, for a worst-case storage increase factor of
127.5. Ridiculous? Possibly. Likely? No. Fast? Very.
You will not be able to make it faster with
LIKE just like everyone says (about the
% at the beginning), but you can improve it a little by joining after you filter your people first.
SELECT * FROM (SELECT * FROM `phppos_customers` WHERE `deleted` = 0 AND `email` LIKE '%f%') `t_customers` JOIN `phppos_people` ON `t_customers`.`person_id`=`phppos_people`.`person_id` ORDER BY `email` asc
I know how to outsmart mysql and enable the index search even when wild card searching from the left side. Just create a reversed column of your column (make it an index), reverse also the search string, and use the wildcard from right side which has the index support..
So if you have in db word “slibro” and you want search “%libro”, created reversed column will contain “orbils” and search will be “orbil%”.
PS: But don’t have the solution how to do fast full wildcard search “%x%” though :).
Finding SQL syntax errors can be complicated, but there are some tips on how to make it a bit easier. Using the aforementioned Error List helps in a great way. It allows the user to check for errors while still writing the project, and avoid later searching through thousands lines of code.