Ask The Real Tom

July 30, 2010



MySQL solution: Speed up pattern searches starting with a wildcard (like ‘%abc%’)

Filed under: Uncategorized — admin @ 10:04 am

Searches in MySQL with an expression like ” … like ‘%abc%'” can be incredibly slow on large tables.
The problem is that MySQL cannot use indexes for an expression starting with the wildcard “%”.
Nor does MySQL support function-based indexes.

I will explain a simple solution. But first, let’s have a look at our problem.
Table size: 2G
Row count: 33’000
Query response time: ~ 25s

mysql> desc images;
+-------------+-----------------------------------------------------------------------------+------+-----+---------+----------------+
| Field       | Type                                                                        | Null | Key | Default | Extra          |
+-------------+-----------------------------------------------------------------------------+------+-----+---------+----------------+
| imageid     | int(11)                                                                     | NO   | PRI | NULL    | auto_increment |
| filename    | varchar(200)                                                                | YES  | MUL | NULL    |                |
| description | varchar(200)                                                                | YES  |     | NULL    |                |
| data        | longblob                                                                    | YES  |     | NULL    |                |
| type        | enum('coversmall','cover','coverback','screenshot','specialcover','banner') | YES  | MUL | NULL    |                |
+-------------+-----------------------------------------------------------------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

The index cannot be used:


mysql> explain select imageid,filename,description,type from images where filename like '%call%' order by filename;
+----+-------------+--------+------+---------------+------+---------+------+--------+-----------------------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra                       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-----------------------------+
|  1 | SIMPLE      | images | ALL  | NULL          | NULL | NULL    | NULL | 371398 | Using where; Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+--------+-----------------------------+
1 row in set (0.00 sec)

The index can be used if the expression does not start with a wildcard:


mysql> explain select imageid,filename,description,type from images where filename like 'call%' order by filename;
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys       | key                 | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
|  1 | SIMPLE      | images | range | idx_images_filename | idx_images_filename | 603     | NULL |   66 | Using where |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select imageid,filename,description,type from images where filename like 'call%abc' order by filename;
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys       | key                 | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+
|  1 | SIMPLE      | images | range | idx_images_filename | idx_images_filename | 603     | NULL |   66 | Using where |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-------------+

Since we are searching on the filename, we add a new column for it: prepend a character to the string and index the new column.


mysql> alter table images add vfilename varchar(201);

Query OK, 33064 rows affected (3 min 4.73 sec)
Records: 33064 Duplicates: 0 Warnings: 0

mysql> update images set vfilename=concat('a',filename);

Query OK, 33064 rows affected (52.40 sec)
Rows matched: 33064 Changed: 33064 Warnings: 0

mysql> create index idx_images_vfilename on images (vfilename);

Now we change our application so it always prepends the same character “a” to the pattern.
This way the index can be used:


mysql> explain select imageid,filename,description,type from images where vfilename like 'a%abc%' order by filename;
+----+-------------+--------+-------+----------------------+----------------------+---------+------+-------+-----------------------------+
| id | select_type | table  | type  | possible_keys        | key                  | key_len | ref  | rows  | Extra                       |
+----+-------------+--------+-------+----------------------+----------------------+---------+------+-------+-----------------------------+
|  1 | SIMPLE      | images | range | idx_images_vfilename | idx_images_vfilename | 603     | NULL | 75840 | Using where; Using filesort |
+----+-------------+--------+-------+----------------------+----------------------+---------+------+-------+-----------------------------+
1 row in set (0.00 sec)

The performance benefit: because every vfilename starts with “a”, the pattern ‘a%abc%’ gives the optimizer a usable range, so we now get an index scan (full/range scan) that reads much less data than a full table scan — the index holds only the short filenames, not the 2G of blob data.
We reduced I/O, spared our cache, and sped up the query by orders of magnitude!
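
One thing to keep in mind: vfilename has to stay in sync with filename. A minimal sketch of how that could be automated with triggers (trigger names are illustrative; requires MySQL 5.0+):

-- keep vfilename in sync on insert and update
-- note: a NULL filename yields a NULL vfilename, same as the initial update above
create trigger trg_images_bi before insert on images
for each row set new.vfilename = concat('a', new.filename);

create trigger trg_images_bu before update on images
for each row set new.vfilename = concat('a', new.filename);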

This is a real Ask The Real Tom solution!

July 8, 2010



Oracle Web Application Trace – Tuning Approach

Filed under: Uncategorized — admin @ 10:07 am

Situation:

The user claims that an Oracle web application is slow. The web application uses in-database Java.

Test 1: Determine the slow part

The web application can be run in SQL*Plus with the following PL/SQL:

set serveroutput on
set timing on
-- bind variable for wpg_docload; needed to run the block standalone in SQL*Plus
variable doc_info varchar2(4000)

declare
  name           owa.vc_arr;
  val            owa.vc_arr;
  res            owa.vc_arr;
  simple_list__  owa_util.vc_arr;
  complex_list__ owa_util.vc_arr;
  c              clob;
begin
  -- fake a minimal CGI environment so the OWA toolkit can run outside the web server
  name(1) := '1';
  val(1)  := '1';
  OWA.init_cgi_env(0, name, val);
  HTP.init;
  -- if ((owa_match.match_pattern('jdb', simple_list__, complex_list__, true))) then
  jdb(1, name, val, res); -- the application's entry-point procedure
  -- end if;
  wpg_docload.get_download_file(:doc_info);
  OWA_UTIL.showpage;
end;
/

The result showed that the application uses 4 frames. This means that 1 web request is followed by 4 subsequent requests.
We also see that the OWA functions are not the problem.

Test 2: Trace the application
To trace just the relevant part, I traced by service and module name:


select username,client_identifier,process,module from v$session where module like 'httpd%';

exec DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE('SYS$USERS','httpd@svzpca2 (TNS V1-V3)', DBMS_MONITOR.ALL_ACTIONS,TRUE,TRUE,NULL);

exec DBMS_MONITOR.SERV_MOD_ACT_TRACE_DISABLE ('SYS$USERS','httpd@svzpca2 (TNS V1-V3)', DBMS_MONITOR.ALL_ACTIONS,NULL);
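
The raw trace file ends up in the directory pointed to by user_dump_dest (on this pre-11g system, the udump directory). A quick way to check it from SQL*Plus:

show parameter user_dump_dest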

cd udump
tkprof <tracefile>.trc <report>.prf sys=no sort=exeela

The trace showed one query taking about 10 to 20 seconds per execution:

call     count      cpu    elapsed       disk      query    current       rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse       15     0.00       0.00          0          0          0          0
Execute     15   155.99     152.95          1       4147       8870          0
Fetch       24     0.35       0.38          1       3632          0        134
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total       54   156.34     153.34          2       7779       8870        134

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 192 (recursive depth: 1)

Now it’s a simple SQL tuning task to solve the problem.
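
A typical next step (a sketch; needs 10g or later, and the <sql_id> placeholder comes from the raw trace) is to pull the cursor’s execution plan:

select * from table(dbms_xplan.display_cursor('<sql_id>'));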

May 4, 2008



Tune an Oracle database for performance (II)

Filed under: oracle — admin @ 11:26 pm

It’s about time to give you an update on my tuning challenge.

A few statements:

  • bad performance is better than unstable or changing performance
  • if you gather system stats into a stats table, the interval option does not work (I did it with a cron job to start and stop the gathering; a sketch follows below)
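
A minimal sketch of the cron-driven gathering (stats table and id are illustrative names):

-- first cron job, at the start of the peak window
exec DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'START', stattab => 'MYSTATS', statid => 'PEAK');

-- second cron job, at the end of the peak window
exec DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'STOP', stattab => 'MYSTATS', statid => 'PEAK');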

My old system stats looked like this:

col PVAL2 for a50
select * from sys.aux_stats$;
SNAME                          PNAME                               PVAL1 PVAL2
------------------------------ ------------------------------ ---------- -------------
SYSSTATS_INFO                  STATUS                                    COMPLETED
SYSSTATS_INFO                  DSTART                                    10-13-2006 05:41
SYSSTATS_INFO                  DSTOP                                     10-13-2006 05:41
SYSSTATS_INFO                  FLAGS                                   1
SYSSTATS_MAIN                  SREADTIM                              1.8
SYSSTATS_MAIN                  MREADTIM                              1.3
SYSSTATS_MAIN                  CPUSPEED                              590
SYSSTATS_MAIN                  MBRC                                    8
SYSSTATS_MAIN                  MAXTHR                          377000000
SYSSTATS_MAIN                  SLAVETHR                           131000

The new ones were gathered from production and adjusted a bit, like this:

exec DBMS_STATS.SET_SYSTEM_STATS('mbrc',12);
exec DBMS_STATS.SET_SYSTEM_STATS('sreadtim',1.8);
exec DBMS_STATS.SET_SYSTEM_STATS('mreadtim',2.2);
exec DBMS_STATS.SET_SYSTEM_STATS('cpuspeednw',500);
exec DBMS_STATS.SET_SYSTEM_STATS('ioseektim',12);
exec DBMS_STATS.SET_SYSTEM_STATS('iotfrspeed',19000);

The gathered mbrc was just 2. But that value was gathered over 7 hours and may not represent the peak load. I adjusted it to 12 just because I feel good with 12. (42 might be the best value, but let’s stick with 12.)

The mreadtim was lower than the sreadtim because of our multi-I/O-optimized SAN. But the workload statistics don’t get used if mreadtim < sreadtim. I read somewhere (don’t remember where) that sreadtim * 1.2 might be a reasonable value, which is where the 2.2 above comes from: 1.8 * 1.2 ≈ 2.2.

The CPU speed is set lower now because the CPU workload on the system has increased a lot.

Whether my new system stats do any good, we will see after the first integration test.

The second tuning step will be SQL profiles. It’s too dangerous to develop them directly on production, so I decided to do remote tuning.

  • First get a SQL Tuning Set (STS) on production
  • Load it into a staging table and export it
  • Import the staging table on the integration environment and unpack the STS
  • Let dbms_sqltune do some analysis on it
  • Implement the SQL profiles and test their effect
  • If they are good (one is for sure: 16s -> 0.01s), load them into a staging table
  • Import the SQL profiles on production and unpack them

Some SQL used to perform these steps:

-- create the STS staging table
BEGIN
  DBMS_SQLTUNE.CREATE_STGTAB_SQLSET( table_name => 'staging_table' );
END;
/ 

-- pack my STS into the staging table
BEGIN
  DBMS_SQLTUNE.PACK_STGTAB_SQLSET(
      sqlset_name  => 'my_sts',
      staging_table_name => 'staging_table');
END;
/ 
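
For the integration side (after moving the staging table over with exp/imp), the counterpart calls look roughly like this (a sketch; task and profile names are illustrative):

-- unpack the STS on the integration environment
begin
  DBMS_SQLTUNE.UNPACK_STGTAB_SQLSET(
      sqlset_name        => '%',
      replace            => true,
      staging_table_name => 'staging_table');
end;
/

-- let the SQL Tuning Advisor analyze the whole STS
declare
  tname varchar2(30);
begin
  tname := DBMS_SQLTUNE.CREATE_TUNING_TASK(sqlset_name => 'my_sts', task_name => 'my_sts_task');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => 'my_sts_task');
end;
/

-- read the findings and accept a recommended profile
select DBMS_SQLTUNE.REPORT_TUNING_TASK('my_sts_task') from dual;
exec DBMS_SQLTUNE.ACCEPT_SQL_PROFILE(task_name => 'my_sts_task', name => 'my_profile');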

-- create the SQL profile staging table
exec DBMS_SQLTUNE.CREATE_STGTAB_SQLPROF ( table_name => 'SQLPROF_STAGING', schema_name => 'SYSTEM' );

-- pack all my SQL profiles into the staging table
begin
DBMS_SQLTUNE.PACK_STGTAB_SQLPROF (
   profile_name          => '%',
   profile_category      => 'DEFAULT',
   staging_table_name    => 'SQLPROF_STAGING',
   staging_schema_owner  => 'SYSTEM');
end;
/ 

-- unpack all (%) my SQL profiles on production
begin
DBMS_SQLTUNE.UNPACK_STGTAB_SQLPROF (
   profile_name          => '%',
   profile_category      => 'DEFAULT',
   replace               => true,
   staging_table_name    => 'SQLPROF_STAGING',
   staging_schema_owner  => 'SYSTEM');
end;
/

Please post a comment if you think there is a better way.

April 28, 2008



Tune an Oracle database for performance

Filed under: oracle — admin @ 8:34 pm

Let me explain the situation first.

  • Only one Sun Solaris 8 server
  • 12 CPUs
  • 48 GB RAM
  • 3 databases (1 TB, 150 GB, 10 GB)
  • 3 applications (2 Java/JBoss, 1 Forte)
  • CPU load during the nightly batch processing constantly around 60
  • SAN I/O throughput of about 400 MB/s possible
  • 9i system statistics (not suitable for the current workload)

In other words: an old, overloaded system with slow CPUs. Since buying new hardware and separating the databases from the applications is a bureaucratic mission that takes a lot of time (about a year), and the window for the nightly batch processing is getting tighter, I am in the challenging situation of having to get the maximum out of this box.

So what I am going to try is to tune the system so it’s not so CPU-bound any more. There are a lot of unused resources on the SAN. But some long-running queries are very sensitive to changes and could make the application take longer -> that means the batch could fail!
One important thing to know is that many statements are hinted.

Possibility One

Change the system so it uses more I/O than CPU. But wait, more logical I/O means more CPU cycles are used. To continue this strategy it will be necessary to check how the CPU time gets spent.
I will follow up on this idea in a later blog post… maybe.
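
A quick way to see where the database spends its time (a sketch; needs 10g or later):

select stat_name, round(value / 1000000) as seconds
from   v$sys_time_model
where  stat_name in ('DB CPU', 'sql execute elapsed time', 'parse time elapsed')
order  by value desc;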

Possibility Two

Reduce general resource usage. This might not be that easy, since many specialists and consultants have already tuned the system quite well.
As I said, many statements are hinted already and they perform OK. I think with better statistics it would be possible to get more out of the unhinted statements.

So let me put together some individual thoughts:

  • Many statements perform better with the NO_CPU_COSTING hint.
  • The newly gathered system statistics (not activated) on production show a much slower CPU (due to the increased workload).
  • “_optimizer_cost_model”=choose could be changed to io (suggested by the observation with the NO_CPU_COSTING hint).
  • mreadtim < sreadtim means that the current workload statistics are not used (mreadtim may be useless when gathered on systems with a SAN, since it comes out too small).
  • OPTIMIZER_INDEX_CACHING is set to 100.
  • OPTIMIZER_INDEX_COST_ADJ is set to 10.
  • db_file_multiblock_read_count is set to 16 (but there is an MBRC value in the system statistics, which makes it unused, I think).
  • nested loop joins are less CPU-intensive than hash joins -> but more logical I/O may make this assumption invalid.
  • cost = (#SRds * sreadtim + #MRds * mreadtim + #CPUCycles / cpuspeed) / sreadtim (not sure if this is true… anyway, it only applies if workload statistics are used; a small worked example follows this list).
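
To see why these values matter, a quick back-of-the-envelope example with the formula above (purely illustrative figures, CPU part left out): take a 1200-block segment and an index access path needing 150 single-block reads. With mbrc = 12, sreadtim = 1.8 and mreadtim = 2.2, a full scan costs (1200/12 * 2.2) / 1.8 ≈ 122 against 150 for the index, so the full scan wins. With the gathered mbrc of 2, the same full scan costs (1200/2 * 2.2) / 1.8 ≈ 733, and the optimizer flips to the index.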

So I decided to update the system statistics with values from the production DB and tune them a bit. Maybe tomorrow I will give an update on the new values.
