Discussion: Database config for large table
Joel Garringer
2006-03-29 23:46:01 UTC
We have a table with a few hundred thousand records and a couple of
dozen columns. Most of the columns are type FLOAT or INT.

We have been having a really tough time with performance against this
table. Queries against the table can take several minutes to run, and I
really think that we are just missing something.

The table can easily grow to around 100 meg, and I wanted to make sure
there weren't any easy changes we could make to the ".conf" file that
would help us. The database was very peppy when this table had fewer
than 50,000 records, so the performance problems have kind of taken us
by surprise.

We run the database in embedded mode and we could give the database
hundreds of meg of memory if we needed to, but I'm not sure if/how Mckoi
can use the extra memory.

I can send a copy of the table if you want to look at it.

Thanks.

Joel
Tobias Downer
2006-04-04 01:04:17 UTC
Hi Joel,

Could you give an example of the type of queries that are performing
badly? Also, are you keeping at least one connection open to the
embedded database at all times? If your application is running Mckoi in
embedded mode and it creates only a single connection, performs a query,
and then closes the connection, you are incurring the cost of starting
and stopping the database together with the time it takes to perform the
query. In addition, each time you close the last connection to an
embedded database, the cache is wiped so you lose the benefit of
collecting cached data over a long session.

So I suggest you make sure to keep at least one connection open to the
embedded database over the entire session of your application. To
increase the size of the cache, modify the 'data_cache_size' property in
your .conf file. For example, to set the cache size to 32MB:

data_cache_size=33554432
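To make the connection-handling part concrete, here is a rough sketch of
the pattern I'm describing, using the standard embedded-mode driver class
and local URL (the .conf path, the credentials, and the table name are
just placeholders for your own setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EmbeddedSession {

    // Held open for the life of the application so the embedded engine,
    // and therefore its data cache, stays up between queries.
    private static Connection keepAlive;

    public static void main(String[] args) throws Exception {
        // Register the Mckoi JDBC driver and open the embedded database
        // described by the .conf file (path and credentials are examples).
        Class.forName("com.mckoi.JDBCDriver");
        keepAlive = DriverManager.getConnection(
            "jdbc:mckoi:local://./db.conf", "user", "password");

        // Individual units of work can use short-lived statements (or even
        // additional connections); the engine only shuts down and drops its
        // cache when the *last* open connection is closed.
        Statement stmt = keepAlive.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM SomeTable");
        if (rs.next()) {
            System.out.println("rows: " + rs.getLong(1));
        }
        rs.close();
        stmt.close();

        // ... the rest of the application runs here ...

        // Close the keep-alive connection only when the application exits.
        keepAlive.close();
    }
}

The important part is simply that the keep-alive connection stays open
for the whole session.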

Hope this information helps,
Toby.
Joel Garringer
2006-04-04 23:33:39 UTC
Based on previous messages in the mckoidb mailing list, we were making
sure to use one connection for the life of the application.

I know it is pretty sad that we hadn't tried it already, but the
increase in the data_cache_size does seem to have helped for many of the
queries. Is there any way to know the point of diminishing returns for
giving it more memory? Are there any other settings we should also try
changing?

Some queries are still performing strangely. All of the tests below were
performed using the Mckoi JDBC Query Tool.

In its original form:

SELECT MIN(ed.SampleIndex)
FROM EventDetail ed, EventHeader eh
WHERE EventType=2
AND ed.SampleIndex>200000
AND eh.LocatesPriority
AND ed.SampleIndex=eh.SampleIndex

This query took around 2 seconds to execute against a particular
database.

We modified it to:

SELECT MIN(ed.SampleIndex)
FROM
(SELECT * FROM EventDetail
WHERE EventType=2) ed
JOIN EventHeader eh
ON ed.SampleIndex=eh.SampleIndex
WHERE ed.SampleIndex>200000
AND eh.LocatesPriority

In this form the query took ~0.02 seconds when run using version
1.0.2 of Mckoi. Against version 1.0.3 this form still takes ~2 seconds.
Removing the "AND eh.LocatesPriority" under 1.0.3 drops the time down to
~0.02 seconds.
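In case it is useful, a bare-bones JDBC loop along the following lines
should reproduce these timings outside the query tool. The driver class
and local URL are the standard Mckoi embedded-mode ones, the query string
is just the rewritten form above, and the .conf path and credentials are
placeholders for your own setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QueryTimer {
    public static void main(String[] args) throws Exception {
        Class.forName("com.mckoi.JDBCDriver");
        // Keep the connection open across runs so the data cache stays warm.
        Connection conn = DriverManager.getConnection(
            "jdbc:mckoi:local://./db.conf", "user", "password");

        // The rewritten query from above, as a single string.
        String sql =
            "SELECT MIN(ed.SampleIndex) " +
            "FROM (SELECT * FROM EventDetail WHERE EventType=2) ed " +
            "JOIN EventHeader eh ON ed.SampleIndex=eh.SampleIndex " +
            "WHERE ed.SampleIndex>200000 AND eh.LocatesPriority";

        Statement stmt = conn.createStatement();
        for (int run = 1; run <= 5; ++run) {
            long start = System.currentTimeMillis();
            ResultSet rs = stmt.executeQuery(sql);
            rs.next();  // MIN() returns a single row; fetch it to finish the query
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("run " + run + ": " + elapsed + " ms");
            rs.close();
        }
        stmt.close();
        conn.close();
    }
}

Running the loop a few times makes it easier to separate cold-cache from
warm-cache behaviour.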

We have discovered that in situations where a filtering WHERE condition
(such as the "EventType=2" in the example above) will cut down the
number of records significantly, applying it before the join conditions
can significantly improve performance. Does this sound like a sensible
technique?

Thanks for your help.

Joel


Tobias Downer
2006-04-06 08:15:12 UTC
Joel,

There isn't really a formula for finding the optimal cache size for an
application because queries result in such a variety of access patterns
that it's difficult to predict what size cache is best. Mckoi does
default to rather small cache sizes to ensure the heap usage is in line
with the typical default JVM max heap size. The other property I
suggest you add or change in your configuration is
'buffered_io_max_pages'. It defaults to 256, which is only 2 MB of
memory. Try setting this property to 2048 or greater.
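Putting the two together, the relevant lines in your .conf file would
then look something like this (32MB of data cache plus 2048 buffered
pages, which works out to roughly 16MB at the 2MB-per-256-pages rate
above; treat these values as starting points to tune from):

data_cache_size=33554432
buffered_io_max_pages=2048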

Nesting queries to force the query planner to take a certain path to
resolve a query is a very sensible technique.

Toby.
Joel Garringer
2006-04-06 17:48:44 UTC
I will try upping the buffered_io_max_pages to 2048 and we will run with
your suggestion of 32 meg for the cache. That really seemed to help and
we can always increase it later.

Forcing the query to work a certain way based on what we know about the
contents of the table does make sense; I just wanted to make sure that
the apparent performance increase wasn't just in our imagination or that
restructuring queries in that way wasn't something we would regret
later.

Thanks for your help, and thanks for developing Mckoi. We have been
using it for about two years now and it has helped us on _many_
projects.

Joel


Joel Garringer
2006-04-06 17:57:10 UTC
Also, to anyone following this thread:

If you are having problems with the performance of a particular query,
you should really take the time to do a little trial-and-error
optimization testing of different forms of the query.

From what we have observed, the database engine can't always make the
correct assumptions about the contents of a database, and there have
been several instances where just rearranging the conditions in a WHERE
clause has made a query run 100 times faster.

Specifically, we have found that moving our most restrictive conditions
up to the front of the WHERE clause often helps.

Also, as I mentioned earlier in this thread, moving some conditions into
sub-selects in the FROM clause can have a big effect.

Have fun.

Bill Champ
2006-04-06 18:53:07 UTC
Post by Joel Garringer
From what we have observed, the database engine can't always make the
correct assumptions about the contents of a database, and there have
been several instances where just rearranging the conditions in a WHERE
clause has made a query run 100 times faster.
Yeah, that's true for a lot of database engines. I suspect it's true for
all of them. I'm just reluctant to say "all" because it's so final. Like
you said, how does it know which part of a WHERE clause will yield the
smallest intermediate result? It really helps when joining tables, as
the intermediate joined tables can get quite large, only to get small
again as non-qualifying records are filtered out.

A Hibernate book I read has the same advice for queries that use the
built-in Hibernate query language: place the most restrictive elements
first and the least restrictive last.

--Bill