Real World Vertica Performance at Full360 via Chef

As a boutique, BI solutions integrator, Full360 sometimes receives requests from medium sized, prospective clients to do a proof of concept of our elasticBI platform, a stack consisting of Jaspersoft BI reporting combined with the Vertica columnar database.  Typically in these cases, the prospect is comparing several BI solutions, looking for the one that will best fit their needs.  The bigger players in the space often have more resources to throw at building a POC… usually part of a presales technical team. 

Full360 doesn’t have the luxury of being able to throw several days of developer time at a POC.  Our developers need to be working on billable project and support activities (the highest priority) or building out our platform, the second highest priority.  Of course, there are situations where it seems right for us to do a POC, and this is when Opscode Chef gives us an edge. 

In a recent POC engagement, I was able to get a Vertica server up and running within 1 hour of receiving secure access to the prospect’s Amazon Web Services account.  The time broken out: the first half hour, administrative task such as generating AWS keys, creating a Chef artifacts, and other minor prep tasks. The second half hour was the actual Vertica server creation.  A large chunk of this time gets eaten up by formatting a 480GB ephemeral disk volume used to store temp files during Vertica query processing.

Once the server was up and running, I pulled the prospect’s data files and table definitions from an Amazon S3 storage bucket directly to the Vertica server, then proceeded to build out the database schema.  The original data was sourced from Oracle, so I had to convert some of the data types to better match Vertica.  Once the tables were created, I was able to use the Vertica bulk loader to pump in the data directly from the files.  Loading over a half billion rows took about 15 minutes.  The raw data files were compressed to 11GB on disk, representing 30GB of uncompressed data.  Once loaded into Vertica, the storage footprint for this data was 3GB, a 10 to 1 compression ratio.

After loading the data, I ran the Vertica database designer to deploy default projections.  Vertica projections are the column files stored on disk.  Vertica will analyze the cardinality and data type of your data in order to pick the best ordering and compression scheme that provide the super fast querying that Vertica is known for. 

The whole process of building the tables, loading the data, and deploying the projections took less than 2 hours.  Add in the hour for setup and we have a total of 3 hours invested in the POC.  At this point I was able to pass the credentials over to the prospect, and allow them to begin their performance testing.

This went a lot faster than even I expected, so I decided to spend a little bit of time running my own performance tests on the database.  I used JMeter, an open source tool that allows for parallel querying of any database with a proper JDBC driver.  It took me about an hour to get my test set up.

As a baseline, the prospect provided two test queries, both of which took approximately 2 minutes to run in Oracle.  Both of these queries took around 10 seconds in Vertica… wowing the clients, needless to say!

Having a single database running a single query super fast is nice, but in the real world you will have multiple users running concurrent queries, which is where JMeter came in.

JMeter allowed me to run the queries as many times as I want, for as many parallel users as I want.  I started did tests with 1, 2, 8, 12, 16, and 32 users.  The test ran both of the queries, one after another, for 10 to 3 repetitions, depending on how many users I was simulating.  JMeter provides a breakdown of the query times, including the minimum, maximum, and average.

Running on a big AWS server (m2.4xlarge) I saw the following query times:


Using the 2 minute Oracle query as the definition of “bad” performance, I grouped the performance for each user load into colored categories.  The green row represents what I consider to be acceptable concurrency, based upon the max query time of 50.4 seconds.  This isn’t as impressive as 10 seconds, but it is still significantly better than Oracle, which most certainly would not provide 2 minute queries with 8 concurrent users (the prospect had tested this already).  The yellow bands represent both peak activity, as well as a warning that it’s time to start adding some more server resources in the way of additional nodes to the Vertica cluster.  The red band represents unacceptable performance, as the max has exceeded the 2 minute baseline.

Update - On further review we found that it was essentially a single query that was impacting the performance, and we will be able to use Vertica’s resource pooling features to make sure that queries of this nature are restricted to a particular node in the cluster, reducing the impact on other queries.  

From a Full360 perspective, we were able to use Chef to get a server quickly built to spec, then spend our time doing effective work for the POC, resulting in reality based metrics that to guide our prospect.   Chef was actually used several times in the process as I stopped and started the server several times in order to run Vertica on a range of AWS instance sizes.  Each time I started up, I would just run the chef-client utility, then everything would be configured, verified, with Vertica up and running at the end of the process.


comments powered by Disqus