From Our Blog

You can focus on analyzing your business, because we manage the BI development, operations and infrastructure

The Agile Data Warehouse™ Story

Tue, 01/12/2010 - 09:05
Next-Generation Business Intelligence?

There are numerous articles and discussions on the web about next-generation BI. Industry analysts talk about the need for MDM and data governance standards and strategies to form the basis of the next generation of BI. We believe that this focus is mostly a response to the disorganization of systems that have been built with traditional BI and data warehouse tools.

We've been watching the BI market develop and mature over the past fifteen years, and think there is a new approach that incorporates these components at the appropriate time, but gets modern systems in place at a speed and cost that just wasn't possible two or three years ago.

The biggest issue in BI these days is that most users still don't have all the information they need - in the form they want it, at the time they need it to make decisions.

So this new approach takes advantage of disruptive technology and next-generation software applications to make that combination possible for most - if not all - corporate users.

The reigning "disruptive" - or game-changing - technology in the marketplace today is Cloud Computing. It offers a platform for applications that is so much faster to implement, cheaper to acquire and more flexible to operate that it opens all sorts of new possibilities for BI.

Say you need to create a datamart in three weeks from five different data sources, that scales automatically from 3 administrative users to 500 data consumers at four peak periods during the month? OK.

Then you need to charge this application back to ten different cost centers based on compute time, data storage and analytical usage (all within the same development timeframe)? OK.

And you need to do all of this with a budget that was cut 20% from last year. OK.

So where's the magic?

Cloud Computing

Well, you start with a Cloud computing platform like Amazon's Web Services (AWS). Amazon has very quietly become one of the leaders in Cloud Computing, based on the massive investment in hardware, software and security that they have made for their online marketplace. AWS is already over 30% larger in terms of compute power than the online marketplace, and is growing even faster.

Amazon is pursuing a low-cost provider strategy that can offer a good-sized virtual server for as little as 3 cents an hour - roughly $22/month.
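As a quick check on that figure, here is a minimal sketch of the arithmetic in shell. The function name `monthly_cost` is made up for illustration, and it assumes the server runs around the clock at the quoted 3 cents an hour with no storage or bandwidth charges added.

```shell
# Monthly cost in dollars for an hourly rate given in whole cents.
# Working in integer cents keeps the arithmetic exact in plain shell.
monthly_cost() {
  local cents_per_hour=$1
  local days=$2
  local total_cents=$(( cents_per_hour * 24 * days ))
  printf '%d.%02d\n' $(( total_cents / 100 )) $(( total_cents % 100 ))
}

monthly_cost 3 30   # prints 21.60 (30-day month)
monthly_cost 3 31   # prints 22.32 (31-day month)
```

So the monthly bill for an always-on server at this rate lands just under or just over $22, depending on the length of the month.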

And they just cut their prices by 15% in December. When is the last time a hardware vendor offered you such a reduction in price without an excruciating 6-month negotiation? Price leadership is part of their business plan, so such price reductions just happen on a regular basis.

In fact, AWS offers all of its customers a computing platform at a lower price than what the largest enterprises in the world have probably been able to negotiate with hardware vendors. Which is why many of these companies have also begun to "cloud-source" to Amazon.

This kind of pricing makes all sorts of new or re-worked applications viable. In fact, it could spawn dozens of applications in no time at all. So to manage all of these new applications, you need an application like Rightscale - which is a major partner of AWS.

Rightscale provides automation to reduce the complexity and administrative burden of managing servers and applications, as well as cloud-ready server templates and best-practices deployment architectures to manage, monitor and troubleshoot all your applications.

In short, it makes it easy to organize and manage cloud-based applications - removing a major stumbling block to the rapid deployment of powerful new applications.

So once you have dirt-cheap computing and world-class application management in place, you can focus on key applications - like the next generation of BI.

Next-Generation Software

At the heart of this next-generation BI platform is the database. There is a new kind of SQL database now available that stores and indexes data by column, rather than by row as most relational databases do today.

These "Columnar" databases are able to answer massive queries and reports in a fraction of the time that a traditional SQL database would take. The original Columnar database 10 years ago was Sybase IQ, which, due to its vendor's decline in the marketplace (as well as some technical shortcomings), received only limited acceptance.

Four years ago we were introduced by a venture capitalist to Vertica, which surpassed Sybase IQ's columnar capabilities with a number of additional innovations - on-the-fly data compression, full utilization of massively parallel processing (MPP) facilities (like those made available by AWS), and a "shared nothing" architecture that avoids bottlenecks.

Using cheap AWS server cycles, Vertica can decompress a database in memory faster than a traditional system can access it from disk - saving on storage costs while providing faster response time. And it can scale across MPP systems to make BigData applications possible.

"BigData" is a term used for the massive quantities of raw and processed data that are being generated today in applications such as retail operations. Think of all the McBurgers sold every minute of every day in a dozen different combinations. That purchasing information is valuable - but only if it can be made manageable and accessible.

On top of this new, more-powerful SQL database you then add two applications that originated in the Open-Source world: Jaspersoft and Talend.

Jaspersoft is an innovative new reporting, data analysis, and OLAP platform that, because of its Open-Source roots, is available today for just a fraction of what the traditional vendors' applications cost. With more than 8 million downloads and over 90,000 registered members, Jaspersoft describes itself as the most widely-used BI tool in the world.

Talend has partnered with Jaspersoft for some time to provide an Open-Source ETL (Extraction, Transformation and Load) facility that accesses data from a variety of sources and loads it into a database for reporting and analysis. It offers a rich selection of ETL capabilities at a fraction of the cost of traditional tools. One example of its revolutionary pricing model is that it offers over 400 different data connectors - at no extra charge.

Both of these vendors are rapidly advancing their capabilities with a healthy set of open-source developers, and provide the kind of security and manageability features that you expect from enterprise software in the "corporate" versions of their applications.

 

Integration

And finally, as a seasoned systems integrator in the data warehousing world, Full 360 adds two crucial components to this BI/DW stack:
  1. A data warehouse automation tool called metaController that orchestrates the processes of a data warehouse "factory" to produce new datasets, datamarts, OLAP cubes and reports. It replaces much of the onerous scripting required to operate a data warehouse, and will notify administrators and end-users alike of the availability of data and reports. 

  2. Broad and deep experience with massive data warehouses and enterprise performance management applications that allows us to design and create next-generation BI applications quickly, efficiently and at relatively low cost.

So it's no surprise that Rightscale, Vertica, Jaspersoft, Talend and Full 360 have banded together to offer a next-generation BI and Data Warehouse stack, which Full 360 is offering as The Agile Data Warehouse - starting at $1,500 per month (plus AWS charges). You'd find it difficult to buy hardware alone for the cost of an Agile Data Warehouse.

If you'd like to learn more about this exciting new offering, Full 360 is presenting a webinar on the Agile Data Warehouse on Thursday February 4 at 11 am EST, 8 am PST.  You can sign up here.

 

Migrating a Linux S3 Based AMI to an EBS Based AMI

Mon, 12/07/2009 - 21:05

We have been booting from EBS using pivot_root since a month after it came out, and most of our implementations are based on a couple of AMIs and many different EBS snapshots and volumes. We were nonetheless happy to see Amazon's announcement allowing AMIs to boot from EBS; it makes some of our scripts much simpler. I put this article together to show that migrating an existing Linux AMI to EBS can be a trivial task.

Once you have the AMI you want to migrate booted up, ssh into the machine. Use the following bootstrap script to prep the machine for running the AMI tools and API tools. The script after the bootstrap will set up a volume snapshot ready to be registered as an AMI.

You will require the following packages, if they are not already installed on the AMI:

openssh-client openssh-server curl unzip wget rsync parted bc sudo ruby libopenssl-ruby1.8 openjdk-6-jre-headless
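
Before going further, it can help to see which of the commands the scripts call are actually on the PATH. The following is a small sketch, not part of the original procedure: the helper name `missing_commands` is invented here, and the command names passed to it are an assumption about what the packages above install (for example, `java` for the OpenJDK package).

```shell
# Print each named command that cannot be found on the PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Commands assumed to come from the package list above; adjust as needed.
missing_commands curl wget unzip rsync ruby java
```

An empty result means everything was found; any names printed point to packages that still need to be installed with your distribution's package manager.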

 

This script bootstraps your AWS environment. You can skip it if you have already done this, but you will need to modify the next script to match your environment:

cat > aws_bootstrap.sh << \BOOTEOF
#!/bin/bash

# EC2 AMI Tools
cd /tmp
wget http://s3.amazonaws.com/ec2-downloads/ec2-ami-tools.zip -O ec2-ami-tools.zip
cd /usr/local
unzip /tmp/ec2-ami-tools.zip
ln -s `find . -type d -name 'ec2-ami-tools-*'` ec2-ami-tools
chmod -R go-rwsx ec2*
rm -rf /tmp/ec2*

# EC2 API Tools
cd /tmp
wget http://s3.amazonaws.com/ec2-downloads/ec2-api-tools.zip
cd /usr/local
unzip /tmp/ec2-api-tools.zip
ln -s `find . -type d -name 'ec2-api-tools*'` ec2-api-tools
chmod -R go-rwsx ec2*
rm -rf /tmp/ec2*

# Certificate and private key
mkdir -p ~/.aws
cat > ~/.aws/cert.pem <<\EOF
-----BEGIN CERTIFICATE-----
<snip - insert the text from your certificate>
-----END CERTIFICATE-----
EOF
cat > ~/.aws/pk.pem <<\EOF
-----BEGIN PRIVATE KEY-----
<snip - insert the text from your private key>
-----END PRIVATE KEY-----
EOF

# Environment setup script, sourced by the migration script
cat >~/.aws/aws.sh <<\EOF
#!/bin/bash
# Point at the key and certificate written to ~/.aws above
export EC2_PRIVATE_KEY=$HOME/.aws/pk.pem
export EC2_CERT=$HOME/.aws/cert.pem
export EC2_AMITOOL_HOME=/usr/local/ec2-ami-tools
export EC2_APITOOL_HOME=/usr/local/ec2-api-tools
export EC2_HOME=/usr/local/ec2-api-tools
export JAVA_HOME=/usr
export AMAZON_USER_ID=<snip!!!>
export AWS_ACCESS_KEY_ID=<snip!!!>
export AWS_SECRET_ACCESS_KEY=<snip!!!>
# Both tool sets need to be on the PATH
PATH=$EC2_AMITOOL_HOME/bin:$EC2_HOME/bin:$PATH
EOF

# Keep the credentials private to the current user
chmod 700 ~/.aws
chmod 600 ~/.aws/*
chmod u+x ~/.aws/aws.sh
BOOTEOF
chmod u+x aws_bootstrap.sh
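
Since the migration script depends entirely on the variables set in aws.sh, a quick sanity check after sourcing it can save a failed run. This is a hedged sketch rather than part of the original procedure: `check_aws_env` is a hypothetical helper, and it only verifies that the key and certificate files are readable and that the two tool directories contain a `bin` directory.

```shell
# Report problems with the EC2 environment variables.
# Returns non-zero if any problem is found.
check_aws_env() {
  local status=0
  local name value
  # The key and certificate must point at readable files.
  for name in EC2_PRIVATE_KEY EC2_CERT; do
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ ! -r "$value" ]; then
      echo "$name is unset or not a readable file: $value" >&2
      status=1
    fi
  done
  # The tool homes must each contain a bin directory.
  for name in EC2_HOME EC2_AMITOOL_HOME; do
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ ! -d "$value/bin" ]; then
      echo "$name is unset or has no bin directory: $value" >&2
      status=1
    fi
  done
  return $status
}

# Typical use, after the bootstrap has run:
#   . ~/.aws/aws.sh && check_aws_env || echo "fix the environment first"
```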

This script does the migration. I could have used rsync instead of ec2-bundle-vol, but this makes sure the resulting image looks like what Amazon is expecting.

cat > migrate2ebs.sh << \MIGEOF
#!/bin/bash

# Change these to suit your environment
vol_size=20
dev=/dev/sdp
desc="My First EBS Migration"

# Call the environment setup script
. ~/.aws/aws.sh

# Get basic info from instance meta-data
instance_id=`curl -s http://169.254.169.254/latest/meta-data/instance-id`
avail_zone=`curl -s \
http://169.254.169.254/latest/meta-data/placement/availability-zone`

# Create the Volume
vol=`ec2-create-volume -K "$EC2_PRIVATE_KEY" -C "$EC2_CERT" -z "$avail_zone" \
--size $vol_size | cut -f2`

# Attach the volume and poll until it reports "attached"
ec2attvol "$vol" -K "$EC2_PRIVATE_KEY" -C "$EC2_CERT" -i "$instance_id" -d "$dev"
while [[ "$vol_status" != "attached" ]]; do
  sleep 5
  vol_status=`ec2-describe-volumes -K "$EC2_PRIVATE_KEY" -C "$EC2_CERT" "$vol" \
  | grep ATTACHMENT | cut -f5`
  echo Status of "$vol" : $vol_status
done

# Prepare the volume
mkfs.ext3 "$dev"
mkdir -p /vol
mount "$dev" /vol
rm -rf /mnt/image*
rm -rf /mnt/img-mnt

# Use bundle to create a clean image (we will not upload)
ec2-bundle-vol -c "$EC2_CERT" -k "$EC2_PRIVATE_KEY" -u "$AMAZON_USER_ID" \
-e /vol -d /mnt

# Take the clean image and install it on the EBS volume
mkdir -p /mnt/img-mnt
mount -o loop /mnt/image /mnt/img-mnt
rsync -av /mnt/img-mnt/ /vol/

# Set the fstab up
cat > /vol/etc/fstab << FSTABEOF
# <file system>  <mount point>  <type>  <options>  <dump>  <pass>
proc             /proc          proc    defaults   0       0
/dev/sda3        None           swap    defaults   0       0
/dev/sdb         /              ext3    defaults   0       0
/dev/sda2        /mnt           ext3    defaults   0       0
FSTABEOF

# Snapshot the volume. Note the snapshot id for the registration step
umount /vol
ec2addsnap -C "$EC2_CERT" -K "$EC2_PRIVATE_KEY" -d "$desc" "$vol"
MIGEOF
chmod u+x migrate2ebs.sh
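
The attach step in the script keeps polling until the volume reports "attached" and never gives up, so a volume that fails to attach would leave it looping forever. One possible variation is sketched below, with a bounded number of attempts. The helper names `wait_for_status` and `volume_attachment_status` are invented for this example, and the retry count and delay are arbitrary.

```shell
# Run a status command repeatedly until it prints the expected value.
# Usage: wait_for_status EXPECTED MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 on success, 1 if the value never appeared.
wait_for_status() {
  local expected=$1
  local max_tries=$2
  local delay=$3
  shift 3
  local tries=0
  local status
  while [ "$tries" -lt "$max_tries" ]; do
    status=$("$@")
    if [ "$status" = "$expected" ]; then
      return 0
    fi
    tries=$(( tries + 1 ))
    sleep "$delay"
  done
  return 1
}

# Hypothetical wrapper around the same pipeline the migration script uses.
volume_attachment_status() {
  ec2-describe-volumes -K "$EC2_PRIVATE_KEY" -C "$EC2_CERT" "$1" \
    | grep ATTACHMENT | cut -f5
}

# Example: give up after 60 attempts, 5 seconds apart.
#   wait_for_status attached 60 5 volume_attachment_status "$vol" || exit 1
```

Passing the status command as arguments keeps the retry logic separate from the EC2 call, so the same helper could wait on a snapshot or instance state as well.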

Run the scripts, and note the snapshot id from the last step in the script; you'll need it for the AMI registration:

./aws_bootstrap.sh
./migrate2ebs.sh
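
If you would rather capture the snapshot id than copy it from the terminal, something like the following may work. It is a sketch under an assumption: that ec2addsnap prints a tab-separated line beginning with SNAPSHOT and carrying the id in the second field, in the same style as the volume output the script already parses with cut. The helper name `snapshot_id_from_output` is made up here.

```shell
# Read snapshot command output on stdin and print the snapshot id.
# Assumes a tab-separated line whose first field is SNAPSHOT.
snapshot_id_from_output() {
  awk -F '\t' '$1 == "SNAPSHOT" { print $2; exit }'
}

# Possible use inside migrate2ebs.sh, replacing the bare ec2addsnap call:
#   snap=$(ec2addsnap -C "$EC2_CERT" -K "$EC2_PRIVATE_KEY" -d "$desc" "$vol" \
#     | snapshot_id_from_output)
#   echo "Snapshot id: $snap"
```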


Now all that is left to do is register the snapshot created in the last step:

ec2-register -n "MyFirstEBSMigration" -s <snapshotid> -b /dev/sda=ephemeral0 --kernel <kernel> --ramdisk <ramdisk> --root-device-name /dev/sdb



If everything ran successfully, you should have a brand spanking new EBS based AMI that is a mirror of your old AMI.

This works for us, but your mileage might vary! Drop me a line via twitter (@ramarnat) if you have questions.

 

 

Copyright © Full 360 | All Rights Reserved 2008 | Legal | Privacy