
Counting Triangles on the Dirt Cheap Data Warehouse

Not long ago, I stumbled upon a 2011 article from Vertica’s Stephen Walkauskas titled Counting Triangles that compared Hadoop, Hadoop/Pig, and Vertica when applied to the problem of counting triangles in a data set. Stephen was driven to write his article by a professor who said that he used Hadoop “because a database doesn’t perform the necessary joins efficiently.” Stephen clearly showed that, of all the things relational databases don’t do well, joins aren’t one of them: the Hadoop solution took 65 minutes while Vertica finished in just 97 seconds, both on a four-node cluster of 12-core servers.

Not long afterward, Greg Rahn published an article in response titled Counting Triangles Faster in which he used an Exadata X2-2 system from Oracle, also with four 12-core database nodes, against the same data set. Exadata was almost exactly as fast as Vertica when running the same SQL. Greg then went further and optimized the SQL, after which the query ran in just 14 seconds. Remember that Hadoop required 3900 seconds.

The lesson to be learned? Use the right tool, not the newest and shiniest one. While obvious, this lesson is often ignored by us engineers right after we learn a new skill or find an exciting tool, be it Hadoop or Haskell, Cassandra or Clojure, etc.

How Does the Dirt Cheap Data Warehouse Compare?

The Exadata X2-2 half rack cost $500K when these articles were written; Oracle now offers the Exadata X3 instead, at $475K for a quarter-rack version that is likely every bit as fast. What about the $10K Dirt Cheap Data Warehouse?

To see for myself, I downloaded the original data and loaded it into a single-server DCDW development machine. Like the four-node Vertica cluster and the Oracle Exadata X2-2 half rack, the DCDW has 48 cores. In the case of the DCDW, however, there are four CPUs instead of eight, and all of them are on one motherboard. This version of the DCDW has 256GB of RAM, while the Vertica cluster had 384GB and the Exadata had, I believe, 576GB, enough to run everything in memory according to Greg.
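In case you want to follow along, the base table is trivial and the download loads easily with SQL*Loader. The file name and delimiter below are assumptions about the published data set, so adjust them to match what you actually download:

CREATE TABLE EDGES
(
SOURCE NUMBER(19, 0) NOT NULL
, DEST NUMBER(19, 0) NOT NULL
);

-- edges.ctl (file name and delimiter are assumptions)
LOAD DATA
INFILE 'edges.txt'
APPEND
INTO TABLE edges
FIELDS TERMINATED BY WHITESPACE
(source, dest)

-- run with something like: sqlldr <user> control=edges.ctl direct=true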

First, I ran the original query from the Vertica article, with parallel hints for my system:

select /*+ parallel(48) */ count(*)
from edges e1
join edges e2 on e1.dest = e2.source and e1.source < e2.source
join edges e3 on e2.dest = e3.source and e3.dest = e1.source and e2.source < e3.source;

Vertica cluster: 97 seconds
Exadata X2-2: 90 seconds
DCDW: 76 seconds

Apparently, having all 48 cores in one system works better, for this query at least, than spreading the same number of much faster Xeon cores across four nodes, even with enough RAM to hold all of the data in memory. Had the data set been larger, the DCDW, with its ultra-fast SSD disks, would have looked even better.
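As a quick sanity check that the join conditions really count each triangle exactly once, here is a toy example. It assumes, as the query’s join conditions imply, that each undirected edge appears in the table twice, once in each direction:

-- scratch table with a single triangle on vertices 1, 2, 3, each edge stored both ways
create table edges_toy as select * from edges where 1 = 0;
insert into edges_toy
select 1, 2 from dual union all select 2, 1 from dual union all
select 2, 3 from dual union all select 3, 2 from dual union all
select 1, 3 from dual union all select 3, 1 from dual;

select count(*)
from edges_toy e1
join edges_toy e2 on e1.dest = e2.source and e1.source < e2.source
join edges_toy e3 on e2.dest = e3.source and e3.dest = e1.source and e2.source < e3.source;

-- returns 1: the only match is e1 = (1,2), e2 = (2,3), e3 = (3,1)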

Next, I ran Greg’s modified SQL query, which applies the source/dest inequality filters up front so that each copy of the edges table is cut roughly in half before the joins begin:

with
e1 as (select /*+ parallel(48) */ * from edges where source < dest),
e2 as (select /*+ parallel(48) */ * from edges where source < dest),
e3 as (select /*+ parallel(48) */ * from edges where source > dest)
select /*+ parallel(48) */ count(*)
from e1
join e2 on (e1.dest = e2.source)
join e3 on (e2.dest = e3.source)
where e3.dest = e1.source;

Exadata X2-2: 14 seconds
DCDW: 41 seconds

While the modified query runs much faster on the DCDW than the original, and more than twice as fast as Vertica ran the original, it still falls far behind the Exadata. My suspicion is that the Exadata, with its larger memory, is able to hold the entire data set in memory for this query. Alternatively, I may have the always-challenging Oracle buffer and memory settings configured improperly. I remember running this same SQL on another DCDW instance and getting far better results, but I cannot find the notes.

Before we end, there is one additional optimization we can apply. The triangle-counting query might well run faster if the data were stored sorted, as it is in Vertica, and luckily Oracle offers index-organized tables that keep it that way automatically, with reasonably modest insert overhead. We can create a new table with one simple modification:

CREATE TABLE EDGES_IOT
(
SOURCE NUMBER(19, 0) NOT NULL
, DEST NUMBER(19, 0) NOT NULL
, CONSTRAINT EDGES_IOT_PK PRIMARY KEY (SOURCE, DEST) ENABLE
)
ORGANIZATION INDEX;
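Populating the new table is just a copy of the existing rows; a minimal sketch:

insert into edges_iot select source, dest from edges;
commit;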

With the new index-organized table in place, we repeat the test using Greg’s modified query:

with
e1 as (select /*+ parallel(48) */ * from edges_iot where source < dest),
e2 as (select /*+ parallel(48) */ * from edges_iot where source < dest),
e3 as (select /*+ parallel(48) */ * from edges_iot where source > dest)
select /*+ parallel(48) */ count(*)
from e1
join e2 on (e1.dest = e2.source)
join e3 on (e2.dest = e3.source)
where e3.dest = e1.source;

Exadata X2-2: 14 seconds
DCDW: 14 seconds

That’s better! We managed to equal the performance of an Exadata half rack using less than $10K worth of server and storage. On the downside, the index-organized table is large: 2GB, versus 1.2GB for a normal table and as little as 262MB with Exadata hybrid columnar compression. Had the original data set been a few terabytes, this would be significant. I later created a version of the table that used compressed keys, which dropped the total size to 1.4GB, and found that the query time was unchanged.
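For reference, the compressed-key variant needs only a COMPRESS clause on the index organization, and segment sizes are easy to compare from the data dictionary. The table name below is just one I made up for the variant; note that index-organized tables appear under their primary-key index names in the segment views:

CREATE TABLE EDGES_IOT_C
(
SOURCE NUMBER(19, 0) NOT NULL
, DEST NUMBER(19, 0) NOT NULL
, CONSTRAINT EDGES_IOT_C_PK PRIMARY KEY (SOURCE, DEST) ENABLE
)
ORGANIZATION INDEX COMPRESS 1;

-- segment sizes in MB; the IOTs show up under their primary-key index names
select segment_name, round(bytes / 1024 / 1024) mb
from user_segments
where segment_name like 'EDGES%'
order by segment_name;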
