Monday, 9 September 2013

Why the Hive index can not improve the speed of my queries

Why the Hive index can not improve the speed of my queries

I have external Hive table, which has the structure which is essentially
like:
CREATE EXTERNAL TABLE foo (time double, name string, value double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hfds://node/foodir
And I create an index for (name, value).
CREATE INDEX idx ON TABLE foo(name, value)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
ALTER INDEX ts_idx ON trionsort REBUILD;
The query I have is:
SELETE minute, count(minute) AS mincount
FROM (SELECT round(time/60) AS minute FROM foo WHERE name = 'Foo'
and value > 100000) t2 GROUP BY minute ORDER BY mincount DESC LIMIT 1;
However, although the rows that satisfy the conditions (name = 'Foo' and
value > 100000) might only contribut less than 0.1% of all rows. This Hive
query still runs against the whole dataset and the speed is comparable to
running on a table without index.
Is there anything wrong with the index scheme or the query?

No comments:

Post a Comment