System requirements:

Ubuntu 16+

Java 8


Software versions:

Nutch 2.13


Elasticsearch 5


Start Elasticsearch
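A minimal sketch of starting Elasticsearch and waiting until it answers over HTTP. The install path in ES_HOME, the URL in ES_URL, and the helper name wait_for_es are assumptions for illustration; adjust them to your setup.

```shell
# Assumed install location and HTTP endpoint; change to match your machine.
ES_HOME="${ES_HOME:-$HOME/elasticsearch}"
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: poll the given URL once per second until it responds
# or the number of tries runs out. Returns 0 on success, 1 on timeout.
wait_for_es() {
  url="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Start Elasticsearch as a daemon (-d) only if the launcher exists here.
if [ -x "$ES_HOME/bin/elasticsearch" ]; then
  "$ES_HOME/bin/elasticsearch" -d
  wait_for_es "$ES_URL" 30 && echo "Elasticsearch is up"
fi
```

If the wait times out, check the Elasticsearch logs before moving on to the HBase steps.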

Create a root directory for HBase and make the HBase config changes

Start HBase
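The two HBase steps above could look roughly like this for a standalone, single-machine setup. The paths in HBASE_HOME and HBASE_DATA are assumptions, and the script overwrites any existing hbase-site.xml, so back yours up first. hbase.rootdir and hbase.zookeeper.property.dataDir are standard HBase properties.

```shell
# Assumed locations; adjust to your installation.
HBASE_HOME="${HBASE_HOME:-$HOME/hbase}"
HBASE_DATA="${HBASE_DATA:-$HOME/hbase-data}"

# Create the root directory for HBase data and for its bundled ZooKeeper.
mkdir -p "$HBASE_DATA/hbase" "$HBASE_DATA/zookeeper" "$HBASE_HOME/conf"

# Point HBase at the local directories (this overwrites hbase-site.xml).
cat > "$HBASE_HOME/conf/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://$HBASE_DATA/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$HBASE_DATA/zookeeper</value>
  </property>
</configuration>
EOF

# Start HBase only if the start script is present at the assumed path.
if [ -x "$HBASE_HOME/bin/start-hbase.sh" ]; then
  "$HBASE_HOME/bin/start-hbase.sh"
fi
```

After starting, running jps should list an HMaster process if HBase came up.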

Make the Nutch config changes (Nutch, Gora and HBase settings)
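The notes do not list the exact changes, so the following is only a sketch of a typical Nutch 2.x setup with HBase storage and the Elasticsearch indexer. NUTCH_HOME, the agent name MyCrawler, the plugin list, and the cluster name are assumptions; the script overwrites conf/nutch-site.xml.

```shell
# Assumed Nutch source checkout location.
NUTCH_HOME="${NUTCH_HOME:-$HOME/nutch}"
mkdir -p "$NUTCH_HOME/conf"

# Nutch settings: crawler name, HBase as the storage backend via Gora,
# and the Elasticsearch indexer plugin. Values here are illustrative.
cat > "$NUTCH_HOME/conf/nutch-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>MyCrawler</value>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic|indexer-elastic</value>
  </property>
  <property>
    <name>elastic.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>elastic.cluster</name>
    <value>elasticsearch</value>
  </property>
</configuration>
EOF

# Gora setting: make HBase the default datastore (added only once).
GORA_PROPS="$NUTCH_HOME/conf/gora.properties"
GORA_LINE='gora.datastore.default=org.apache.gora.hbase.store.HBaseStore'
touch "$GORA_PROPS"
grep -qxF "$GORA_LINE" "$GORA_PROPS" || echo "$GORA_LINE" >> "$GORA_PROPS"

# HBase change for the build: the gora-hbase dependency in ivy/ivy.xml
# usually has to be uncommented by hand before building.
```

These files must be in place before the build, because the build copies conf/ into the runtime directory.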

ant clean

ant runtime

Wait for the build to report BUILD SUCCESSFUL

bin/nutch inject seed/urls.txt

bin/nutch generate -topN 10

bin/nutch fetch -all

bin/nutch parse -all

bin/nutch updatedb -all

bin/nutch index elasticsearch -all
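The crawl steps above can be chained so that a failing step stops the round. This is a sketch only: the function name crawl_round and the NUTCH variable are hypothetical, and the arguments simply mirror the commands as listed in these notes.

```shell
# Path to the nutch launcher, relative to runtime/local after the build.
NUTCH="${NUTCH:-bin/nutch}"

# Hypothetical helper: run one inject/generate/fetch/parse/update/index
# round. Each step runs only if the previous one succeeded.
crawl_round() {
  seed="${1:-seed/urls.txt}"
  top_n="${2:-10}"
  "$NUTCH" inject "$seed" &&
  "$NUTCH" generate -topN "$top_n" &&
  "$NUTCH" fetch -all &&
  "$NUTCH" parse -all &&
  "$NUTCH" updatedb -all &&
  "$NUTCH" index elasticsearch -all
}
```

A call such as crawl_round seed/urls.txt 10 runs one round; repeating it crawls one level deeper each time, since updatedb adds newly discovered links.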

You should now be able to query Elasticsearch for the crawled pages.
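The notes do not include the query itself, so here is one possible way to check the index with curl and the Elasticsearch URI search. The index name nutch, the endpoint in ES_URL, and the helper names es_search_url and es_search are assumptions; the query string is passed through without URL encoding, so keep it simple.

```shell
# Assumed endpoint and index name; change to match your configuration.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_INDEX="${ES_INDEX:-nutch}"

# Hypothetical helper: build a URI search URL.
# $1 = query string (default: match everything), $2 = number of hits.
es_search_url() {
  printf '%s/%s/_search?q=%s&size=%s&pretty' \
    "$ES_URL" "$ES_INDEX" "${1:-*:*}" "${2:-5}"
}

# Hypothetical helper: run the search and print the JSON response.
es_search() {
  curl -s "$(es_search_url "$@")"
}
```

For example, es_search 'title:apache' 3 would ask for up to three documents whose title field matches "apache", assuming the indexed documents carry a title field.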

Common errors:

Make sure you use the exact versions listed here, as Gora might not be compatible with the latest HBase version.

