What is the Nutch 1.10 crawl command for Elasticsearch?
I am using Nutch 1.10 (newbie) and trying to learn how to crawl with it using the Elasticsearch indexer. I am not sure why, but I cannot get the crawl command to work:
bin/crawl -i --elastic -d elastic.server.url=http://localhost:9200/elastic/ urls elastictestcrawl 1
Update: I used
bin/crawl -i -d elastic.server.url=http://localhost:9200/elastic/ urls/ elastictestcrawl/ 2
This almost succeeded, but I received the following error when it came to the indexing part of the command:
Error running: /home/david/apache-nutch-1.10/bin/nutch clean -delastic.server.url=http://localhost:9200/elastic/ elastictestcrawl//crawldb Failed with exit value 255.
What does exit value 255 mean in Nutch 1.x? And why was the space deleted between "-d" and "elastic..."?
I have these Elasticsearch properties in my nutch-site.xml file:
If anyone can point out the error of my ways, that would be great!
Update: I posted my own answer below, the second one. I had accepted the first answer months ago when I got it working. My answer is clearer and more concise, and should make it easier (and quicker) to get started with Nutch.
Unfortunately I can't tell you where you're going wrong, as I'm in the same boat, although I can see you are running Nutch and Elastic on the same box, whereas I've split them across two.
I've not got it to work, but according to the guide I found on integrating Nutch 1.7 with Elastic, the command should be
bin/crawl urls/ testcrawl -depth 3 -topn 5
It may be that it isn't working for me because I've added the complication of networking.
I assume you have created the index called elastictestindex in your Elastic instance and launched it on the box before trying to run the crawl?
Should it be of use, the guide I got the command from is
https://www.mind-it.info/integrating-nutch-1-7-elasticsearch/
Update:
I'm not sure I'm quite there yet, but using your update I've got further than I had.
You are putting in port 9200, which is the web administration port. You need to use port 9300 to interact with the service, so change the port to 9300.
I'm not sure about this, but I think the portion after the slash refers to the index, so in your example make sure you have "elastic" set up as an index. Or change it to
blah (my low rep score means I can't put in many URLs) blah localhost:9300/[index name]/
so that it uses an index you have created. If you haven't created one, you can do so from PuTTY with the following command.
curl -XPUT 'http://localhost:9200/[index name]/'
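If you are not sure whether the index already exists, the curl step above can be wrapped in a small check. This is only a sketch: the function names `index_url` and `ensure_index` are made up for illustration, and it assumes Elasticsearch answers HTTP on the host and port you pass in (9200 in the command above).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; host, port and index name are placeholders you supply.

# Build the REST URL of an index, e.g. http://localhost:9200/elastic
index_url() {
  local host="$1" port="$2" index="$3"
  printf 'http://%s:%s/%s\n' "$host" "$port" "$index"
}

# Create the index only when a HEAD request reports that it is missing.
ensure_index() {
  local url status
  url="$(index_url "$1" "$2" "$3")"
  # -I sends a HEAD request; -w prints only the HTTP status code.
  status="$(curl -s -o /dev/null -I -w '%{http_code}' "$url")"
  if [ "$status" = "404" ]; then
    curl -s -XPUT "$url"
  else
    echo "HEAD $url returned $status; not creating the index"
  fi
}
```

You would call it as `ensure_index localhost 9200 elastic`, swapping in your own index name.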
Using the command you supplied with the alternative port, it did run, although I've yet to extract the crawl data from Elastic.
Supplemental update:
It's dumping the data crawled by Nutch into Elastic for me, and having put a different index in on the command line, I can tell you it ignores that and uses whatever is in nutch-site.xml.
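One way to see which index the documents actually landed in is to ask Elasticsearch for a document count per index. A rough sketch, assuming the HTTP port is 9200 as in the curl command earlier; `count_url` and `count_docs` are hypothetical names, and the index names are whatever you put on the command line and in nutch-site.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; pass your own host, port and index name.

# Build the URL of the _count endpoint for one index.
count_url() {
  local host="$1" port="$2" index="$3"
  printf 'http://%s:%s/%s/_count\n' "$host" "$port" "$index"
}

# Print the JSON response, which includes a "count" field.
count_docs() {
  curl -s "$(count_url "$1" "$2" "$3")"
  echo
}
```

Running `count_docs localhost 9200 elastic` and then the same for the index named in nutch-site.xml should show which of the two is receiving the crawl data.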