regex - REGEX_EXTRACT using Pig


I am using Pig to search for a string in a set of records, and I want to output the name of the file (which is appended to the end of each record using a UDF) and the count of matching records. The file name looks as follows: 2015-03-04.23_55_05.abhi_ram.info.json.

Below is my Pig script:

    register udf;
    input_data = load 'input_dir' using classname();
    record_match = filter input_data by $0 matches '$search_string';
    group_record = group record_match all;
    record_count = foreach record_match generate REGEX_EXTRACT($0,'((\\d{4}-\\d{2}-\\d{2})\\.(\\d.*)\.(\\w.*)\\.(\\w.*)\\.(json))',1), COUNT(record_match);
    dump record_count;

I want the output to be:

2015-03-04.23_55_05.abhi_ram.info.json, count($search_string).

What am I missing in the regex?

I didn't understand why you want to apply the regex to the first field of the source. Does the first field contain the filename pattern? I ask because

$0 denotes the first field in the row.

Then, if you want the source filename in the output record, the easy way is:

    read = load 'inp.data' using PigStorage(',','-tagsource');

which prepends the source filename to each record.

2015-03-04.23_55_05.abhi_ram.info.json, count($search_string) 
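To make the filter, group-by-file and count steps behind a line like the one above concrete, here is a minimal sketch in plain Java rather than Pig. It assumes each record arrives as "<file name>,<rest of line>", which is the layout the answer describes for '-tagsource'. The class name TaggedCountDemo, the method countMatches and the sample records are hypothetical, and a plain substring test stands in for Pig's matches operator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TaggedCountDemo {
    // Counts, per source file, the records whose body contains searchString.
    // Assumption: the file name is the first comma-separated field of each record.
    static Map<String, Long> countMatches(List<String> records, String searchString) {
        Map<String, Long> counts = new TreeMap<>();
        for (String record : records) {
            int comma = record.indexOf(',');
            if (comma < 0) {
                continue; // no file-name tag, skip the record
            }
            String fileName = record.substring(0, comma);
            String body = record.substring(comma + 1);
            if (body.contains(searchString)) {          // the "filter" step
                counts.merge(fileName, 1L, Long::sum);  // the "group" and "count" steps
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample records, tagged with the file name from the question
        List<String> records = Arrays.asList(
            "2015-03-04.23_55_05.abhi_ram.info.json,first line with needle",
            "2015-03-04.23_55_05.abhi_ram.info.json,line without a hit",
            "2015-03-04.23_55_05.abhi_ram.info.json,second needle line");
        System.out.println(countMatches(records, "needle"));
    }
}
```

In Pig the same shape would be a filter, then a group on the tagged file-name field, then a count over each group, instead of group ... all.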

As per the question, using REGEX_EXTRACT:

    read = load 'test.data' using PigStorage(',');
    date = foreach read generate flatten(REGEX_EXTRACT($0,'(\\d{4})-(\\d{2})-(\\d{2}).(\\d{2})_(\\d{2})_(\\d{2}).(\\w.*)_(\\w.*).(\\w.*).json',0)) as (filename_dt:chararray);
    dump date;
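Since Pig's regular expressions are Java regular expressions, the pattern can be checked outside Pig with java.util.regex. The sketch below is not the answer's exact pattern: it is a tightened variant with the literal dots escaped, and the class name FileNameRegexDemo, the helper extract and the sample record are hypothetical. It also assumes REGEX_EXTRACT looks for the first match anywhere in the field, as Matcher.find does, and that index 0 means the whole match while 1 and up select capture groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNameRegexDemo {
    // Tightened variant of the pattern with escaped dots (an assumption, not the
    // answer's original). Backslashes are doubled here because this is a Java
    // string literal, just as they are doubled inside the Pig script.
    static final Pattern FILE_NAME = Pattern.compile(
        "(\\d{4}-\\d{2}-\\d{2})\\.(\\d{2}_\\d{2}_\\d{2})\\.(\\w+)\\.(\\w+)\\.json");

    // Returns the requested group of the first match, or null when nothing matches.
    static String extract(String record, int group) {
        Matcher m = FILE_NAME.matcher(record);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Made-up record with the file name appended at the end, as in the question
        String record = "some record text,2015-03-04.23_55_05.abhi_ram.info.json";
        System.out.println(extract(record, 0)); // whole file name
        System.out.println(extract(record, 1)); // date part
        System.out.println(extract(record, 3)); // third group; \w also matches '_'
    }
}
```

Note that the question's pattern contains one `\.` with a single backslash, while every other dot is written `\\.`; that inconsistency is worth checking first.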
