regex - REGEX_EXTRACT Using Pig
I am using Pig to search for a string in a set of records. I want to output the name of the file (which a UDF appends to the end of each record) together with the count of records matching the string. The file names look like this: 2015-03-04.23_55_05.abhi_ram.info.json.
Below is my Pig script:
    register udf;
    input_data = load 'input_dir' using classname();
    record_match = filter input_data by $0 matches '$search_string';
    group_record = group record_match all;
    record_count = foreach record_match generate REGEX_EXTRACT($0, '((\\d{4}-\\d{2}-\\d{2})\\.(\\d.*)\\.(\\w.*)\\.(\\w.*)\\.(json))', 1), COUNT(record_match);
    dump record_count;
I want the output to be:

    2015-03-04.23_55_05.abhi_ram.info.json, count($search_string)

What am I missing in the regex?
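I don't see why you want to apply the regex to the first field. Does the first field actually contain the filename pattern? Because $0 denotes the first field of each record (row). Since your UDF appends the filename to the end of each record, the filename is the last field, not $0.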
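Pig's regex functions evaluate patterns with Java's java.util.regex, so the question's pattern can be checked in plain Java. The sketch below uses a rough stand-in for REGEX_EXTRACT (first match via find(), null when nothing matches; treat that as an approximation of Pig's behavior, not its exact implementation) and applies it to a hypothetical record whose last field is the appended filename. It suggests the pattern itself is fine; it simply finds nothing in $0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldCheck {
    // The question's pattern as a Java string literal, with every literal dot escaped as \\.
    static final Pattern QUESTION_PATTERN = Pattern.compile(
            "((\\d{4}-\\d{2}-\\d{2})\\.(\\d.*)\\.(\\w.*)\\.(\\w.*)\\.(json))");

    // Rough stand-in for REGEX_EXTRACT: the given group of the first match, or null if none.
    static String extract(String field, int group) {
        Matcher m = QUESTION_PATTERN.matcher(field);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Hypothetical record: data fields first, the UDF-appended filename last.
        String[] record = {"line containing the search string", "42",
                "2015-03-04.23_55_05.abhi_ram.info.json"};
        System.out.println(extract(record[0], 1));                 // $0 holds no filename: null
        System.out.println(extract(record[record.length - 1], 1)); // last field: the whole filename
    }
}
```

Group 1 wraps the whole pattern, so on the filename field it returns the full name; group 2 would return just the date part, 2015-03-04.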
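If you just want the source filename in the output records, the easiest way is to let PigStorage tag each record with it:

    read = load 'inp.data' using PigStorage(',', '-tagsource');

This prepends the source filename to the start of each record, so the filename becomes $0 and the original fields shift to $1, $2, and so on. (In newer Pig releases the same option is also available as '-tagFile'.)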
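To make the tagging idea concrete, here is a rough in-memory Java analogue of the whole pipeline: tag each record with its filename, keep records whose text matches the search pattern as a whole-field match (as Pig's MATCHES does), then group by filename and count. The input map, the sample lines and the '.*error.*' pattern are hypothetical, and the Pig statements in the comment are only a sketch of how the pieces could fit together, assuming the text to search was originally the first field (so it is $1 after tagging).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TagAndCount {
    // Rough Java analogue of this sketched Pig pipeline:
    //   read    = load 'input_dir' using PigStorage(',', '-tagsource');
    //   matched = filter read by $1 matches '.*error.*';
    //   grouped = group matched by $0;
    //   counts  = foreach grouped generate group, COUNT(matched);
    static Map<String, Long> countMatches(Map<String, List<String>> files, String searchRegex) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            for (String text : file.getValue()) {
                // After tagging, the filename is $0 and the record text is $1.
                // String.matches requires the whole field to match, like Pig's MATCHES.
                if (text.matches(searchRegex)) {
                    counts.merge(file.getKey(), 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input: filename -> single-field lines in that file.
        Map<String, List<String>> files = new TreeMap<>();
        files.put("2015-03-04.23_55_05.abhi_ram.info.json",
                List.of("job error", "all good", "another error"));
        files.put("2015-03-05.01_10_00.abhi_ram.info.json", List.of("all good"));
        // prints {2015-03-04.23_55_05.abhi_ram.info.json=2}
        System.out.println(countMatches(files, ".*error.*"));
    }
}
```

Files with no matching records do not appear in the result, mirroring how the filter runs before the group in the Pig version.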
As for the REGEX_EXTRACT part of your question, a filename such as 2015-03-04.23_55_05.abhi_ram.info.json can be extracted like this:
    read = load 'test.data' using PigStorage(',');
    -- literal dots are escaped as \\. so they match only '.'; group index 0 returns the whole match
    date = foreach read generate REGEX_EXTRACT($0, '(\\d{4})-(\\d{2})-(\\d{2})\\.(\\d{2})_(\\d{2})_(\\d{2})\\.(\\w.*)_(\\w.*)\\.(\\w.*)\\.json', 0) as (filename_dt:chararray);
    dump date;
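The answer's pattern can be checked the same way. This Java sketch compiles it twice, once with the dots unescaped (so '.' matches any character) and once with the literal dots escaped as \\., and reuses a rough stand-in for REGEX_EXTRACT (first match via find(), null when there is none). The dot-free test string is made up purely to show the difference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilenamePattern {
    // Pattern with unescaped dots: each '.' matches any character.
    static final Pattern AS_POSTED = Pattern.compile(
            "(\\d{4})-(\\d{2})-(\\d{2}).(\\d{2})_(\\d{2})_(\\d{2}).(\\w.*)_(\\w.*).(\\w.*).json");
    // Same pattern with the literal dots escaped, so each one matches only '.'.
    static final Pattern ESCAPED = Pattern.compile(
            "(\\d{4})-(\\d{2})-(\\d{2})\\.(\\d{2})_(\\d{2})_(\\d{2})\\.(\\w.*)_(\\w.*)\\.(\\w.*)\\.json");

    // Rough stand-in for REGEX_EXTRACT: the given group of the first match, or null if none.
    static String extract(Pattern p, String field, int group) {
        Matcher m = p.matcher(field);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String name = "2015-03-04.23_55_05.abhi_ram.info.json";
        System.out.println(extract(ESCAPED, name, 0)); // group 0: the whole filename
        System.out.println(extract(ESCAPED, name, 7) + " / " + extract(ESCAPED, name, 8)); // abhi / ram
        // Made-up string with no dots at all: only the unescaped pattern accepts it.
        String noDots = "2015-03-04x23_55_05xabhi_ramxinfoxjson";
        System.out.println(extract(AS_POSTED, noDots, 0) != null); // true
        System.out.println(extract(ESCAPED, noDots, 0));           // null
    }
}
```

Index 0 returns the entire match, while indexes 1 to 9 return the individual date, time and name parts.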