python - Regex text between two strings

- January 15, 2010

I am trying to extract data fields from PDF text using regex.

The text is:

"sample experian customer\n2288150 - experian sample reports\ndata dictionary report\nfiltered by:\ncustom selection\nmarketing element:\npage 1 of 284\n2014-11-11 21:52:01 pm\nexperian , marks used herein service marks or registered trademarks of experian.\n© experian 2014 rights reserved. confidential , proprietary.\n**data dictionary**\ndate of birth acquired public , proprietary files. these sources provide, @ minimum, year of birth; month provided available. exact date of birth @ various levels of detail available \n\n\n\n\n\nnote: records coded dob exclusive of estimated age (101e)\n**element number**\n0100\ndescription\ndate of birth / exact age\n**data dictionary**\n\n\n\n\n\n\n\n\n\n\nfiller, 3 bytes\n**element number**\n0000\n**description**\nenhancement mandatory append\n**data dictionary**\n\n\nwhen there insufficient data match customer's record our enrichment master estimated age, median estimated age based on ages of other adult individuals in same zip+4 area provided. \n\n\n\n\n\n\n00 = unknown\n**element number**\n0101e\n**description**\nestimated age\n"

The field names are in bold. The text between the field names is the field values.

First I tried to extract the 'description' field using the following regex:

pattern = re.compile('\ndescription\n(.*?)\ndata dictionary\n')
re.findall(pattern, text)

The results are correct:

['date of birth / exact age', 'enhancement mandatory append']

But using the same idea to extract the 'data dictionary' field gives an empty result:

pattern = re.compile('\ndata dictionary\n(.*?)\nelement number\n')
re.findall(pattern, text)

Results:

[]

Any idea why?

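The dot (.) doesn't match newlines by default, and the 'data dictionary' values span several lines. The 'description' values each sit on a single line, which is why the first pattern worked. Try:

pattern = re.compile('\ndata dictionary\n(.*?)\nelement number\n', flags=re.DOTALL)
re.findall(pattern, text)

Notice how I passed re.DOTALL as the flags argument to re.compile.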
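To see the difference in isolation, here is a minimal sketch on a shortened, made-up stand-in for the PDF text (the sample string and the variable names are assumptions for illustration, not the original data). One value spans several lines and one fits on a single line:

```python
import re

# Hypothetical miniature of the PDF text: the first 'data dictionary'
# value contains newlines, the second one is a single line.
text = (
    "\ndata dictionary\n"
    "first line of value\n\nsecond line of value"
    "\nelement number\n0100\n"
    "description\ndate of birth / exact age"
    "\ndata dictionary\n"
    "filler, 3 bytes"
    "\nelement number\n0000\n"
)

regex = '\ndata dictionary\n(.*?)\nelement number\n'

# Default behaviour: '.' stops at a newline, so only the
# single-line value can be captured.
without_flag = re.findall(regex, text)

# With re.DOTALL, '.' also matches '\n', so multi-line values are captured too.
with_flag = re.findall(regex, text, flags=re.DOTALL)

print(without_flag)  # ['filler, 3 bytes']
print(with_flag)     # ['first line of value\n\nsecond line of value', 'filler, 3 bytes']
```

In the text from the question every 'data dictionary' value contains newlines, so without the flag nothing matches at all. Captured values may carry leading or trailing newlines; calling .strip() on each result is one way to tidy them.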

Panthy J