linux - Merge two files with no pseudo-repetitions
I have 2 text files, file1.txt and file2.txt, that both contain lines of words like this:

    fare word word-ed wo-ded wor

and

    fa-re text uncial woded wor worded

or something like this. By a word, I mean a succession of the letters a-z, possibly with accents, together with the symbol -. My question is: how can I create a third file, output.txt, from the Linux command line (using awk, sed etc.) out of these 2 files that satisfies the following 3 conditions:
- If the same word occurs in the 2 files, the third file output.txt contains it once.
- If a hyphenated version (for example fa-re in file2.txt) of a word in one file occurs in another, the hyphenated version is retained in output.txt (for example, fa-re is retained in our example).
Thus, output.txt should contain the following words: fa-re word word-ed wo-ded wor text uncial
================edit========================
I have modified the files and given the output file as well. I will try to make sure manually that there are no differently hyphenated words (such as wod-ed and wo-ded).
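Another awk:

    # npr.awk
    # Store a word if its hyphen-free form is new, or if the word itself is hyphenated.
    !($1 in a) || $1 ~ "-" {
        key = value = $1      # value keeps the original spelling
        gsub("-", "", key)    # key is the word with hyphens removed
        a[key] = value
    }
    END { for (i in a) print a[i] }

    $ awk -f npr.awk file1.txt file2.txt
    text
    word-ed
    uncial
    wor
    wo-ded
    word
    fa-re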
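awk does not define the order in which `(i in a)` visits array elements, so the words above may come out in a different order on another system. If the order of first appearance matters, a variant along the following lines might help. It is only a sketch: it assumes one word per line (as the use of `$1` suggests), the array names `best` and `order` are made up for illustration, and the `printf` lines merely recreate the sample files from the question, overwriting any file1.txt and file2.txt in the current directory.

```shell
# Recreate the sample inputs from the question (assumption: one word per line).
printf '%s\n' fare word word-ed wo-ded wor > file1.txt
printf '%s\n' fa-re text uncial woded wor worded > file2.txt

awk '
{
    key = $1
    gsub(/-/, "", key)          # key is the word with hyphens removed
    if (!(key in best)) {       # first time this key is seen
        order[++n] = key        # remember the position of first appearance
        best[key] = $1
    } else if ($1 ~ /-/) {      # a hyphenated spelling replaces the stored one
        best[key] = $1
    }
}
END {
    # Print one spelling per key, in order of first appearance.
    for (i = 1; i <= n; i++) print best[order[i]]
}' file1.txt file2.txt > output.txt
```

With the sample files, output.txt then holds fa-re, word, word-ed, wo-ded, wor, text and uncial, one per line, which is the order given in the question. As in the answer above, if two differently hyphenated spellings of the same word occur, the last one read is the one kept.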