Hi, I can see that you prefer awk, but the small program comm can also be very useful. I don't think it would barf on big files unless there are too many lines between the differences.
For example, this will show you the lines that do not exist in both files (note that comm expects both inputs to be sorted):
comm -3 oldfile.txt newfile.txt
So in the case where newfile.txt has only had records removed, it would work.
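To illustrate, here is a minimal sketch of that comm usage; the file contents below are hypothetical, made up just for the demonstration:

```shell
# comm requires both inputs to be sorted.
printf 'a\nb\nc\n' > oldfile.txt   # original records
printf 'a\nc\n'    > newfile.txt   # same file with record "b" removed

# Lines unique to oldfile.txt, i.e. the removed records:
comm -23 oldfile.txt newfile.txt

# Lines in either file but not in both (columns 1 and 2):
comm -3 oldfile.txt newfile.txt
```

The -23 form suppresses the columns for "only in file 2" and "in both", leaving just the records that were removed.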
Best regards,
Lakris
Kindly... but I didn't know that nawk or gawk would barf on big files! I thought their only limitation was field size (columns).
Is that right, guys?
I thought the issue was that awk would barf on big files, not that comm executes faster than awk; I know that shell built-ins are always faster than external commands.
I hope I've made the issue clear.
And I still don't know whether awk has limits on record (row) or field (column) sizes.
Limits of the original awk implementation:

  Number of fields per record ....... 100
  Characters per input record ....... 3000
  Characters per output record ...... 3000
  Characters per field .............. 1024
  Characters per printf string ...... 3000
  Characters in literal string ...... 400
  Characters in character class ..... 400
  Files open ........................ 15
  Pipes open ........................ 1
However, gawk, mawk, and other newer implementations remove these limitations.
Reference: O'Reilly, sed & awk, section 10.8, "Limitations".
Hi ahmad.diab,
no, I didn't mean to imply that awk and its siblings would barf on big files or big records. I only expressed my uncertainty about whether comm would be sufficient for the job on big files.
With more patterns to search for and more input records, the search approaches O(n^2),
and that is not the case with awk, where the power of associative arrays can be used:
build a map from one file, then check each record of the other file against the map as you iterate. The catch is deciding which file to use to build the map, since we don't know the number of records before processing.
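The associative-array approach described above can be sketched like this, reusing the oldfile.txt/newfile.txt names from the earlier posts (the file contents here are hypothetical):

```shell
printf 'a\nb\nc\n' > oldfile.txt
printf 'a\nc\n'    > newfile.txt

# NR==FNR is true only while reading the first file, so every record of
# oldfile.txt is stored as a key in the array "seen".  For the second
# file, print only records that are NOT keys in the array.  One pass per
# file, roughly O(n), and no sorting required, unlike comm.
awk 'NR==FNR { seen[$0]; next } !($0 in seen)' oldfile.txt newfile.txt
```

Swapping the order of the two filenames flips the direction of the comparison, which is why the choice of which file builds the map matters.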