{"id":675,"date":"2021-12-16T11:20:36","date_gmt":"2021-12-16T16:20:36","guid":{"rendered":"http:\/\/www.hpux.ws\/?p=675"},"modified":"2021-12-16T11:20:36","modified_gmt":"2021-12-16T16:20:36","slug":"subtracting-one-file-from-another","status":"publish","type":"post","link":"https:\/\/www.hpux.ws\/?p=675","title":{"rendered":"&#8220;Subtracting&#8221; one file from another"},"content":{"rendered":"\n<p>I recently had the occasion to refactor a script (not mine) in which there was some convoluted logic to ensure that the contents of file A (an ASCII file) were <em>fully contained<\/em> within file B (another ASCII file).  Lots of looping and grepping were the order of the day.  I imagined that there had to be a better way and first went down the path of using grep with a file as the source of the things I was looking for.  However, while that would get me partially there, it would not tell me if file A was <em>wholly contained<\/em> within file B.  I am sure I could have made it work (somehow) using this idea, but again, I thought that there had to be a better way.<\/p>\n\n\n\n<p>With the help of Google (because of course), I stumbled across an HP-UX command (not unique to HP-UX of course) that, in my 25 years of scripting on the HP-UX platform, I had never encountered or used before: the <strong>comm<\/strong> command.  This command lets one implement set logic on sorted ASCII files, treating each line as an element.<\/p>\n\n\n\n<p>Let&#8217;s have a look . . 
.<\/p>\n\n\n\n<p>From the man page for comm:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> NAME\n      comm - select or reject lines common to two sorted files\n\n SYNOPSIS\n      comm [-[123]] file1 file2\n\n DESCRIPTION\n      comm reads file1 and file2, which should be ordered in increasing\n      collating sequence (see sort(1) and Environment Variables below), and\n      produces a three-column output:\n\n           Column 1:   Lines that appear only in file1,\n           Column 2:   Lines that appear only in file2,\n           Column 3:   Lines that appear in both files.\n\n      If - is used for file1 or file2, the standard input is used.\n\n      Options 1, 2, or 3 suppress printing of the corresponding column.\n      Thus comm -12 prints only the lines common to the two files; comm -23\n      prints only lines in the first file but not in the second; comm -123\n      does nothing useful.<\/code><\/pre>\n\n\n\n<p>So, &#8220;comm -12&#8221; performs the intersection operation on the files.  But &#8220;comm -23&#8221; should do the trick for what I was after: subtracting one file from the other, that is, printing the lines of the first file that do not appear in the second.  In my use case, File B should be a subset of File A, so I list File B first and subtract File A from it.  I can test for that via:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>checkCount=$(comm -23 \"$FILE_B_SORTED\" \"$FILE_A_SORTED\" | wc -l)<\/code><\/pre>\n\n\n\n<p>If the resulting count is &#8220;0&#8221;, I know that File B is <em>wholly contained<\/em> within File A.  If the count is not &#8220;0&#8221;, then I can consume what is in File B but not in File A and take appropriate action.<\/p>\n\n\n\n<p>Note that the two ASCII files need to be sorted in the same collating sequence &#8211; easy enough.  
Here is an example of all of this put together to accomplish the &#8220;subtraction&#8221;, leaving only the lines of File B that are not in File A:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>FILE_B_SORTED=$(mktemp)\nFILE_A_SORTED=$(mktemp)\n\n# comm needs sorted input; sort -u also drops duplicate lines\nsort -u \"$FILE_B\" > \"$FILE_B_SORTED\"\nsort -u \"$FILE_A\" > \"$FILE_A_SORTED\"\n\n#\n# 'comm -23 file1 file2' says show me all the lines that appear in\n# file1 but not in file2.  Thus, here, we are ensuring that there is\n# nothing in File B (file1) that is NOT in File A (file2).\n#\n# We count how many such lines there are - we expect there to be zero.\n#\n\ncheckCount=$(comm -23 \"$FILE_B_SORTED\" \"$FILE_A_SORTED\" | wc -l)\nif (( checkCount == 0 )); then\n   echo \"All entries in File B are in File A  (goodness)\"\nelse\n   # Do something with the entries in File B that are not in File A\n   checkList=$(mktemp)\n   comm -23 \"$FILE_B_SORTED\" \"$FILE_A_SORTED\" > \"$checkList\"\n   exec 4&lt; \"$checkList\"\n   while read -r entry &lt;&4; do\n     # Do whatever it is that needs to be done (placeholder action)\n     echo \"Not in File A: $entry\"\n   done\n   exec 4&lt;&-   # close the descriptor\n   rm -f \"$checkList\"\nfi\nrm -f \"$FILE_B_SORTED\" \"$FILE_A_SORTED\"<\/code><\/pre>\n\n\n\n<p>All good stuff \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently had the occasion to refactor a script (not mine) in which there was some convoluted logic to ensure that the contents of file A (an ASCII file) were fully contained within file B (another ASCII file). Lots of looping and grepping were the order of the day. 
I imagined that there had to [&hellip;]<\/p>\n","protected":false},"author":1002497,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"_kadence_starter_templates_imported_post":false,"footnotes":""},"categories":[39],"tags":[196,195,153],"class_list":["post-675","post","type-post","status-publish","format-standard","hentry","category-scripting","tag-set-logic-on-files","tag-subtracting-one-file-from-another","tag-hp-ux-script"],"_links":{"self":[{"href":"https:\/\/www.hpux.ws\/index.php?rest_route=\/wp\/v2\/posts\/675"}],"collection":[{"href":"https:\/\/www.hpux.ws\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hpux.ws\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hpux.ws\/index.php?rest_route=\/wp\/v2\/users\/1002497"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hpux.ws\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=675"}],"version-history":[{"count":4,"href":"https:\/\/www.hpux.ws\/index.php?rest_route=\/wp\/v2\/posts\/675\/revisions"}],"predecessor-version":[{"id":687,"href":"https:\/\/www.hpux.ws\/index.php?rest_route=\/wp\/v2\/posts\/675\/revisions\/687"}],"wp:attachment":[{"href":"https:\/\/www.hpux.ws\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=675"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hpux.ws\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=675"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hpux.ws\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=675"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}