Selecting one file from a list

Hi,

I can do this by brute force, but I'm curious whether there is a better way to handle it. The scenario is this:

There are a number of files in a directory:

rib.20071224.1759.gz    24-Dec-2007 17:59  132K
rib.20071224.1959.gz    24-Dec-2007 19:59  132K
rib.20071224.2159.gz    24-Dec-2007 21:59  132K
rib.20071224.2359.gz    24-Dec-2007 23:59  132K
rib.20071225.0159.gz    25-Dec-2007 01:59  132K
rib.20071225.0359.gz    25-Dec-2007 03:59  132K
rib.20071225.0559.gz    25-Dec-2007 05:59  132K
rib.20071225.0759.gz    25-Dec-2007 07:59  132K
rib.20071225.0959.gz    25-Dec-2007 09:59  132K
rib.20071225.1159.gz    25-Dec-2007 11:59  132K
rib.20071225.1359.gz    25-Dec-2007 13:59  132K
rib.20071225.1559.gz    25-Dec-2007 15:59  132K
rib.20071225.1759.gz    25-Dec-2007 17:59  132K
rib.20071225.1959.gz    25-Dec-2007 19:59  132K
rib.20071225.2159.gz    25-Dec-2007 21:59  132K
rib.20071225.2359.gz    25-Dec-2007 23:59  132K
rib.20071226.0159.gz    26-Dec-2007 01:59  132K
rib.20071226.0359.gz    26-Dec-2007 03:59  132K
rib.20071226.0559.gz    26-Dec-2007 05:59  132K
rib.20071226.0759.gz    26-Dec-2007 07:59  132K
rib.20071226.0959.gz    26-Dec-2007 09:59  133K
rib.20071226.1159.gz    26-Dec-2007 11:59  132K
rib.20071226.1359.gz    26-Dec-2007 13:59  133K
rib.20071226.1559.gz    26-Dec-2007 15:59  132K
rib.20071226.1759.gz    26-Dec-2007 17:59  133K
rib.20071226.1959.gz    26-Dec-2007 19:59  133K
rib.20071226.2159.gz    26-Dec-2007 21:59  133K
rib.20071226.2359.gz    26-Dec-2007 23:59  132K
rib.20071227.0159.gz    27-Dec-2007 01:59  132K
rib.20071227.0359.gz    27-Dec-2007 03:59  132K
rib.20071227.0559.gz    27-Dec-2007 05:59  132K
rib.20071227.0759.gz    27-Dec-2007 07:59  132K
rib.20071227.0959.gz    27-Dec-2007 09:59  133K
rib.20071227.1159.gz    27-Dec-2007 11:59  131K
rib.20071227.1359.gz    27-Dec-2007 13:59  132K

I want to copy only one file from each group (each date). For example, when I have the following:

rib.20071224.1759.gz    24-Dec-2007 17:59  132K
rib.20071224.1959.gz    24-Dec-2007 19:59  132K
rib.20071224.2159.gz    24-Dec-2007 21:59  132K
rib.20071224.2359.gz    24-Dec-2007 23:59  132K

I want to select just one file from the group, preferably the first. So in this case, I would like to copy only rib.20071224.1759.gz. Is there a good way to solve this problem? At the moment I am using PHP to do it. Any advice is greatly appreciated.

Thanks

ls rib* | awk -F. '!a[$2]++' | xargs -I {} cp -p {} /path/to/other_dir

Thanks a lot! Could you explain what exactly that is doing? I'm new to awk, so any input is appreciated. My guess is that it stores the names as keys and makes sure there are no duplicate keys. Is that right? Still, it's a little surprising to me that it works.

awk -F. '!a[$2]++'

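-F. - set the field separator to a dot. For rib.20071224.1759.gz that gives $1 = rib, $2 = 20071224 (the date), $3 = 1759 and $4 = gz.
a[$2]++ - It's simply a counter in the array a, keyed on field 2 (the date) of the filename and raised every time a record with that key is seen. Being a post-increment, it returns the value from before the increment, so it is 0 the first time a date shows up.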
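Both points above are easy to check from the command line. This is only a small demo: the key "x" and the variable names field2 and counts are illustrative, not part of the original one-liner.

```shell
#!/bin/sh
# -F. splits on dots: for rib.20071224.1759.gz, $2 is the date part.
field2=$(echo rib.20071224.1759.gz | awk -F. '{ print $2 }')
printf '%s\n' "$field2"    # prints 20071224

# a["x"]++ yields the old value, then stores old value + 1.
# An element that was never assigned behaves as 0 in numeric context.
counts=$(awk 'BEGIN {
    print a["x"]++    # 0: first time this key is seen
    print a["x"]++    # 1
    print a["x"]      # 2: value after two increments
}')
printf '%s\n' "$counts"
```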

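!a[$2]++ - the ! negates the old value returned by the counter. The first time a date is seen that value is 0, so the expression is true and awk runs its default action, print $0; every later time the value is non-zero, the expression is false and the line is skipped.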
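To watch the filter work, feed it a few names from the listing above. The names variable is just sample input assumed for the demo; in the real command the names come from ls.

```shell
#!/bin/sh
# A few names from the listing, one per line (as ls prints them into a pipe).
names='rib.20071224.1759.gz
rib.20071224.1959.gz
rib.20071224.2159.gz
rib.20071225.0159.gz
rib.20071225.0359.gz
rib.20071226.0159.gz'

# !a[$2]++ is true only the first time each date ($2) is seen,
# so awk prints the first file of each day and drops the rest.
first=$(printf '%s\n' "$names" | awk -F. '!a[$2]++')
printf '%s\n' "$first"
# prints:
# rib.20071224.1759.gz
# rib.20071225.0159.gz
# rib.20071226.0159.gz
```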

So '!a[$2]++' is the same as writing:

{ a[$2] = a[$2] + 1
  if (a[$2] == 1) print $0 }

or

a[$2]++==0 { print $0 }

This condition is true only the first time a given key (date) is seen. By the same token, if you need the second occurrence of each date, the condition would be:

a[$2]++==1 { print $0 }

For the third:

a[$2]++==2 { print $0 }

... and so on
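The "and so on" can be folded into a single command by passing the wanted occurrence in as an awk variable. This is a small variation on the idea above, not something posted in the thread; the function name nth_per_date and the variable n are hypothetical. It uses a pre-increment, ++a[$2], which counts from 1, so ++a[$2] == n picks the n-th file of each date.

```shell
#!/bin/sh
# nth_per_date N (hypothetical helper): read file names on stdin and
# print the N-th name seen for each date ($2 with -F.).
nth_per_date() {
    awk -F. -v n="$1" '++a[$2] == n'
}

names='rib.20071224.1759.gz
rib.20071224.1959.gz
rib.20071224.2159.gz
rib.20071225.0159.gz
rib.20071225.0359.gz'

second=$(printf '%s\n' "$names" | nth_per_date 2)
printf '%s\n' "$second"
# prints:
# rib.20071224.1959.gz
# rib.20071225.0359.gz
```

With N set to 1 this behaves like the original '!a[$2]++' filter.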

xargs -I {} cp -p {} /path/to/other_dir

xargs runs the copy command once for each file name that awk lets through, substituting the name for {}. cp -p preserves the mode and timestamps (and ownership, where permitted).
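An end-to-end sketch of the whole pipeline, run against throw-away directories so it is safe to try. The temporary directories and the four empty sample files are assumptions for the demo. It swaps ls rib* for printf '%s\n' rib*.gz so nothing depends on ls output; for these simple names either works. -I {} is the POSIX spelling of the replacement option; GNU xargs still accepts -i but documents it as deprecated.

```shell
#!/bin/sh
# Throw-away source and destination directories for the demo.
src=$(mktemp -d)
dest=$(mktemp -d)

# Create a few empty files following the rib.YYYYMMDD.HHMM.gz pattern.
for f in rib.20071224.1759.gz rib.20071224.1959.gz \
         rib.20071225.0159.gz rib.20071225.0359.gz; do
    : > "$src/$f"
done

# List names, keep the first per date, copy each survivor into $dest.
# The glob expands in sorted order, so "first" means earliest time of day.
cd "$src" || exit 1
printf '%s\n' rib*.gz | awk -F. '!a[$2]++' | xargs -I {} cp -p {} "$dest"

copied=$(ls "$dest")
printf '%s\n' "$copied"
# prints:
# rib.20071224.1759.gz
# rib.20071225.0159.gz

# Clean up the demo directories.
cd / && rm -rf "$src" "$dest"
```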

If you are using PHP, here's an example:

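<?php
    $arr = array();
    $destination = '/dest';
    $directory = '/src';
    // glob() returns sorted matches, so the first file seen per date is the earliest
    foreach (glob("$directory/rib*.gz") as $path) {
        $filename = basename($path);
        echo "filename: $filename\n";
        // rib.20071224.1759.gz -> ["rib", "20071224", "1759", "gz"]
        $split = explode(".", $filename);
        // keep only the first file seen for each date ($split[1])
        if (!array_key_exists($split[1], $arr)) {
            $arr[$split[1]] = $filename;
        }
    }
    // copy one file per date from the source to the destination directory
    foreach ($arr as $v) {
        if (!copy("$directory/$v", "$destination/$v")) {
            echo "Cannot copy $v\n";
        }
    }
?>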
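For comparison, the same keep-the-first-per-date idea can be done in plain POSIX sh, without awk or PHP. This sketch swaps the array lookup for a simpler trick: because a glob expands in sorted order, all files of one date are adjacent, so it is enough to remember the previous date. The function name copy_first_per_date and the demo directories are hypothetical, and names that don't follow the rib.DATE.TIME.gz pattern are not handled.

```shell
#!/bin/sh
# copy_first_per_date SRC DEST (hypothetical helper): copy the first
# rib.DATE.TIME.gz of each date from SRC to DEST.
# Relies on the glob being sorted, so files with equal dates are adjacent.
copy_first_per_date() {
    prev=
    for path in "$1"/rib.*.gz; do
        [ -e "$path" ] || continue         # glob matched nothing
        name=${path##*/}                   # rib.20071224.1759.gz
        day=${name#rib.}                   # 20071224.1759.gz
        day=${day%%.*}                     # 20071224
        [ "$day" = "$prev" ] && continue   # not the first file of this date
        prev=$day
        cp -p "$path" "$2/" || echo "Cannot copy $name" >&2
    done
}

# Demo with throw-away directories (assumed, not from the thread).
src=$(mktemp -d)
dest=$(mktemp -d)
for f in rib.20071226.0159.gz rib.20071226.0359.gz rib.20071227.0159.gz; do
    : > "$src/$f"
done
copy_first_per_date "$src" "$dest"
result=$(ls "$dest")
printf '%s\n' "$result"
# prints:
# rib.20071226.0159.gz
# rib.20071227.0159.gz
rm -rf "$src" "$dest"
```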

Thank you so much, you made my day! Great explanation! Also, thanks ghostdog. I was actually trying to do the same in PHP, but your logic seems much better than mine. Thanks again!

Very helpful and interesting.
Thanks a lot for the effort.

Awesome, rubin... thanks for the idea!