Perl - CAPTURE query

gefa · April 30, 2010, 10:09am

When I run the script with option 1 it works. However, if I assign the path to the variable $tmpfile and try to use that, it writes to $tmpfile and not /var/tmp/abc.tmp. What am I missing?

open (CAPTURE, '>>tmpfile');
print CAPTURE "$T1,$T2,$T3";
close (CAPTURE);

my $tmpfile='/var/tmp/abc.tmp';
open (CAPTURE, '>>$file');
print CAPTURE "$T1,$T2,$T3";
close (CAPTURE);

pseudocoder · April 30, 2010, 10:22am

What do you mean by "with option 1 and 2"?
Assuming those are two different scripts, then I guess the open line of the second script needs to be adjusted to:

open (CAPTURE, '>>$tmpfile');

drewk · April 30, 2010, 11:45am

Change the single quotes ' ' in option 2 to double quotes " "

In Perl, no variable interpolation takes place inside single quotes, but it does inside double quotes. With single quotes, open uses a file literally named '$file' (created in the current directory when appending) rather than the path stored in your variable.

You will solve your own problem on something like this if you check the return of open like so:

open (CAPTURE, ">>$file") || die "can't open $file cause $!";

Also, the form of open you are using there is discouraged in favor of the three-argument form with a lexical (scalar) filehandle. The three-argument form keeps the mode separate from the filename, and a lexical handle lets the Perl interpreter give you more meaningful compile-time and run-time errors:

Read about it in the documentation for open (or $ perldoc -f open at the command prompt).
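
As a rough illustration of the three-argument form described above, here is a minimal sketch. The helper name append_line, the sample values for $T1, $T2 and $T3, and the scratch file (standing in for /var/tmp/abc.tmp so the sketch is safe to run) are assumptions for the example, not part of the original script. It also adds a newline after each record, which the original print did not do.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Append one line to $path using three-argument open and a lexical handle.
# (append_line is a hypothetical helper name used only for this sketch.)
sub append_line {
    my ($path, $line) = @_;
    # The mode '>>' is a separate argument, so the path is never parsed for it.
    open(my $fh, '>>', $path) or die "can't open $path: $!";
    print {$fh} "$line\n";
    close($fh) or die "can't close $path: $!";
    return;
}

# A scratch file stands in for /var/tmp/abc.tmp; it is removed at exit.
my (undef, $tmpfile) = tempfile(UNLINK => 1);

# Hypothetical values standing in for the poster's $T1, $T2, $T3.
my ($T1, $T2, $T3) = ('a', 'b', 'c');

# Double quotes here, so the three variables are interpolated.
append_line($tmpfile, "$T1,$T2,$T3");
```

Because the path is passed as its own argument, there is no quoting decision to get wrong for the filename: the variable is used directly.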

gefa · May 4, 2010, 6:54am

Many thanks for that.

Just another question: how is it possible to split a field within Perl?

For example, in a file I might have something like these fields separated by a space:

this is a test
again this is a test
...

I want to read the file line by line and split each line so that I have one variable containing field one and a second variable containing all the other fields.

I can do this in awk, but being new to Perl I'm not sure how to do it in Perl.

drewk · May 4, 2010, 9:52am

The Perl operator split will split fields on whitespace or any other regular expression in one go.

Here is the documentation for that. (Or $ perldoc -f split at the command prompt) The part you want specifically is:

If EXPR is omitted, splits the $_ string.  If PATTERN is also
omitted, splits on whitespace (after skipping any leading
whitespace).
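
For the specific case asked about (first field in one variable, everything else in another), split also accepts a third argument that limits the number of fields. A minimal sketch, assuming whitespace-separated input like the sample lines in the question; the helper name first_and_rest is made up for this example:

```perl
use strict;
use warnings;

# Return the first whitespace-separated field and the remainder of the line.
# (first_and_rest is a hypothetical helper name used only for this sketch.)
sub first_and_rest {
    my ($line) = @_;
    chomp $line;
    # A single-space string as the pattern is split's awk-like mode: leading
    # whitespace is skipped and runs of whitespace count as one separator.
    # The limit of 2 stops splitting after the first field.
    my ($first, $rest) = split ' ', $line, 2;
    return ($first, $rest);
}

for my $line ("this is a test\n", "again this is a test\n") {
    my ($first, $rest) = first_and_rest($line);
    print "first='$first' rest='$rest'\n";
}
```

When reading a real file, the same two lines (chomp and split) would go inside a while loop over the filehandle.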

Here are some Perl resources:

Books:
Learning Perl

Programming Perl AKA The Camel Book

There are also tremendous resources on the web:

Each and every Perl tutorial. These are also part of your Perl install in most cases. Type $ perldoc perlreftut at the command prompt, for example.

Perlmonks and Perlmonks Tutorials

Jaffe's regex tutorial

tye's ref reference

And once you know a bit of Perl, read the Perldoc site front to back

gefa · May 5, 2010, 4:17am

Thanks for the info. Is it possible to do something like the following awk statement in Perl as a one-liner?
I want to split field 19, which contains more than one word, and populate the variable $acmodel with the result. The file $acfilename is separated by commas; in field 19 I'd like to put a comma after the first word to effectively make two fields out of it.

e.g this is a test would become this, is a test

my $acmodel=`cat $acfilename |awk -F, '{print $19}' |awk '{ for (i=2 ; i<=NF ; i++) print $1,$i}'`;

durden_tyler · May 5, 2010, 9:35am

I am not sure if your awk script works the way you say it does.

$
$ ## the awk script, for the 4th field instead of the 19th
$ echo "abc,def,456,the good bad ugly,xyz" |awk -F, '{print $4}' |awk '{ for (i=2 ; i<=NF ; i++) print $1,$i}'
the good
the bad
the ugly
$
$

You may want to show us what the line in the file pointed at by the variable $acfilename looks like.

In any case, a Perl equivalent is as follows:

$
$ ## the Perl script
$ echo "abc,def,456,the good bad ugly,xyz" | perl -F, -lane '@x=split/ /,$F[3]; for($i=1; $i<=$#x; $i++){print "$x[0] $x[$i]"}'
the good
the bad
the ugly
$
$

tyler_durden

---------- Post updated at 09:35 AM ---------- Previous update was at 09:15 AM ----------

On the other hand, if the 19th field in the comma-delimited file is:

this is a test

and you want to convert that to -

this, is a test

then here's an idea:

$
$
$ echo "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,this is a test,20" | perl -pe 's/(([^,]*,){18}\w+)/$1,/'
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,this, is a test,20
$
$

tyler_durden
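
The same edit can also be written without counting fields inside a regex, by splitting the line, changing one field, and joining it back. This is only a sketch of that alternative, not the method used above; it assumes no field contains an embedded comma or quotes, and the helper name comma_after_first_word is made up for the example:

```perl
use strict;
use warnings;

# Insert a comma after the first word of one comma-separated field.
# $index is zero-based, so field 19 is index 18.
# (comma_after_first_word is a hypothetical helper name for this sketch.)
sub comma_after_first_word {
    my ($line, $index) = @_;
    # The -1 limit keeps trailing empty fields, so the line round-trips.
    my @fields = split /,/, $line, -1;
    # Leave the line alone if it has too few fields.
    $fields[$index] =~ s/^(\w+)/$1,/ if defined $fields[$index];
    return join ',', @fields;
}

my $line = '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,this is a test,20';
print comma_after_first_word($line, 18), "\n";
```

The field index is an ordinary argument here, which may be easier to adjust than the {18} repeat count in the regex.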

gefa · May 5, 2010, 10:58am

Hi thanks again,

The file will look something like this, with the text surrounded by quotes:

1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"1-this is a test",20
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"2-this is a test",21
...

when I run

cat file.tmp | perl -pe 's/(([^,]*,){18}\w+)/$1,/'

it doesn't like the double quotes. I would want it to produce something like:

1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"1-","this is a test",20
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"2-","this is a test",21
...

---------- Post updated at 09:58 AM ---------- Previous update was at 08:56 AM ----------

Hopefully final question,

How do I call this from within a script so that it writes into a secondary file?

e.g.,

my $tmpfile2=/tmp/tmpfile2.tmp;
cat /tmp/file.tmp | perl -pe 's/(([^,]*,){18}\w+)/$1,/' >> $tmpfile2

durden_tyler · May 5, 2010, 12:44pm

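Firstly, this -

cat file.tmp | perl -pe 's/(([^,]*,){18}\w+)/$1,/'

is eligible for the Useless Use of Cat Award, because perl can read file.tmp directly when it is given as an argument.

And secondly, regarding "it doesn't like the double quotes":

It doesn't like the fact that the double quotes aren't mentioned in your regex.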

$
$
$ cat f0
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"1-this is a test",20
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"2-this is a test",21
$
$ perl -pe 's/(([^,]*,){18}"\d+-)/$1","/' f0
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"1-","this is a test",20
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"2-","this is a test",21
$
$

If, by "a script", you mean "a shell script" then you can simply put whatever you executed on the dollar-prompt inside your shell script.

If it runs on the *nix command line successfully, it should run when put inside a shell script as well.

HTH,
tyler_durden
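
If "a script" means a Perl script instead, one possible approach is to do the substitution inside Perl and append to the second file, with no cat or shell redirection. A minimal sketch under that assumption, reusing the substitution from the one-liner above; the helper name split_field_to_file is made up for this example, and the two paths are taken from the command line:

```perl
use strict;
use warnings;

# Append every line of $in_path to $out_path, splitting the quoted 19th field
# after its leading "N-" prefix (same substitution as the one-liner above).
# (split_field_to_file is a hypothetical helper name for this sketch.)
sub split_field_to_file {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<',  $in_path)  or die "can't read $in_path: $!";
    open(my $out, '>>', $out_path) or die "can't append to $out_path: $!";
    while (my $line = <$in>) {
        $line =~ s/(([^,]*,){18}"\d+-)/$1","/;
        print {$out} $line;
    }
    close($in);
    close($out) or die "can't close $out_path: $!";
    return;
}

# Runs only when an input path and an output path are both given.
split_field_to_file(@ARGV) if @ARGV == 2;
```

Saved under a name of your choosing, it could be run as: perl yourscript.pl /tmp/file.tmp /tmp/tmpfile2.tmp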

drewk · May 5, 2010, 12:56pm

Be aware that Comma-Separated Values (CSV) is far more complex in practice than simple test cases often suggest. This applies to Perl, awk, or whatever regex or parsing tool you are using.

If you are absolutely, positively, sure of your CSV format, you can write your own regex against it. Too often though, your CSV format is controlled by others and there are subtleties that will throw off your best laid plans.

Here is a regex I use often for CSV that I control. If I don't control the format, I use the Text::CSV library from CPAN.

That regex can be changed to get the nth field of CSV like so:

#!/usr/bin/perl
use warnings;
use strict;

while (<DATA>) {
    chomp;
    my $str = $_;

    # If you want to deal with all fields at once:
    my @fields = /(?:^|,)("(?:[^"]+|"")*"|[^,]*)/g;
    my $i = 0;
    print "All fields= $str\n";
    foreach my $field (@fields) {
        print "field $i of $#fields = '$field'\n";
        $i += 1;
    }

    # If you want to deal with field n (zero-based):
    my $n = 18;
    my $actual = $n + 1;    # the repeat count starts at 1, not 0
    # The capture sits inside the repeated group, so $1 holds only the
    # last field matched, without its leading comma.
    if ($str =~ /^(?:(?:^|,)("(?:[^"]+|"")*"|[^,]*)){$actual}/) {
        print "single field, field $n=$1\n\n";
    }
}

__DATA__
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"1-this is a test",20
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"2-this is a test",21

The beauty of Perl is the strength of the solutions on CPAN. In the case of CSV, XML, HTML, or other difficult to regex things, use a CPAN tool that has been tested in literally millions of cases so that you are more certain of your solution.

Here is a good overview of parsing CSV with Perl. This is the area of strength of Perl, and it still has its difficulties...
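
For comparison with the hand-written regex, a minimal sketch of reading one field with Text::CSV might look like the following. It assumes the module has been installed from CPAN (it is not part of core Perl), and the helper name csv_field is made up for the example:

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl

# Return field $index (zero-based) of one CSV line, or nothing if the
# line does not parse or has too few fields.
# (csv_field is a hypothetical helper name used only for this sketch.)
sub csv_field {
    my ($line, $index) = @_;
    my $csv = Text::CSV->new({ binary => 1 })
        or die "cannot create Text::CSV parser: " . Text::CSV->error_diag();
    return unless $csv->parse($line);
    my @fields = $csv->fields();
    return $fields[$index];
}

my $line = '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,"1-this is a test",20';
# The parser strips the surrounding double quotes from the field.
print csv_field($line, 18), "\n";
```

Unlike the regex, the parser also copes with commas and doubled quotes inside quoted fields, which is where home-grown patterns tend to break.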

gefa · May 6, 2010, 8:05am

Many thanks for all your help.