Sorting with PERL

I recently had to sort a rather large file for a client of mine using a PERL script. The output had to be a sorted list of the unique lines in the input file.

The client was running Windows, and the sort command in DOS did not seem to have a -u option as it does on Linux. Since the client already had PERL installed (for the other script), I decided to write a sort-by-unique script in PERL. I was quite surprised by the results. The file in question was 326MB* (a pipe-delimited scrape of the business listings on Yellow Pages).

Running time cat listings.csv | sort -u >> test.csv took approximately 6 minutes and 30 seconds. Sorting the same file with my PERL script took approximately 10 seconds. The Linux sort utility is written in C, which I find interesting, as C is generally much faster than PERL (although design matters far more for optimization than the speed of the language). Since my little script obviously isn't the result of some ingenious design, I think what this best illustrates is that certain languages are well suited to certain jobs because of their built-in data structures. Here, PERL's hash discards duplicate lines as they are read, so only the unique lines are left to sort. Some data structures are simply better suited for certain jobs and allow for simpler algorithms.
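To make the data-structure point concrete, here is a minimal sketch of the idea on a small in-memory list. The helper name unique_sorted and the sample listings are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the distinct
# lines of its input in ascending string order.
sub unique_sorted {
    my @lines = @_;
    my %seen;
    # Assigning to a hash slice creates one key per distinct line;
    # repeated lines simply land on the same key.
    @seen{@lines} = ();
    return sort keys %seen;
}

# Made-up sample data standing in for pipe-delimited listings.
my @sample = (
    "Bob's Bakery|555-0101\n",
    "Acme Plumbing|555-0100\n",
    "Bob's Bakery|555-0101\n",
    "Acme Plumbing|555-0100\n",
    "Zed's Diner|555-0102\n",
);

# Prints the three distinct lines: Acme, Bob's, Zed's.
print unique_sorted(@sample);
```

The sort only ever sees the hash keys, so its cost depends on the number of distinct lines rather than on the size of the input.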

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Takes the input and output file names as command-line arguments.
    my ($input, $output) = @ARGV;
    my %uniques;

    open(my $in, '<', $input) or die "Cannot open input file $input: $!\n";
    open(my $out, '>>', $output) or die "Cannot open output file $output: $!\n";

    # Duplicate lines map to the same hash key, so each distinct line is stored once.
    while (my $line = <$in>) {
        $uniques{$line} = 1;
    }
    close($in);

    # Sort only the unique lines and append them to the output file.
    foreach my $key (sort keys %uniques) {
        print {$out} $key;
    }
    close($out);

* This file contained a large amount of redundancy: the resulting output was only 1MB (about .003 of the original).
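How much a hash helps depends on how redundant the input is. The sketch below measures that for any input. The helper name line_counts and the sample data are made up for illustration, and it compares line counts rather than file sizes, so its ratio is only a rough analogue of the size ratio in the footnote.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads every line from a filehandle and returns
# (total line count, unique line count).
sub line_counts {
    my ($fh) = @_;
    my %seen;
    my $total = 0;
    while (my $line = <$fh>) {
        $total++;
        $seen{$line} = 1;
    }
    return ($total, scalar keys %seen);
}

# In-memory sample so the sketch runs without a real file (made-up data).
my $data = "a|1\nb|2\na|1\na|1\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!\n";
my ($total, $unique) = line_counts($fh);
close($fh);

# Guard against an empty input before dividing.
my $ratio = $total ? $unique / $total : 0;
printf "%d lines, %d unique (%.3f of the original)\n", $total, $unique, $ratio;
```

With the sample data this reports 4 lines, 2 of them unique. To check a real file, open it in place of the in-memory string.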


© 2007 Copyright Stanza Ltd. All Rights Reserved.