On Wed, 9 Jan 2008, Florin Iucha wrote:
> And the Oscar goes to:
>
> find /some/dir -type f -printf "%h/%f %T@\n" | awk '{ if ($2 > the_max) { the_max = $2; file_name = $1; } }
> END { print file_name }'
>
> I would like to thank Google for its search engine and to the find man
> page for its thorough description of the million options and switches...
This is the stuff I like most on LUG lists -- learning all the cool tricks
with GNU/UNIX/Linux commands. So much can be done but it takes years to
learn all the efficient ways of doing things. I've used awk/gawk a
gazillion times but only in a few ways, so using it to find a maximum was
not in my repertoire, but that is an excellent idea. I always would have
sorted the file even though I knew that couldn't be the best way to go.
That said, there are still some problems with the one-liner above. First
and foremost, if any file in the tree has a space in its name, the command
will give the wrong answer: awk splits each line on whitespace, so $1 and
$2 no longer hold the whole file name and the date stamp. At first I was
going to say that the problem is in the printf argument because it uses a
space as the delimiter between the file name and the date stamp:
$ find . -type f -printf "%h/%f %T@\n"
./Lee, Alvin - I'm Going Home.txt 1182200822
./0_TABLATURE_EXPLANATION.txt 1118104853
./Semisonic - FNT.txt 1153491460
./Animals - House of the Rising Sun.tab.txt 1142214281
[snip]
But maybe it is better to say that the problem is with the awk command.
If we replace $2 with $NF and replace $1 with $0, we get this:
find /some/dir -type f -printf "%h/%f %T@\n" | awk '{ if ($NF > the_max) { the_max = $NF; file_name = $0; } }
END { print file_name }'
But the problem with that is that it retains the date stamp at the end
like so:
./Lee, Alvin - I'm Going Home.txt 1182200822
But that can be removed by adding a little perl (or sed) regexp thingy at
the end:
find /some/dir -type f -printf "%h/%f %T@\n" | awk '{ if ($NF > the_max) { the_max = $NF; file_name = $0; } }
END { print file_name }' | perl -pe 's/^(.+) [0-9.]+$/$1/'
That will run almost exactly as fast as the earlier suggestion because the
perl bit at the end is very fast and it is only done on the single line of
output at the end. On the other hand, you didn't say that you wanted the
filename, you said that you wanted the date. That simplifies things a
bit! You can do this:
find /some/dir -type f -printf "%T@\n" | awk '{ if ($1 > the_max) { the_max = $1; } } END { print the_max }'
That returns the modification date of the newest file in seconds since
1970-01-01 00:00:00 UTC. If you want a different date format, we can
discuss that. There must be a good trick. You can get the current time
in that format using the date command as follows:
date +%s
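On the date format question, here is a minimal sketch of one way to turn
the seconds value back into something readable. It assumes GNU date (the
-d @SECONDS form is a GNU extension), and epoch_to_date is just a name I
made up for illustration:

```shell
# Sketch only: assumes GNU date; epoch_to_date is a hypothetical helper.
# Newer versions of GNU find print %T@ with a fractional part, so anything
# after the decimal point is dropped before the value is handed to date.
epoch_to_date() {
    date -u -d "@${1%%.*}" '+%Y-%m-%d %H:%M:%S'
}

# Example: format one of the date stamps from the listing above (in UTC).
epoch_to_date 1182200822
```

Drop the -u if you would rather see local time than UTC.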
There are other forms of weirdness with UNIX filenames. For example, a
name can include a newline, and that will also mess up any line-by-line
pipeline like the ones above, but maybe that never happens on your system
(and if you and your users and your software are all sane, it won't
happen!).
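If you do need to survive spaces and newlines in names, one possible
approach is to put the date stamp first and separate records with NUL
bytes instead of newlines. This is only a sketch: it assumes GNU find and
a reasonably recent GNU coreutils (for the -z options of sort, head and
cut), it swaps the awk maximum for a sort, and newest_file is a made-up
helper name:

```shell
# Sketch only: assumes GNU find plus GNU sort/head/cut with -z support.
# newest_file is a hypothetical helper name.
newest_file() {
    # Each record is "<mtime> <path>" terminated by NUL, so spaces and
    # newlines inside the path cannot be mistaken for delimiters.
    find "$1" -type f -printf '%T@ %p\0' |
        sort -z -n -r |          # newest (largest mtime) first
        head -z -n 1 |           # keep only the first record
        cut -z -d ' ' -f 2- |    # drop the date stamp, keep the whole path
        tr '\0' '\n'             # end the output with a newline for the terminal
}

# Usage: newest_file /some/dir
```

The sort makes this slower than the awk maximum on a big tree, so it is a
trade of speed for robustness.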
Do you want to find the newest file as of the moment your script starts
running, or will you want to detect new files that are created after the
script starts running but before it finishes? Maybe this isn't an
important consideration for you, but you should be aware that what you
mean by the "newest file" isn't defined precisely by the method you are
using to identify it.
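If you decide that you want "newest as of the moment the script started",
one way to pin that down is to create a marker file first and ignore
anything modified after it. A minimal sketch, assuming GNU find;
newest_not_after and the marker variable are names I invented for the
example:

```shell
# Sketch only: assumes GNU find; newest_not_after is a hypothetical helper.
# $1 = marker file whose mtime defines the cutoff, $2 = directory to scan.
newest_not_after() {
    # "! -newer" keeps only files whose mtime is not later than the marker's.
    find "$2" -type f ! -newer "$1" -printf '%T@\n' |
        awk '$1 > the_max { the_max = $1 } END { print the_max }'
}

# Usage:
#   marker=$(mktemp)                        # its mtime is "script start"
#   newest_not_after "$marker" /some/dir
#   rm -f "$marker"
```

Keep the marker outside the tree you are scanning, otherwise it will be
counted as one of the files.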
Best,
Mike