[tclug-list] Finding the date of the newest file in a directory tree

Wed Jan 9 23:40:37 CST 2008

On Wed, 9 Jan 2008, Florin Iucha wrote:

> And the Oscar goes to:
>
> find /some/dir -type f -printf "%h/%f %T@\n" | awk '{ if ($2 > the_max) { the_max = $2; file_name = $1; } }
> END { print file_name }'
>
> I would like to thank Google for its search engine and to the find man 
> page for its thorough description of the million options and switches...

This is the stuff I like most on LUG lists -- learning all the cool tricks 
with GNU/UNIX/Linux commands.  So much can be done but it takes years to 
learn all the efficient ways of doing things.  I've used awk/gawk a 
gazillion times but only in a few ways, so using it to find a maximum was 
not in my repertoire, but that is an excellent idea.  I would always have 
sorted the output, even though I knew that couldn't be the best way to go.

That said, there are still some problems with the one-liner above.  First 
and foremost, if any file in the tree has a space in its name, the command 
will give the wrong answer, because awk splits each line on whitespace.  At 
first I was going to say that the problem is in the printf argument, because 
it uses a space as the delimiter between the file name and date stamp:

$ find . -type f -printf "%h/%f %T@\n"
./Lee, Alvin - I'm Going Home.txt 1182200822
./0_TABLATURE_EXPLANATION.txt 1118104853
./Semisonic - FNT.txt 1153491460
./Animals - House of the Rising Sun.tab.txt 1142214281
[snip]

But maybe it is better to say that the problem is with the awk command. 
If we replace $2 with $NF and replace $1 with $0, we get this:

find /some/dir -type f -printf "%h/%f %T@\n" | awk '{ if ($NF > the_max) { the_max = $NF; file_name = $0; } }
END { print file_name }'

But the problem with that is that it retains the date stamp at the end 
like so:

./Lee, Alvin - I'm Going Home.txt 1182200822

But that can be removed by adding a little perl (or sed) regexp thingy at 
the end:

find /some/dir -type f -printf "%h/%f %T@\n" | awk '{ if ($NF > the_max) { the_max = $NF; file_name = $0; } }
END { print file_name }' | perl -pe 's/^(.+) [0-9.]+$/$1/'

That will run almost exactly as fast as the earlier suggestion because the 
perl bit at the end is very fast and it is only done on the single line of 
output at the end.  On the other hand, you didn't say that you wanted the 
filename, you said that you wanted the date.  That simplifies things a 
bit!  You can do this:

find /some/dir -type f -printf "%T@\n" | awk '{ if ($1 > the_max) { the_max = $1; } } END { print the_max }'

That returns the modification date of the newest file in seconds since 
1970-01-01 00:00:00 UTC.  If you want a different date format, we can 
discuss that.  There must be a good trick.  You can get the current time 
in that format using the date command as follows:

date +%s
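
For the other direction, here is a minimal sketch of one way to turn the 
seconds value back into a readable date.  It assumes GNU date, which accepts 
"@SECONDS" as an argument to -d; the function name epoch_to_date is made up 
for illustration, and the output is in UTC rather than local time:

```shell
# Sketch: convert seconds-since-epoch to a readable UTC date.
# Assumes GNU date (the "@SECONDS" form of -d); epoch_to_date is a
# hypothetical helper name, not part of any standard tool.
epoch_to_date() {
    # Newer versions of find print %T@ with a fractional part
    # (e.g. 1182200822.0000000000), so drop everything after the dot.
    date -u -d "@${1%.*}" '+%Y-%m-%d %H:%M:%S'
}

# Usage (path is a placeholder):
#   epoch_to_date "$(find /some/dir -type f -printf '%T@\n' |
#       awk '$1 > the_max { the_max = $1 } END { print the_max }')"
```

Drop the -u if you would rather see the date in your local time zone.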

There are other forms of weirdness with UNIX filenames.  For example, they 
can include a newline, and that will also mess you up, but maybe that never 
happens on your system (and if you and your users and your software are 
all sane, it won't happen!).
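
Coming back to the spaces problem, another possible approach is to print 
the time stamp first, so that everything after the first space is the file 
name.  This is only a sketch: it assumes GNU find (for -printf), the 
function name newest_file is made up for illustration, and it still breaks 
on file names that contain a newline:

```shell
# Sketch: print the path of the most recently modified regular file
# under the directory given as $1.
# Assumes GNU find (-printf) and no newlines in file names;
# newest_file is a hypothetical helper name.
newest_file() {
    find "$1" -type f -printf '%T@ %p\n' |
        awk '
            # Field 1 is the time stamp; keep the whole line of the maximum.
            $1 > the_max { the_max = $1; line = $0 }
            END {
                if (line != "") {
                    # Strip the leading time stamp and its trailing space.
                    sub(/^[^ ]+ /, "", line)
                    print line
                }
            }'
}

# Usage (path is a placeholder):
#   newest_file /some/dir
```

Because the time stamp never contains a space, no regexp is needed to guess 
where the file name ends.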

Do you want to find the newest file as of the moment your script starts 
running, or will you want to detect new files that are created after the 
script starts running but before it finishes?  Maybe this isn't an 
important consideration for you, but you should be aware that what you 
mean by the "newest file" isn't defined precisely by the method you are 
using to identify it.
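
If you do want "newest as of the moment the script started", one possible 
sketch is to create a reference file first and have find skip anything 
modified after it.  This assumes GNU find and mktemp; the function and 
variable names are made up for illustration:

```shell
# Sketch: newest modification time among files that were not modified
# after this function started running.
# Assumes GNU find (-printf) and mktemp; newest_mtime_as_of_start and
# stamp are hypothetical names.
newest_mtime_as_of_start() {
    # The mtime of this temporary file marks "script start".
    stamp=$(mktemp) || return 1
    # "! -newer" keeps only files whose mtime is not later than the stamp.
    find "$1" -type f ! -newer "$stamp" -printf '%T@\n' |
        awk '$1 > the_max { the_max = $1 } END { print the_max }'
    rm -f "$stamp"
}

# Usage (path is a placeholder):
#   newest_mtime_as_of_start /some/dir
```

Without something like this, a file written while find is still walking the 
tree may or may not be counted, depending on whether find has already passed 
its directory.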

Best,

Mike