Thanks, David.  I thought that was the issue -- that apparent size would
not include overhead, so I was not able to understand why I was getting
apparent size that was smaller than ondisk size.  After they moved my data
to a different array, that difference reversed direction.  This was
explained to me last night:

"on the old project spaces, zfs did some compression on the data so the
apparent-size was larger than the ondisk size."

So, compression is also an issue, and I wouldn't have thought of that.
Now that there is no compression, I see that ondisk usage is 20GB more
than apparent size:

$ \du -sB GB --apparent-size miller
146GB   miller

$ \du -sB GB miller
166GB   miller

$ find miller | wc -l
9908

So there are about 2 million bytes of overhead per file, which seems like
a lot, to me.  I would think that implies disk blocks of multiple
megabytes, which seems unlikely.  There must be more that I don't
understand.

Regarding your idea (David)...

> As an aside, imho, the 'apparent size' option is really a terrible
> option to include in 'du' and is a violation of the unix philosophy
> because it has explicitly NOTHING to do with disk management. But that's
> neither here nor there.
>
> A better way to get the byte count of a file is
>
> stat --format=%s

...I guess you mean that we should do something like this to get the
totals for a directory and contents:

$ find miller -print0 | xargs -0 stat --format=%s | awk '{sum+=$1}END{print sum}'
145159848954

OK, that does work, but how horrible is it that I can get exactly the same
answer like so:

$ du -sb miller
145159848954    miller

Of course it's worse if you want to do multiple directories at once.
That's a violation of unix philosophy?  It isn't true that it has nothing
to do with disk management.  For example, when moving files between
systems, it might help a lot to know the actual size.  What if I want to
make a .tar file from a directory?  How large will that file be?  How much
space will the files take up on tape?
If I'm using tar for tape backup, I think the size will be given by
--apparent-size, not by ondisk size.

Mike


On Fri, 4 Apr 2014, David Wagle wrote:

> "apparent size" is the "ls -l" size of the file.
>
> which is the "right" size for you to use is dependent on what you're trying
> to do.
>
> Apparent size is nearly useless for managing disks -- which is usually what
> you use du for.
>
> Say my disk has blocks that are 1KB. If I have a file with nothing but
> the letter 'A' in it, that will have an apparent size of 1 byte. But
> because the smallest block size on my disk is 1KB, that 1 byte file will
> USE 1 KB of disk space no matter what because the physical data has to be
> recorded in a block and that block will then be marked 'used.'
>
> As an aside, imho, the 'apparent size' option is really a terrible option
> to include in 'du' and is a violation of the unix philosophy because it has
> explicitly NOTHING to do with disk management. But that's neither here nor
> there.
>
>
> On Fri, Apr 4, 2014 at 2:29 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
>
>> On Tue, 1 Apr 2014, Mike Miller wrote:
>>
>> On Tue, 1 Apr 2014, Ben wrote:
>>>
>>> -h will always be different from the actual disk usage, you might also
>>>> want to play around with -B option too.
>>>>
>>>
>>> I've done that.  Using --si -sB GB gives the same result as --si -sh.  Did
>>> you think that they would be different?
>>>
>>
>> Thanks for the suggestions.  Now I have answers (below).
>>
>> I was misusing the --si option there.  It should be used *instead* of -h,
>> not in conjunction with it.  These two commands should do the same thing
>> when the volume in "dir" is in the multi-gigabyte range...
>>
>> du -s --si dir
>> du -sB GB dir
>>
>> ...and so should these two commands:
>>
>> du -sh dir
>> du -sB G dir
>>
>> The first pair will report 1000*1000*1000 bytes and the second will report
>> 1024*1024*1024 bytes.
>>
>>
>>
>> What happens when you use --apparent-size option.
>>>> --apparent-size
>>>>        print apparent sizes, rather than disk usage; although the
>>>>        apparent size is usually smaller, it may be larger due to holes
>>>>        in ('sparse') files, internal fragmentation, indirect blocks,
>>>>        and the like
>>>>
>>>
>>> I want to try that, but I'm having this problem right now:
>>>
>>> $ ls /project/guanwh
>>> ls: cannot access /project/guanwh: Stale file handle
>>>
>>
>> Yep, you nailed it.  That was the issue.  If I use --apparent-size, the
>> results are consistent.  According to supercomputing staff:
>>
>> "it is not a bug, -b is implies --apparent-size, so to compare its output
>> to -sm/sh you have to include --apparent-size with -sm/-sh as well.
>>
>> "when the apparent size is different from the reported size it is not a
>> bug in du but rather a feature of the filesystem :)"
>>
>> Now I just have to figure out which is the right size for me -- apparent
>> or reported.  I guess apparent sizes are the real file sizes.
>> In this example "dir" has about 10,000 files in it with about half being
>> 5 KB and half about 29 MB:
>>
>> $ du -s --si dir
>> 162G    dir
>>
>> $ du -s --si --apparent-size dir
>> 143G    dir
>>
>> $ du -sb dir
>> 142038799951    dir
>>
>> $ wc -c dir/* | tail -1
>> 142037349967 total
>>
>>
>> One thing to note:  It seems that du always rounds up.  So if 1.1 GB are
>> used, du will report 2 GB.
>>
>>
>> Mike
>> _______________________________________________
>> TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
>> tclug-list at mn-linux.org
>> http://mailman.mn-linux.org/mailman/listinfo/tclug-list
>>
>
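
The block-size reasoning in this thread can be checked with a little
arithmetic.  What follows is only a rough sketch, not something posted in
the thread: the function names (alloc_size, mean_overhead, tree_sizes) are
made up for illustration, the 4096-byte block size is an assumed value
rather than a measured one, and tree_sizes assumes GNU find.  The model
ignores compression, sparse files and filesystem metadata.

```shell
#!/bin/sh
# Bytes allocated for a file of SIZE bytes when space is handed out in whole
# BLOCK-byte blocks (simplified model: no compression, holes, or metadata).
# Usage: alloc_size SIZE BLOCK
alloc_size() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Mean overhead per entry, using integer division.
# Usage: mean_overhead ONDISK_BYTES APPARENT_BYTES ENTRY_COUNT
mean_overhead() {
    echo $(( ($1 - $2) / $3 ))
}

# Print "apparent_bytes allocated_bytes file_count" for regular files under DIR.
# Assumes GNU find: %s is the size in bytes, %b the allocation in 512-byte
# blocks.  Unlike "du", this skips the directories themselves.
# Usage: tree_sizes DIR
tree_sizes() {
    find "$1" -type f -printf '%s %b\n' |
        awk '{ a += $1; d += $2 * 512; n++ }
             END { printf "%.0f %.0f %.0f\n", a, d, n }'
}

# David's example: a 1-byte file on 1 KB blocks.
alloc_size 1 1024                               # prints 1024

# Rounded figures from the message: 166GB on disk, 146GB apparent, 9908 entries.
mean_overhead 166000000000 146000000000 9908    # prints 2018570

# Upper bound on rounding waste for 9908 files with the assumed 4096-byte blocks.
echo $(( 9908 * (4096 - 1) ))                   # prints 40573260
```

Under that assumed block size, rounding each file up to a whole block could
waste at most about 40 MB across 9908 files, so block rounding alone would
not account for a 20 GB gap.  Running "tree_sizes miller" on the real
directory would show per-file totals to compare against the du figures.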