Thanks, David. I thought that was the issue -- that apparent size would
not include overhead, so I was not able to understand why I was getting
apparent size that was larger than ondisk size. After they moved my data
to a different array, that difference reversed direction. This was
explained to me last night:
"on the old project spaces, zfs did some compression on the data so the
apparent-size was larger than the ondisk size."
So, compression is also an issue, and I wouldn't have thought of that.
Now that there is no compression, I see that ondisk usage is 20GB more
than apparent size:
$ \du -sB GB --apparent-size miller
146GB miller
$ \du -sB GB miller
166GB miller
$ find miller | wc -l
9908
So there are about 2 million bytes of overhead per file, which seems like
a lot, to me. I would think that implies disk blocks of multiple
megabytes, which seems unlikely. There must be more that I don't
understand.
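
For reference, the per-file figure can be reproduced with shell integer
arithmetic. This is only a rough sketch: the GB values are the rounded
numbers du printed above, the variable names are made up for illustration,
and the find count includes directories as well as regular files.

```shell
# Figures copied from the du and find output above (decimal GB, as printed by -B GB).
ondisk_gb=166
apparent_gb=146
files=9908

# Total overhead in bytes, then the integer average per entry found.
overhead=$(( (ondisk_gb - apparent_gb) * 1000 * 1000 * 1000 ))
per_file=$(( overhead / files ))
echo "$per_file bytes of overhead per file"
```

That comes out at roughly 2 MB per entry, matching the estimate above.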
Regarding your idea (David)...
> As an aside, imho, the 'apparent size' option is really a terrible
> option to include in 'du' and is a violation of the unix philosophy
> because it has explicitly NOTHING to do with disk management. But that's
> neither here nor there.
>
> A better way to get the byte count of a file is
>
> stat --format=%s
...I guess you mean that we should do something like this to get the
totals for a directory and contents:
$ find miller -print0 | xargs -0 stat --format=%s | awk '{sum+=$1}END{print sum}'
145159848954
OK, that does work, but how horrible is it that I can get exactly the same
answer like so:
$ du -sb miller
145159848954 miller
Of course it's worse if you want to do multiple directories at once.
That's a violation of unix philosophy? It isn't true that it has nothing
to do with disk management. For example, when moving files between
systems, it might help a lot to know the actual size. What if I want to
make a .tar file from a directory? How large will that file be? How much
space will the files take up on tape? If I'm using tar for tape backup, I
think the size will be given by --apparent-size, not by ondisk size.
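
One way to check the tar question on a small scale, as a sketch only: it
uses a scratch directory from mktemp and a made-up one-byte file, and it
assumes GNU stat (same --format=%s as above). tar writes a 512-byte header
per entry and pads data to 512-byte blocks, so for tiny files the archive
is larger than the apparent size; for large files the two should be close.

```shell
# Hypothetical scratch directory; nothing here touches real data.
demo=$(mktemp -d)
printf 'A' > "$demo/one-byte.txt"

# Apparent size of the file: exactly the bytes written.
data_bytes=$(stat --format=%s "$demo/one-byte.txt")

# Size of a tar stream holding the directory. Headers and block padding
# make this larger than the apparent size for very small files.
tar_bytes=$(tar -cf - -C "$demo" . | wc -c)

echo "apparent: $data_bytes bytes, tar stream: $tar_bytes bytes"
rm -rf "$demo"
```

Running the same two measurements on a real directory (du -sb versus
tar -cf - dir | wc -c) would show how close apparent size gets in practice.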
Mike
On Fri, 4 Apr 2014, David Wagle wrote:
> "apparent size" is the "ls -l" size of the file.
>
> which is the "right" size for you to use is dependent on what you're trying
> to do.
>
> Apparent size is nearly useless for managing disks -- which is usually what
> you use du for.
>
> Say my disk has blocks that are 1KB. If I have a file with nothing but
> the letter 'A' in it, that will have an apparent size of 1 byte. But
> because the smallest block size on my disk is 1KB, that 1 byte file will
> USE 1 KB of disk space no matter what because the physical data has to be
> recorded in a block and that block will then be marked 'used.'
>
> As an aside, imho, the 'apparent size' option is really a terrible option
> to include in 'du' and is a violation of the unix philosophy because it has
> explicitly NOTHING to do with disk management. But that's neither here nor
> there.
>
>
> On Fri, Apr 4, 2014 at 2:29 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
>
>> On Tue, 1 Apr 2014, Mike Miller wrote:
>>
>> On Tue, 1 Apr 2014, Ben wrote:
>>>
>>> -h will always be different from the actual disk usage, you might also
>>>> want to play around with -B option too.
>>>>
>>>
>>> I've done that. Using --si -sB GB gives the same result as --si -sh. Did
>>> you think that they would be different?
>>>
>>
>> Thanks for the suggestions. Now I have answers (below).
>>
>> I was misusing the --si option there. It should be used *instead* of -h,
>> not in conjunction with it. These two commands should do the same thing
>> when the volume in "dir" is in the multi-gigabyte range...
>>
>> du -s --si dir
>> du -sB GB dir
>>
>> ...and so should these two commands:
>>
>> du -sh dir
>> du -sB G dir
>>
>> The first pair will report in units of 1000*1000*1000 bytes and the second
>> will report in units of 1024*1024*1024 bytes.
>>
>>
>>
>> What happens when you use the --apparent-size option?
>>>> --apparent-size
>>>> print apparent sizes, rather than disk usage; although the
>>>> apparent size is usually smaller, it may be larger due to holes
>>>> in ('sparse') files, internal fragmentation, indirect blocks,
>>>> and the like
>>>>
>>>
>>> I want to try that, but I'm having this problem right now:
>>>
>>> $ ls /project/guanwh
>>> ls: cannot access /project/guanwh: Stale file handle
>>>
>>
>> Yep, you nailed it. That was the issue. If I use --apparent-size, the
>> results are consistent. According to supercomputing staff:
>>
>> "it is not a bug, -b implies --apparent-size, so to compare its output
>> to -sm/sh you have to include --apparent-size with -sm/-sh as well.
>>
>> "when the apparent size is different from the reported size it is not a
>> bug in du but rather a feature of the filesystem :)"
>>
>> Now I just have to figure out which is the right size for me -- apparent
>> or reported. I guess apparent sizes are the real file sizes. In this
>> example "dir" has about 10,000 files in it, with about half being about
>> 5 KB and the other half about 29 MB:
>>
>> $ du -s --si dir
>> 162G dir
>>
>> $ du -s --si --apparent-size dir
>> 143G dir
>>
>> $ du -sb dir
>> 142038799951 dir
>>
>> $ wc -c dir/* | tail -1
>> 142037349967 total
>>
>>
>> One thing to note: It seems that du always rounds up. So if 1.1 GB are
>> used, du will report 2 GB.
>>
>>
>> Mike
>> _______________________________________________
>> TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
>> tclug-list at mn-linux.org
>> http://mailman.mn-linux.org/mailman/listinfo/tclug-list
>>
>