On Thu, Oct 14, 2010 at 04:35:10PM -0500, Mike Miller wrote:
> The csplit coreutil program lets me split a file into sections based on
> some delimiter. What I really want to do is split a file into sections
> based on a delimiter but forcing those sections to be at least b bytes in
> size, even if that means including multiple delimiters in most or all
> sections.
>
> An example would be that I have an mbox file (email messages) of 300 MB
> and containing 50,000 messages and I want to break it into 10 sections of
> at least 30 MB each (the tenth section would have to be a little smaller
> because there wouldn't be enough file left).
>
> I can do stuff like this to divide the file "mbox" into individual email
> messages, one per file...
>
> csplit -ksz mbox '/^From /' {*}

I don't have an answer to your general question, but in this particular
instance csplit would not necessarily do what you want, as there might be
a paragraph starting with 'From' at the beginning of the line (which vim
e-mail syntax highlighting merrily bolds and colors) that would result in
a message split in two. Use 'formail' for this kind of processing.

> ...but I can't figure out how to make the files bigger so that they
> include multiple delimiters.
>
> It seems like there ought to be a way to do this.

You could be 'catting' together bunches of smaller files 8^)

Cheers,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163
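
For what it's worth, the "at least b bytes per section" idea can be sketched in a few lines of Python. This is only a rough sketch, not something csplit or formail does for you: the function names (chunks_of_min_size, split_file) and the "xx" output prefix are made up for illustration, and the rule that a 'From ' line only counts as a delimiter when it follows a blank line is an assumption meant to sidestep the body-text problem mentioned above. It will still mis-split a mailbox where a body paragraph starts with 'From ' right after a blank line and was never quoted as '>From '.

```python
from typing import Iterable, Iterator, List


def chunks_of_min_size(lines: Iterable[bytes], min_bytes: int,
                       delimiter: bytes = b"From ") -> Iterator[bytes]:
    """Yield chunks that break only at delimiter lines and hold >= min_bytes.

    The last chunk may be shorter, since it simply holds whatever is left.
    """
    current: List[bytes] = []
    size = 0
    prev_blank = True  # assumption: treat the start of input like a blank line
    for line in lines:
        # Assumption: a delimiter only counts when it follows a blank line.
        is_boundary = prev_blank and line.startswith(delimiter)
        if is_boundary and current and size >= min_bytes:
            yield b"".join(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
        prev_blank = line in (b"\n", b"\r\n")
    if current:
        yield b"".join(current)


def split_file(path: str, min_bytes: int, prefix: str = "xx") -> List[str]:
    """Write the chunks of `path` to prefix00, prefix01, ... and return the names."""
    names = []
    with open(path, "rb") as src:  # iterating a binary file yields lines
        for i, chunk in enumerate(chunks_of_min_size(src, min_bytes)):
            name = f"{prefix}{i:02d}"
            with open(name, "wb") as dst:
                dst.write(chunk)
            names.append(name)
    return names
```

For the 300 MB example, something like split_file("mbox", 30 * 1024 * 1024) should give roughly ten files, each holding whole messages and with only the last one allowed to fall short of 30 MB. Each section is held in memory before it is written, which is fine at this size but worth knowing.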