For searching: Split E-Mail mbox with formail limiting output mbox size.

This looked intriguing and I didn't quickly find the answer on the web.

I created maxmail.sh containing this script:

#! /bin/sh -f
prefix=splitmbox
# 10MB
maxsize=10000000

# check that the count file exists. Make one if it doesn't.
if [ ! -f count ] ; then
    echo 1 > count
fi

# set a variable to the contents of the count file
count=$(cat count)

# create a splitmbox file if it doesn't exist
if [ ! -f "$prefix.$count" ] ; then
    touch "$prefix.$count"
fi

# check the size of that box (stat -c %s is the GNU coreutils syntax)
size=$(stat -c %s "$prefix.$count")

# if it's greater than your max, then increment count
if [ "$size" -gt "$maxsize" ] ; then
    count=$(expr "$count" + 1)
    echo "$count" > count
    echo "Splitting to $prefix.$count"
fi

# append whatever came into this script to the splitmbox file
cat >> "$prefix.$count"
#-----------------End Script

Then I ran

formail -s ./maxmail.sh < mbox

formail -s runs the script once per message, so a message is never cut in two. The size is checked before each append, so every file ends up a little over the max. If you have some really large individual mails, they will stay together and may make your split mbox bigger than your max.

My result:

Splitting to splitmbox.2
Splitting to splitmbox.3
Splitting to splitmbox.4
Splitting to splitmbox.5

gsker at veeta:~/mail> ls -l splitmbox.*
-rw-rw-r-- 1 gsker gsker 10006881 2010-10-14 19:44 splitmbox.1
-rw-rw-r-- 1 gsker gsker 11950245 2010-10-14 19:45 splitmbox.2
-rw-rw-r-- 1 gsker gsker 12995777 2010-10-14 19:45 splitmbox.3
-rw-rw-r-- 1 gsker gsker 10063591 2010-10-14 19:45 splitmbox.4
-rw-rw-r-- 1 gsker gsker  4328906 2010-10-14 19:45 splitmbox.5

gsker at veeta:~/mail> wc -l splitmbox.*
  165330 splitmbox.1
  210013 splitmbox.2
  200543 splitmbox.3
  171013 splitmbox.4
   90904 splitmbox.5
  837803 total

gsker at veeta:~/mail> wc -l mbox
837803 mbox

Cool!

--
Gerry Skerbitz
gsker at skerbitz.org

-------------- next part --------------

On Thu, Oct 14, 2010 at 04:35:10PM -0500, Mike Miller wrote:
> The csplit coreutil program lets me split a file into sections based on
> some delimiter.
> What I really want to do is split a file into sections
> based on a delimiter but forcing those sections to be at least b bytes in
> size, even if that means including multiple delimiters in most or all
> sections.
>
> An example would be that I have an mbox file (email messages) of 300 MB
> and containing 50,000 messages and I want to break it into 10 sections of
> at least 30 MB each (the tenth section would have to be a little smaller
> because there wouldn't be enough file left).
>
> I can do stuff like this to divide the file "mbox" into individual email
> messages, one per file...
>
> csplit -ksz mbox '/^From /' {*}

I don't have an answer to your general question, but in this particular
instance csplit would not necessarily do what you want, as there might be
a paragraph starting with 'From' at the beginning of the line (which vim
e-mail syntax highlighting merrily bolds and colors) that would result in
a message split in two. Use 'formail' for this kind of processing.

> ...but I can't figure out how to make the files bigger so that they
> include multiple delimiters.
>
> It seems like there ought to be a way to do this.

You could be 'catting' together bunches of smaller files 8^)

Cheers,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163

_______________________________________________
TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
tclug-list at mn-linux.org
http://mailman.mn-linux.org/mailman/listinfo/tclug-list
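
A possible sketch for the original question (sections of at least b bytes, split only at message boundaries), for cases where formail is not at hand. It uses awk and treats a line starting with 'From ' as a message boundary only when it opens the file or follows a blank line, which avoids some but not all false splits on body text. The function name split_mbox, its three arguments, and the prefix.N output naming are illustrative assumptions, not anything taken from the posts above.

```shell
#!/bin/sh
# Sketch only: split an mbox into chunks of at least a given size,
# breaking only at lines that look like message separators.
# split_mbox and its arguments are hypothetical names for illustration.
#   $1 = input mbox, $2 = minimum chunk size in bytes, $3 = output prefix
split_mbox() {
    # LC_ALL=C so that length() counts bytes rather than characters
    LC_ALL=C awk -v min="$2" -v prefix="$3" '
        BEGIN { n = 1; size = 0; prev_blank = 1; out = prefix "." n }

        # A "From " line counts as a separator only at the start of the
        # file or after a blank line. Start a new chunk there once the
        # current chunk has reached the minimum size.
        /^From / && prev_blank && size > 0 && size >= min {
            close(out)
            n++
            out = prefix "." n
            size = 0
        }

        {
            print > out
            size += length($0) + 1      # +1 for the newline
            prev_blank = ($0 == "")
        }
    ' "$1"
}
```

For the example in the question, something like "split_mbox mbox 30000000 section" would aim for section.1, section.2, ... of at least 30 MB each, with the last one holding whatever is left. A body line that starts with 'From ' right after a blank line would still be taken for a separator, so formail remains the safer tool when it is available.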