24

I'm composing a plain text (.txt) document that has left-and-right justified paragraphs. I manually add spaces, breaking words with hyphens if necessary, until the lines are uniform length. To use an example from my document, a paragraph like this:

These phantasmic balls have some strange properties, some neat, some
interesting, and others just indicative of lazy design. They're almost all
caused by the same design flaw: The game treats them too much like regular
balls.

will end up looking like this:

These phantasmic balls have some  strange properties,
some neat,  some interesting, and others just indica-
tive of lazy design. They're almost all caused by the
same design flaw:  The game treats them too much like
regular balls.

How can I do this?

It's okay to only add spaces (without breaking words across lines), but I need to control the target width of the paragraph (in number of characters). The example above is justified to 53 characters per line, but the line width in my actual document varies from 60 to 79.

It'd also be great if there were a text editor with such a feature.

I already tried:

  • OpenOffice and TeX: They just use formatting to change how the text is displayed.
  • nano's Justify command doesn't count because it doesn't actually make the lines uniform width.

I accepted the following answer because it was the one that worked best for me. I found the emacs fill commands the easiest to use among the answers, and I appreciate how it recognizes even non-whitespace indentation (for stuff like # and // comments). However, the other answers all fill their own niches and I recommend others visiting this question to look at them as well.

slhck
  • 235,242
Exp HP
  • 404

8 Answers

23

You need par (official website, source code, package status on Repology).

For example, to install it on Ubuntu:

sudo aptitude install par

To justify text.txt (at a width of 80 characters), saving the result in newtext.txt:

par j1w80 < text.txt > newtext.txt

To use par in Vim or gVim:

:set formatprg=par\ j1w80

Then highlight the text you want to format and use the gq command.

For more information, see man par.

Devon
  • 160
sml
  • 2,020
8

fmt is the old-school solution

From the man page:

fmt [-cmnps] [-d chars] [-l num] [-t num] [goal [maximum] | -width | -w width] [file ...]

and

The goal length defaults to 65 and the maximum to 10 more than the goal length. Alternatively, a single width parameter can be specified either by prepending a hyphen to it or by using -w. For example, ``fmt -w 72'', ``fmt -72'', and ``fmt 72 72'' all produce identical output.
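To make the effect of a width parameter concrete, here is a small Python sketch that fills the question's paragraph to at most 53 columns with the standard textwrap module. It is only an illustration of filling, not a reimplementation of fmt. As far as I know, fmt likewise fills lines up to the width but does not pad them, so the right edge stays ragged.

```python
import textwrap

text = (
    "These phantasmic balls have some strange properties, some neat, some "
    "interesting, and others just indicative of lazy design. They're almost all "
    "caused by the same design flaw: The game treats them too much like regular "
    "balls."
)

# Fill to at most 53 columns; no line is padded, so lengths differ.
lines = textwrap.wrap(text, width=53)
for line in lines:
    print(f"{line:<53}| {len(line)} chars")
```

Every line fits within the width, but the lengths are not uniform, which is why filling alone does not answer the question.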

5

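You can do this in Emacs using fill-paragraph or fill-region (fill docs).

You need to pass a numeric prefix argument, which tells the fill commands to justify the text as well as fill it. The default fill column is 70; you can change it with set-fill-column (C-x f).

Select the text to fill and do M-3 M-x fill-region.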

I set the fill column to 53 for your sample text and got:

These phantasmic balls  have some strange properties,
some   neat,  some   interesting,  and   others  just
indicative of lazy  design. They're almost all caused
by  the same design  flaw: The  game treats  them too
much like regular balls.
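
For readers who want to see the idea outside an editor, here is a minimal Python sketch of fill-then-justify. It is not Emacs's algorithm (Emacs places the extra spaces differently, as the output above shows). The function name justify and the left-to-right distribution rule are my own assumptions for illustration.

```python
import textwrap

TEXT = (
    "These phantasmic balls have some strange properties, some neat, some "
    "interesting, and others just indicative of lazy design. They're almost all "
    "caused by the same design flaw: The game treats them too much like regular "
    "balls."
)

def justify(text, width):
    """Fill text to width, then pad every line except the last to exactly width."""
    lines = textwrap.wrap(text, width=width)
    out = []
    for i, line in enumerate(lines):
        words = line.split()
        gaps = len(words) - 1
        if i == len(lines) - 1 or gaps == 0:
            out.append(line)  # last line (or a lone word) stays ragged
            continue
        extra = width - len(line)          # spaces still to hand out
        base, rest = divmod(extra, gaps)   # even share, plus a remainder
        parts = []
        for g, word in enumerate(words[:-1]):
            # each gap gets 1 + base spaces; the first `rest` gaps get one more
            parts.append(word + " " * (1 + base + (1 if g < rest else 0)))
        out.append("".join(parts) + words[-1])
    return out

for line in justify(TEXT, 53):
    print(line)
```

Putting the remainder into the leftmost gaps is the simplest choice; real tools usually spread it around to avoid visible "rivers" of whitespace.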
Doug Harris
  • 28,397
3

Here's a Perl script called paradj that can do full justification and hyphenation.

Here's a diff showing some changes I made to support adding a left margin:

12c12
< my ($indent, $newline);
---
> my ($indent, $margin, $newline);
15a16
>   "margin:i" => \$margin,
21a23
> $margin = 0 if (!$margin);
149a152
>     print " " x $margin;
187a191,193
>   print "--margin=n (or -m=n or -m n)  Add a left margin of n ";
>   print "spaces\n";
>   print "                                (defaults to 0)\n";

This command:

./paradj.pl -h --width=53 --both --indent=0 inputfile

produces output very close to your example (the extra spaces are in different places).

These  phantasmic balls have some strange properties,
some  neat, some interesting, and others just indica-
tive of lazy design. They're almost all caused by the
same  design flaw: The game treats them too much like
regular balls.
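
The reason the extra spaces land where they do is that the script widens one gap at a time, alternating between the left and right ends of the line. Below is a rough Python transcription of that padding loop, based on my reading of the Perl source; the function name pad_alternating is mine, and the sketch assumes the input line has no leading or trailing blanks.

```python
import re

def pad_alternating(line, width):
    """Pad line to width by widening gaps alternately from the left and right."""
    if len(line.split()) < 2:
        return line               # nothing to widen
    buf = line
    reps = 1                      # size of the gaps currently being widened
    flips = 0                     # how many times buf has been reversed
    while len(buf) < width:
        pattern = r"(\S+ {%d})(\S+)" % reps
        new, n = re.subn(pattern, r"\1 \2", buf, count=1)
        if n:
            buf = new[::-1]       # reverse so the next space goes at the other end
            flips += 1
        else:
            reps += 1             # every gap already has more than reps spaces
    return buf[::-1] if flips % 2 else buf

print(pad_alternating("These phantasmic balls have some strange properties,", 53))
```

With a single missing column, the only extra space goes into the first gap, which matches the first line of the output above.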
1

You can try the Perl script reformat. It does not justify text with multi-byte characters correctly and does not use hyphenation, though.

You can install it on Debian/Ubuntu, by running sudo apt-get install reformat.

Use it like this:

reformat -w 53 -j < mytext.txt

The justified text will be written to standard output:

These phantasmic balls have  some strange properties,
some  neat,  some   interesting,  and  others    just
indicative of lazy design.  They're almost all caused
by the same  design  flaw: The game  treats  them too
much like regular balls.
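
I have not checked how reformat measures lines, so take this as an assumption, but a common cause of the multi-byte problem is counting bytes instead of characters. This short Python sketch shows how the two measures diverge for UTF-8 text and what that does to padding.

```python
line = "Caf\u00e9 na\u00efve r\u00e9sum\u00e9"  # "Café naïve résumé"
width = 24

chars = len(line)                    # number of characters
octets = len(line.encode("utf-8"))   # number of UTF-8 bytes

# Pad to the target width using each measure in turn.
by_chars = line + " " * (width - chars)
by_bytes = line + " " * (width - octets)

print(chars, octets)
print(repr(by_chars), len(by_chars))
print(repr(by_bytes), len(by_bytes))
```

The byte-based version comes out shorter than the target width on screen, one column for each extra byte in the accented letters.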
Devon
  • 160
Flux
  • 301
1

Vim has a bundled plugin called justify. Check :h usr_25 / :h 25.2.

Basically:

  • :packadd justify (or add it to your .vimrc, as described in the help, to load it by default).
  • Then justify the text in one of these ways:
    • select the text and press _j
    • :'<,'>Justify for the selection
    • :Justify for the whole file

It also adds a ,gq mapping, which can be useful to run before justifying.

If one wants to run it as a silent process on a file, one can add a simple wrapper script, for example something like:

#! /bin/bash -

for opt in "$@"; do
    case $opt in
    -h|--help)
        printf 'Usage: %s FILE [width [write]]\n' "$(basename "$0")" >&2
        exit
        ;;
    esac
done

file=$1
write=

# Change to desired default width
declare -i textwidth=78
[ $# -gt 1 ] && textwidth=$2
[ $# -gt 2 ] && write=$3

end='q!'

case $write in
    w|write|-w|--write) end='wq' ;;
esac

if ! [ -t 0 ] || [ "$file" = '-' ]; then
    file=/dev/stdin
fi

# Enforce q! on stdin data
[ "$file" = '/dev/stdin' ] && end='q!'

vim --clean -Nnu NONE -T dumb -es \
    -c"set et tw=$textwidth cpo=Be ft=text|pa justify|norm,gq|,gq|gggqG|_j" \
    -c"%print|$end" \
    -- "$file"

(I have added some extra ,gq and gggqG as there have been some issues with text containing multiple consecutive spaces and with paragraphs whose last line is a single word.)


However, it does not break words. One can do that manually and reformat, so it's a middle ground.

These phantasmic balls have some strange  properties,
some neat, some interesting, and others just  indica-
tive of lazy design. They're almost all caused by the
same design flaw: The game treats them too much  like
regular balls.
0

To add some info about the paradj program by Julius C. Duque, mentioned above by Dennis Williamson: I have slightly modified this tool to handle bulleted / numbered paragraphs. I enclose the code below.

I tried to contact the original author (J.C. Duque) via CPAN so that he could keep track of these changes, but to no avail: his email address is no longer valid.

NB: tested on Perl 5.32 (64-bit) on Windows 10.

#!/usr/local/bin/perl

# *******************
# * Julius C. Duque *
# *******************

=for comment

This version of this tool is dated 2023-04-26 10:07:51.

It is the modified version of the original Julius-C.-Duque's paradj.pl tool I found on internet with the modification code by Denis Williamson, from superuser.com site.

See https://www.cpan.org/authors/id/J/JC/JCDUQUE/paradj-1.3.0.pl for the original 1.30 (2003 November 16) version of this file. MANY MANY thanks to him for this tool.

I added myself some changes I was in need to handle paragraphs with numbered or bulleted first lines such as: 1. Title: bla bla ... bal bla bla...

2. Title: fuh fuh ...
fuh fuh fuh...

to produce: 1. title: bla ... blah blah bla bla ... ... ... blah

Some lines added to implement left margin option according to indications provided by Denis Williamson at: https://superuser.com/questions/171616/how-can-i-justify-plain-text-in-linux

Each such line is flagged w/ comment "# Denis Williamson". Many thanks to him. Gilles Maisonneuve. 2023-04-25 11:59:11.

Added option '--firstlineMargin' to allow any paragraph first line that starts with a bullet / a paragraph number... to have a reduced margin compared to the margin of the remaining part of the subsequent paragraph. If not set, takes the same value as the $margin. See the POD "for comment" block at end of file to read an example of '--firstlineMargin' usage.

Also added some option-handling "enhancement" such as --help and exit on invalid option. Side effect: '--hyphenate' can no longer be abbreviated into '-h' but must be typed in at least as '-hy'.

Gilles Maisonneuve, 2023-04-25 13:38:31

=cut

use diagnostics;
use strict;
use warnings;
use Getopt::Long;
use TeX::Hyphen;

my ($width, $hyphenate, $left, $centered, $right, $both);
my ($indent, $margin, $newline);    # Denis Williamson
my $firstlineMargin;                # Gilles Maisonneuve
my $help;                           # Gilles Maisonneuve

# Corrected '"help" => \$hyphenate' into '"hyphenate" => \$hyphenate', from
# original code.
# Gilles Maisonneuve, 2023-04-25 13:27:13
GetOptions("width=i" => \$width, "hyphenate" => \$hyphenate,
  "left" => \$left, "centered" => \$centered,
  "margin:i" => \$margin,           # Denis Williamson
  "first-line-margin=i" => \$firstlineMargin,
  "right" => \$right, "both" => \$both, "indent:i" => \$indent,
  "newline" => \$newline, "help"=>\$help)
  # Can't use die() due to 'use diagnostics' set by Julius Duque; GM, 23-04-25
  or exit( print(STDERR "See '--help' to get help\n") && 9 );

my $hyp = new TeX::Hyphen;

# Modified to handle 'help' option, GM:
syntax() if !$width or defined $help;
$indent = 0 if (!$indent);
$margin = 0 if !$margin;            # Denis Williamson
# Gilles Maisonneuve
$firstlineMargin = $margin if not defined $firstlineMargin;

local $/ = "";

while (<>) {
  my @linein = split;
  printpar(@linein);
  print "\n" if ($newline);
}

sub printpar {
  my (@par) = @_;
  my $firstline = 0;

  while (@par) {
    $firstline++;
    my ($buffer, $word);
    my ($charcount, $wordlen) = (0, 0);
    # my $linewidth = $width;
    # Seems to have been a little oversight in Denis Williamson algorithm: margin
    # is not counted in line size when folding. Trying to correct it.
    # Gilles Maisonneuve. 2023-04-26 08:39:58
    my $linewidth = $width - $margin;

if ($firstline == 1) {
  $linewidth -= $indent;
  # For 1st lines of each paragraph let's try to handle specific line
  # margin, thus do not handle its margin with the global margin it it has
  # its own value.
  # Gilles Maisonneuve, 2023-04-26 08:42:39
  if ($firstlineMargin != $margin) {  # restore original paragraph width
    $linewidth += $margin;            # for 1st line, if not same margins,
    $linewidth -= $firstlineMargin;   # but remove specific 1st line margin.
  };                                  # GM.
  print " " x $indent;
}

while (($charcount < $linewidth) and (@par)) {
  $word = shift @par;
  $buffer .= $word;
  $wordlen = length($word);
  $charcount += $wordlen;
  $buffer .= " ";
  $charcount++;
}

chop $buffer;
$charcount--;

if ($charcount == $wordlen) {
  $linewidth = $wordlen;
  my ($pos, $pre_word_len) = (0, 0);
  if ($hyphenate) {
    if ($word =~ /^([^a-zA-Z]*)([a-zA-Z-']+)([^a-zA-Z]*)$/) {
      my $pre_word = $1;
      $pre_word_len = length($pre_word);
      my $stripped_word = $2;
      $pos = hyphenate_word($stripped_word, $width);
      $pos = 0 if ($wordlen <= $width);
    }

    if ($pos) {
      $charcount = $pre_word_len + $pos;
      my $post_word = substr $word, $charcount;
      unshift(@par, $post_word);
      $buffer = substr $word, 0, $charcount;
      $buffer .= "-";
      $charcount++;
    }
  }
}

my $lineout = $buffer;

if ($charcount > $linewidth) {
  my ($pos, $pre_word_len) = (0, 0);
  if ($hyphenate) {
    if ($word =~ /^([^a-zA-Z]*)([a-zA-Z-']+)([^a-zA-Z]*)$/) {
      my $pre_word = $1;
      $pre_word_len = length($pre_word);
      my $stripped_word = $2;
      my $unfilled = $linewidth - $charcount + $wordlen
        - $pre_word_len + 1;

      $pos = hyphenate_word($stripped_word, $unfilled);
    }
  }

  $charcount -= $wordlen;

  if ($pos == 0) {
    $charcount--;
    unshift(@par, $word);
  } else {
    my $post_word = substr $word, ($pre_word_len + $pos);
    unshift(@par, $post_word);
    $charcount = $charcount + $pre_word_len + $pos;
  }

  $lineout = substr $buffer, 0, $charcount;

  if ($pos) {
    $lineout .= "-";
    $charcount++;
  }
}

my $spaces_to_fill = $linewidth - $charcount;

if ($centered) {
  my $leftfill = int($spaces_to_fill/2);
  print " " x $leftfill;
} elsif ($right) {
  print " " x $spaces_to_fill;
} elsif ($both) {                   # justified left and right column
  my $tempbuf = $lineout;
  my $replacements_made = 0;

  if (@par) {
    my $reps = 1;

    while (length($tempbuf) < $linewidth) {
      last if ($tempbuf !~ /\s/);
      if ($tempbuf =~ /(\S+ {$reps})(\S+)/) {
        $tempbuf =~ s/(\S+ {$reps})(\S+)/$1 $2/;
        $replacements_made++;
        $tempbuf = reverse $tempbuf;
      } else {
        $reps++;
      }
    }
  }; # if (@par)

  if ($replacements_made % 2 == 0) {
    $lineout = $tempbuf;
  } else {
    $lineout = reverse $tempbuf;
  }
}; # } elsif ($both) # justified left and right column
# print " " x $margin;          # Denis Williamson
print " " x $margin if $firstline != 1;   # Gilles Maisonneuve
print " " x $firstlineMargin if $firstline == 1;
if ($firstline == 1) {
  $lineout =~ s/^([^ \t]*)([ \t]*)(.*)/$1 $3/;
  my $firstblanksCount = length($2)-1;
  while ($firstblanksCount) {
    $lineout =~ m{^([^ \t]*)(.*)};
    my $head = $1; my $tail = $2;
    # Add ONE space after (in sequence) each one of 'full stop', 'comma',
    # 'semicolon', and 'colon'.
    $tail =~ s/(.*?[\.\,\;\:] )([^ \t]+.*)/$1 $2/
      # or, after all the above punctuation signs have been "exhausted",
      # start add space for each single space in the line.
      or $tail =~ s/^([^ ]+ .*?)( )([^ ]*.*)/$1$2 $3/
      or $tail =~ s/([^ ] )([^ ])/$1 $2/
      # or, if everything exhausted or failed, just give up and leave the
      # line with poor formatting... (I'm not good enough in algorithm to
      # know how to solve this kind of cases).
      # Gilles Maisonneuve, 2023-04-26 09:45:34
      or last;
    $lineout = $head . $tail;
    $firstblanksCount--;
  }; # while ($firstblanksCount)
}; # if ($firstline == 1)
print "$lineout\n";

  }; # while (@par)
}; # sub printpar

sub hyphenate_word {
  my ($tword, $unfilled) = @_;
  my @hyphen_places = $hyp->hyphenate($tword);

  if (@hyphen_places) {
    @hyphen_places = reverse @hyphen_places;

    foreach my $places (@hyphen_places) {
      return $places if ($places < $unfilled - 1);
    }
  }

  return 0;
}

sub syntax {
  # Gilles Maisonneuve, 2023-04-25 12:12:03.
  my $pgmname=$0;
  # remove Windows file path to display short name in the help synopsis.
  # Win file paths are either DISK-LETTER:\...\ (computer local) or \\...\ (UNC)
  # for remote (network disk) files.
  $pgmname =~ s/^[A-Z]:\\.*\\|^\\\\.*\\//i if $^O eq 'MSWin32';
  # I assume all other OSes are Unix compliant ! (GM)
  $pgmname =~ s{/.*/}{} if $^O ne 'MSWin32';

  print "Usage:\n";
  print " $pgmname --width=n [options] file1 [file2 file3 ...]\n";
  print " cat file1 [file2 file3 ...] | $pgmname --width=n [options]\n\n";
  print "Credits:\n";
  print " Original author: Julius C. Duque.\n";
  print " Margin option: Denis Williamson.\n";
  print " First line margin option: Gilles Maisonneuve.\n";
  print "\n";
  print "Options:\n";
  print "--width=n (or -w=n or -w n) Line width is n chars long.\n";
  print "--left (or -l) Left-justified (default).\n";
  print "--right (or -r) Right-justified.\n";
  print "--centered (or -c) Centered.\n";
  print "--both (or -b) Both left- and right-justified.\n";
  print "--indent=n (or -i=n or -i n) Leave n spaces for initial indention",
        "\n", ' ' x 32, "(defaults to 0).\n";
  # Denis Williamson:
  print "--margin=n (or -m=n or -m n) Add a left margin of n spaces.\n";
  print "--first-line-margin n (-f n) Set each paragraph first line margin.",
        " Allows\n", ' ' x 32, "to set a smaller margin for bulleted lines.\n";
  print "--newline (or -n) Output an empty line between ", "paragraphs.\n";
  print "--hyphenate (or -hy) Hyphenate word that doesn't fit on a ",
        "line.\n";
  print "--help (or -he) This help text.\n";
  exit 0;
}

=head1 NAME

paradj - a small Perl script that reformats lines of ASCII text so that the resulting lines are justified in any of the following formats: left-justified (default), right-justified, centered, or both left- and right-justified.

=head1 README

Paragraph Adjuster with Hyphenation (PAwH) is a small Perl script that reformats lines of ASCII text so that the resulting lines are justified in any of the following formats: left-justified (default), right-justified, centered, or both left- and right-justified. PAwH has various switches, most are optional, to control its output. The only mandatory switch is the line width (--width). For PAwH to work properly, input paragraphs must be separated by blank lines.

PAwH is also capable of hyphenating a word that cannot be accommodated on a line.

=head1 DESCRIPTION

Paragraph Adjuster with Hyphenation (PAwH) is a small Perl script that reformats lines of ASCII text so that the resulting lines are justified in any of the following formats: left-justified (default), right-justified, centered, or both left- and right-justified. PAwH has various switches, most are optional, to control its output. The only mandatory switch is the line width (--width). For PAwH to work properly, input paragraphs must be separated by blank lines.

PAwH is also capable of hyphenating a word that cannot be accommodated on a line.

=head1 USAGE

You can use PAwH in any of two ways:

./paradj.pl --width=n [options] file1 [file2 file3 ...]

or

cat file1 [file2 file3 ...] | ./paradj.pl --width=n [options]

where file1, file2, file3, and so on, are the files to be reformatted. There's only one output, though.

=head1 SWITCHES

The available switches are:

--width=n (or -w=n or -w n) Line width is n chars long

--left (or -l) Output is left-justified (default)

--right (or -r) Output is right-justified

--centered (or -c) Output is centered

--both (or -b) Output is both left- and right-justified

--indent=n (or -i=n or -i n) Leave n spaces for initial indention (defaults to 0)

--newline (or -n) Insert blank lines between paragraphs

--hyphenate (or -hy) Hyphenate word that doesn't fit on a line

=head1 EXAMPLES

The following command reformats the file, LICENSE, so that the line width is at most 70 characters, both left- and right-justified, with blank lines inserted between consecutive paragraphs, and words that can't fit at the end of lines are hyphenated.

paradj.pl --width=70 --both --newline --hyphenate LICENSE

You can also use the shortened version:

paradj.pl -w=70 -b -n -hy LICENSE

If you want to indent each paragraph, just use the --indent switch. Say, you want to indent the LICENSE file with 4 leading spaces, type:

paradj.pl --width=70 --both --newline --hyphenate --indent=4 LICENSE

or

paradj.pl -w=70 -b -n -hy -i=4 LICENSE

=head1 GUI VERSION

There is also a Perl/Tk version of paradj.pl, called paradj-tk.pl.

=head1 PREREQUISITE

You need Jan Pazdziora's Perl module, TeX::Hyphen, available from the Comprehensive Perl Archive Network (CPAN), to use the hyphenation feature. The latest is version 0.140.

For Windows users, you can install TeX::Hyphen by following these steps:

  1. Uncompress the TeX::Hyphen module, TeX-Hyphen-0.140.tar.gz.

  2. Descend (cd) into the TeX-Hyphen-0.140/lib and copy the TeX directory into <Perl directory>\lib. For example, if your Perl binaries are installed on E:\Perl, copy the TeX directory into E:\Perl\lib.

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2003 Julius C. Duque <{jcduque}{at}{lycos}{dot}{com}>

This library is free software; you can redistribute it and/or modify it under the same terms as the GNU General Public License.

=pod SCRIPT CATEGORIES

CPAN/Administrative Fun/Educational

=cut

=for comment

An example of usage with a simple text file: let's assume that file.txt contains the following text (here placed between >>> and <<< markers):

>>>
This paragraph has no bullet nor indexed paragraph title at its left margin, therefore it should remain unchanged beside its regular reformat fill process. The First Line margin should be the same as the regular paragraph margin and no "first blank" counts should interfere.

1. no expand, "flat" but separate, numbered pages, unique and incrementally
numbered chapters + paragraphs.

2. Split documentation in multiple pages: when reading a subject, a page that
extend on more than 2 / 3 screen height tends to be difficult to read.

3. Insert a Table of Contents at top of page.

4. When using sub-pages (large doc): use a multi-page "table of contents"
which is the Confluence macro named: "Children Display".
<<<

The following commands will produce the ensuing results:

$ paradj --width=78 --both --margin=3 --first=0 --newline file.txt
>>>
This paragraph has no bullet nor indexed paragraph title at its left margin, therefore it should remain unchanged beside its regular reformat fill process. The First Line margin should be the same as the regular paragraph margin and no "first blank" counts should interfere.

  1. no expand, "flat" but separate, numbered pages, unique and incrementally numbered chapters + paragraphs.

  2. Split documentation in multiple pages: when reading a subject, a page that extend on more than 2 / 3 screen height tends to be difficult to read.

  3. Insert a Table of Contents at top of page.

  4. When using sub-pages (large doc): use a multi-page "table of contents" which is the Confluence macro named: "Children Display".

<<<

$ paradj --width=78 --both --margin=3 --newline file.txt
>>>
This paragraph has no bullet nor indexed paragraph title at its left margin, therefore it should remain unchanged beside its regular reformat fill process. The First Line margin should be the same as the regular paragraph margin and no "first blank" counts should interfere.

1. no expand, "flat" but separate, numbered pages, unique and incrementally
numbered chapters + paragraphs.

2. Split documentation in multiple pages: when reading a subject, a page
that extend on more than 2 / 3 screen height tends to be difficult to read.

3. Insert a Table of Contents at top of page.

4. When using sub-pages (large doc): use a multi-page "table of contents"
which is the Confluence macro named: "Children Display".
<<<

=cut
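
For anyone who finds the hyphenation part of the script hard to follow, here is a small Python sketch of what hyphenate_word appears to do: among the candidate break positions, take the largest one that still leaves room for the hyphen. The break positions in the example are written by hand for illustration; they are not output from TeX::Hyphen, and the names pick_break and split_word are mine.

```python
def pick_break(hyphen_places, unfilled):
    """Return the largest break position below unfilled - 1, or 0 if none fits."""
    for pos in sorted(hyphen_places, reverse=True):
        if pos < unfilled - 1:
            return pos
    return 0

def split_word(word, hyphen_places, unfilled):
    """Split word at the chosen position; return None when no break fits."""
    pos = pick_break(hyphen_places, unfilled)
    if pos == 0:
        return None
    return word[:pos] + "-", word[pos:]

# Hand-written break points for "in-di-ca-tive" (an assumption for this example).
places = [2, 4, 6]
print(split_word("indicative", places, 9))
print(split_word("indicative", places, 3))
```

With 9 columns left, the word splits as "indica-" and "tive", the same break seen in the 53-column example earlier in this thread. With only 3 columns left nothing fits, so the whole word moves to the next line.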

0

I use Nano.

Rename the file to a shorter name (8.3). Open it in Nano, then press Alt-J to justify the whole document, Ctrl-O to save, and Ctrl-X to exit. Then rename the file back to its original name.

Might sound cumbersome but it saved me a lot of time over doing the manual Enter, Delete, End to limit the columns to 90. Nano defaults to 72. I can live with that.

Jawa
  • 3,679
Daniel
  • 1