Output formatting in Python

From Phonlab

Revision as of 10:53, 15 October 2013

Producing properly formatted output in a script can be a pain, especially if you want to output a large number of variables per record. This Python snippet shows a technique for creating readable output formats. It's more verbose than some other techniques, but it's easy to use and maintain, and it helps make your code self-documenting.

Format strings

There are numerous ways to create a string to print in Python. In this snippet we'll look at format strings, which are the preferred way of interpolating variables into a structured format. We'll look only at the format options necessary to understand the snippet. Consult the Python documentation if you want to know more about format strings.

In our format string we'll use named fields. Let's say that we want to print a tab-delimited record containing a vowel label, an index, and f1 and f2 measures for each item in our dataset. We create a header string and a format string that look like this:

head = "v\tidx\tf1\tf2\n"
fmt = "{v:s}\t{idx:d}\t{f1:0.4f}\t{f2:0.3f}\n"

Note how difficult it is to pick out the fields and their order even in these short header and format strings.

The head line creates a tab-delimited set of field names.

The fmt line creates a format string fmt that contains a list of keyword fields: v, idx, f1, and f2. Each field name is followed by : and a format specifier that tells Python how to interpolate the value into the string. The specifiers are basically the same as sprintf-style formatting from C, but no % is used. In this string the formats are s for interpolating a string, d for interpolating an integer, and 0.4f and 0.3f for fixed-point floating point format with four and three digits after the decimal point.
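To see what each specifier does on its own, Python's built-in format() function applies a single specifier to a single value. The sketch below is only an illustration: the variable names are made up for this example, and the last line adds a field width, which the format string above does not use.

```python
# Each call applies one format specifier to one illustrative value.
as_string = format('AA', 's')                # string
as_integer = format(3, 'd')                  # integer
four_places = format(247.3, '0.4f')          # four digits after the decimal point
three_places = format(1589.3651111, '0.3f')  # three digits after the decimal point

# A number before the '.' sets a minimum field width (numbers are right-aligned).
padded = format(247.3, '10.1f')

print('\t'.join([as_string, as_integer, four_places, three_places]))
```

The same specifiers appear after the : in each named field of fmt.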

We use the format string by calling the format() method on fmt and supplying values for the keyword fields we defined in our format string.

>>> print(fmt.format(idx=3, f1=247.3, v='AA', f2=1589.3651111))
AA	3	247.3000	1589.365

Note that it is not necessary to supply values in the order in which they appear in the format string because our values are supplied to format() by keyword arguments.
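Because the fields are matched by name, the values can also come from a dictionary that is unpacked with **. The following is a minimal sketch under that assumption; the record dictionary is hypothetical and its keys are chosen to match the field names in fmt.

```python
import sys

fmt = "{v:s}\t{idx:d}\t{f1:0.4f}\t{f2:0.3f}\n"

# Hypothetical record; the key order deliberately differs from the field order.
record = {'f2': 1589.3651111, 'v': 'AA', 'f1': 247.3, 'idx': 3}

by_keyword = fmt.format(idx=3, f1=247.3, v='AA', f2=1589.3651111)
by_dict = fmt.format(**record)  # same result as passing keyword arguments

# fmt already ends in a newline, so write() avoids printing a second one.
sys.stdout.write(by_dict)
```

Both calls produce the same string, since format() looks up each field by its name rather than by position.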

The snippet

Here is the essence of the snippet. Copy and paste this code, then edit fldmap so that each line contains a pair: a column heading and the format for that column.

# Edit fldmap to contain your pairs of column headings and column formats.
fldmap = (
  'v',   's',
  'idx', 'd',
  'f1',  '0.1f',
  'f2',  '0.2f',
  'f3',  '0.10f',
)

# Items at even indexes are the column headings; items at odd indexes are the formats.
head = '\t'.join(fldmap[0:len(fldmap):2]) + '\n'
# Build one '{heading:format}' field per pair and join the fields with tabs.
fmt  = '\t'.join(
           [
               '{' + '{0}:{1}'.format(var, spec) + '}'
               for var, spec in zip(
                       fldmap[0:len(fldmap):2],
                       fldmap[1:len(fldmap):2]
                   )
           ]
       ) + '\n'
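The snippet depends on slicing with a step of 2 to split the flat tuple into headings and formats. The short sketch below shows that step in isolation on a shortened fldmap; the names headings, formats, and pairs are introduced here only for illustration. Note that fldmap[0::2] is shorthand for fldmap[0:len(fldmap):2].

```python
# Shortened fldmap, for illustration only.
fldmap = (
    'v',   's',
    'idx', 'd',
    'f1',  '0.1f',
)

headings = fldmap[0::2]  # every second item, starting at index 0
formats = fldmap[1::2]   # every second item, starting at index 1

# zip() pairs each heading with the format that follows it in fldmap.
pairs = list(zip(headings, formats))
print(pairs)
```

Keeping each heading next to its format in fldmap is what makes the layout easy to read and edit, and the slices recover the two columns when the strings are built.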

In [10]: head
Out[10]: 'v\tidx\tf1\tf2\tf3\n'

In [11]: print(head)
v	idx	f1	f2	f3

v = 'AA'
idx = 2
f1 = 200.125
f2_val = 1575.3890045
f = {'third': 2396.7}

In [25]: fmt
Out[25]: '{v:s}\t{idx:d}\t{f1:0.1f}\t{f2:0.2f}\t{f3:0.10f}\n'

In [26]: print(fmt)
{v:s}	{idx:d}	{f1:0.1f}	{f2:0.2f}	{f3:0.10f}


In [27]: print(fmt.format(f3=f['third'], f1=f1, f2=f2_val, idx=idx, v=v))
AA	2	200.1	1575.39	2396.7000000000
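
In a script, the header is typically written once and the format string is reused for every record. The following is a rough sketch of that pattern rather than part of the original snippet: the make_formats() helper, the records list, and all of the measurement values are hypothetical and exist only for this example.

```python
import sys

def make_formats(fldmap, sep='\t'):
    """Return (head, fmt) built from a flat tuple of heading/format pairs."""
    if len(fldmap) % 2 != 0:
        raise ValueError('fldmap must contain heading/format pairs')
    names = fldmap[0::2]
    specs = fldmap[1::2]
    head = sep.join(names) + '\n'
    # '{%s:%s}' % (name, spec) yields a named field such as '{f1:0.1f}'.
    fmt = sep.join('{%s:%s}' % (n, s) for n, s in zip(names, specs)) + '\n'
    return head, fmt

fldmap = (
    'v',   's',
    'idx', 'd',
    'f1',  '0.1f',
    'f2',  '0.2f',
)
head, fmt = make_formats(fldmap)

# Made-up measurements, purely for illustration.
records = [
    {'v': 'AA', 'idx': 1, 'f1': 710.04, 'f2': 1100.5},
    {'v': 'IY', 'idx': 2, 'f1': 280.0, 'f2': 2250.126},
]

# One header line followed by one formatted line per record.
lines = [head] + [fmt.format(**rec) for rec in records]
sys.stdout.write(''.join(lines))
```

Adding or reordering a column then only requires editing fldmap, provided each record supplies a value for every heading.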