Output formatting in Python

From Phonlab
Latest revision as of 14:41, 15 October 2013

Producing well-formatted output in a script can be a pain, especially if you want to output a large number of variables per record. This Python snippet shows a technique for creating readable output formats. It's more verbose than some other techniques, but it's easy to use and maintain, and it helps make your code self-documenting.

Format strings

There are numerous ways to create a string to print in Python. In this snippet we'll look at format strings, which are the preferred way of interpolating variables into a structured format. We'll look only at the format options necessary to understand the snippet. Consult the Python documentation if you want to know more about format strings.

In our format string we'll use named fields. Let's say that we want to print a tab-delimited record containing a vowel label, an index, and f1 and f2 measures for each item in our dataset. We create a header string and a format string that look like this:

head = "v\tidx\tf1\tf2\n"
fmt = "{v:s}\t{idx:d}\t{f1:0.4f}\t{f2:0.3f}\n"

The head line creates a tab-delimited set of field names.

The fmt line creates a format string fmt that contains a list of keyword fields, v, idx, f1, f2. Each field name is followed by : and a format specifier that tells Python how to interpolate the value into the string. The specifiers are basically the same as those of sprintf-style formatting from C, but no % is used. In fmt the formats are s for interpolating a string, d for interpolating an integer, and 0.4f and 0.3f for floating point format with four and three digits after the decimal point.
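
To see what each specifier does on its own, here is a minimal sketch that applies the four specifiers from fmt one value at a time with the built-in format() function. It is written with Python 3 syntax (print as a function), which is an assumption on our part since the rest of this page uses the print statement, and the variable names are only illustrative.

```python
# Built-in format(value, spec) applies a single format specifier to a single value.
as_string = format('AA', 's')                 # string
as_int = format(3, 'd')                       # integer
four_places = format(247.3, '0.4f')           # float, four digits after the decimal point
three_places = format(1589.3651111, '0.3f')   # float, three digits after the decimal point

# The C-style equivalent of the 0.4f specifier, for comparison.
c_style = '%0.4f' % 247.3

print(as_string, as_int, four_places, three_places, c_style)
```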

Note how difficult it is to pick out the fields and their order even in these short header and format strings. If we want to add a new field it can also be a hassle to edit these strings and keep them synchronized.

We use the format string by calling the format() method on fmt and supplying values for the keyword fields we defined in fmt. The >> indicates that the print line is executed in a Python interpreter, and the second line is the interpreter output.

>> print fmt.format(idx=3, f1=247.3, v='AA', f2=1589.3651111)
AA	3	247.3000	1589.365

Note that it is not necessary to supply values in the order in which they appear in fmt because our values are supplied to format() by keyword arguments. This makes it easy to add a new field anywhere in the format string without complicating the call to format(); the new keyword=value argument can simply be added to the end of the list.
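
As a rough illustration of that last point, the sketch below inserts a new field in the middle of the format string and appends its value to the end of the argument list. The f3 field and its value are hypothetical additions made only for this example, and the code uses Python 3 syntax.

```python
# A hypothetical f3 field is inserted in the middle of the format string.
fmt = "{v:s}\t{idx:d}\t{f3:0.1f}\t{f1:0.4f}\t{f2:0.3f}\n"

# The existing keyword arguments are untouched; f3 is simply added at the end.
line = fmt.format(idx=3, f1=247.3, v='AA', f2=1589.3651111, f3=2396.7)

# fmt already ends in a newline, so suppress the one print would add.
print(line, end='')
```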

The snippet

This snippet helps you print out an unquoted, tab-delimited table of data like this (no attempt has been made to provide realistic formant values):

v	idx	f1	f2	f3
AA	2	200.1	1575.39	2396.7000000000
OW	3	220.3	1775.39	2396.7000000000
UW	4	240.2	1975.39	2396.7000000000

It consists of five columns labelled 'v', 'idx', 'f1', 'f2', 'f3', which are formatted as a string, an integer, and three floating point numbers with 1, 2, and 10 digits after the decimal point.

The snippet itself

As we saw above, coding head and fmt as literal strings makes them hard to read, and it's difficult to keep them in synch when adding or deleting fields. This snippet makes it easy to create head and fmt from a readable data structure. To use it, just cut and paste the code, then edit fldmap so that each line contains a column heading paired with the format for that column. Column names and formats are surrounded by quotation marks and separated by commas. If you are satisfied with unquoted, tab-delimited output records, it is not necessary to change the head = or fmt = lines.

# Edit fldmap to contain your pairs of column headings and column formats.
fldmap = (
  'v',  's',
  'idx', 'd',
  'f1', '0.1f',
  'f2', '0.2f',
  'f3', '0.10f',
)

# Leave these alone for unquoted, tab-delimited record format.
head = '\t'.join(fldmap[0::2]) + '\n'
fmt = '\t'.join(
    '{' + '{0}:{1}'.format(col, spec) + '}'
    for col, spec in zip(fldmap[0::2], fldmap[1::2])
) + '\n'

Snippet output

The column headings are contained in the even-numbered indexes of fldmap (0, 2, 4, and so on), and the head = line joins these elements with the tab character and adds a newline. Here is the value of head as shown in a Python interpreter.

>> head
'v\tidx\tf1\tf2\tf3\n'

>> print head
v	idx	f1	f2	f3


The fmt = line is a bit more complicated and is best read from the inside out. It sequentially pulls out pairs of column headings (even indexes) and formats (odd indexes) and combines them so that they are joined by : and surrounded by {}. These {heading:format} pairs are joined by the tab character, and a newline terminates the string.

Here is the value of fmt.

>> fmt
'{v:s}\t{idx:d}\t{f1:0.1f}\t{f2:0.2f}\t{f3:0.10f}\n'

>> print fmt
{v:s}	{idx:d}	{f1:0.1f}	{f2:0.2f}	{f3:0.10f}
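
If the inside-out reading of the fmt = line is hard to follow, it may help to unpack it into separate steps. The sketch below is one way to do that, not a replacement for the snippet; the intermediate names headings, specs, pairs, and fields are ours and do not appear in the snippet.

```python
fldmap = (
    'v', 's',
    'idx', 'd',
    'f1', '0.1f',
    'f2', '0.2f',
    'f3', '0.10f',
)

# Step 1: split fldmap into headings (even indexes) and formats (odd indexes).
headings = fldmap[0::2]
specs = fldmap[1::2]

# Step 2: pair each heading with its format.
pairs = list(zip(headings, specs))

# Step 3: turn each pair into a '{heading:format}' field.
fields = ['{' + '{0}:{1}'.format(col, spec) + '}' for col, spec in pairs]

# Step 4: join the fields with tabs and terminate the record with a newline.
head = '\t'.join(headings) + '\n'
fmt = '\t'.join(fields) + '\n'
```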

Using the snippet output

Now that head and fmt are created, here is how to use them in a script. First, print the header line with

print head

Next, set the variables that will be used in the call to format(). Note that the variables can have the same name as the keywords in fmt (v, f1), a different name (f2_val), or can be an element in a larger data structure (f['third']).

Normally your script will loop over some number of records, and these variable assignments happen within that loop.

v = 'AA'
idx = 2
f1 = 200.125
f2_val = 1575.3890045
f = {'third': 2396.7}

Finally, call format() in a print statement and provide values for the keyword fields in fmt. Note again that the keyword=value pairs can be provided in any order, which makes it easy to keep your print statement in synch with your format string.

print fmt.format(f3=f['third'], f1=f1, f2=f2_val, idx=idx, v=v)

In a Python interpreter the printed output looks like this:

>> print fmt.format(f3=f['third'], f1=f1, f2=f2_val, idx=idx, v=v)
AA	2	200.1	1575.39	2396.7000000000
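
Putting the pieces together, here is a minimal sketch of a complete script that loops over records and writes the whole table. It assumes Python 3, and the records list is hypothetical data chosen to match the example table above. Because each record is a dictionary whose keys match the field names, it is passed to format() with ** unpacking, which is our shortcut rather than part of the snippet; spelling out the keyword=value pairs works just as well.

```python
import sys

fldmap = (
    'v', 's',
    'idx', 'd',
    'f1', '0.1f',
    'f2', '0.2f',
    'f3', '0.10f',
)

# Built the same way as in the snippet.
head = '\t'.join(fldmap[0::2]) + '\n'
fmt = '\t'.join(
    '{' + '{0}:{1}'.format(col, spec) + '}'
    for col, spec in zip(fldmap[0::2], fldmap[1::2])
) + '\n'

# Hypothetical records; in a real script these would come from your data.
records = [
    {'v': 'AA', 'idx': 2, 'f1': 200.125, 'f2': 1575.3890045, 'f3': 2396.7},
    {'v': 'OW', 'idx': 3, 'f1': 220.3, 'f2': 1775.39, 'f3': 2396.7},
    {'v': 'UW', 'idx': 4, 'f1': 240.2, 'f2': 1975.39, 'f3': 2396.7},
]

lines = [head]
for rec in records:
    # The keys of rec supply the keyword fields named in fmt.
    lines.append(fmt.format(**rec))

# head and fmt already end in newlines, so write them without adding more.
sys.stdout.write(''.join(lines))
```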