Guide to lab computing

From Phonlab

Managing reproducible Python environments

If you work with Python (or any programming language) over an extended period of time, you will find that your old projects no longer work in the same environment as your newer projects. The language itself evolves, as do the library dependencies you import into your projects. As you keep up with the latest changes, your older scripts tend to break.

The solution to this problem is to create independent environments for your projects. Doing so helps to ensure the long-term reproducibility of your code, and it also makes it easier to collaborate with other researchers. Keep your old environment definition around for your old project, and use a new environment with updated packages for your new project.

Here's how to get started with environments using Anaconda Python:

  1. Install Miniconda rather than the full Anaconda distribution. This creates a minimal base environment that includes conda and its core tools.
  2. Open a terminal window where you can run the conda command. On Macs this is usually just a normal Terminal window, and on Windows you'll find Anaconda prompt shortcuts in the 'Anaconda3' program group in your Start Menu.
  3. Ensure that you have access to the git command: conda install git
  4. (Optional and recommended) Make conda-forge your default package channel. This channel is a community-created source of many useful packages that tends to be a little more comprehensive and up-to-date than Anaconda's default package list. Run the following commands in the base environment; you should see (base) as part of the prompt in your terminal.
    1. Set conda-forge as the highest priority channel: conda config --add channels conda-forge
    2. Activate strict channel priority: conda config --set channel_priority strict
  5. Create an environment for your project. This is where you install a specific version of Python, Jupyter, and whatever additional packages you need. It's best to keep a minimal base environment and do all of your work in a project-specific environment. Here are two ways to create your environment:
    1. Create your environment from an existing YAML specification file:
      1. Download or create the spec file. The phonlabenv.yaml file contains a good starting environment for phonetics. (The environment name defaults to phonlab.)
      2. Create the environment: conda env create -f phonlabenv.yaml
    2. Create your environment and install packages manually:
      1. Create your environment: conda create --name myproj (where myproj is the name of your environment).
      2. Install packages into the new environment: conda install --name myproj pkg1 pkg2 (where pkg1 and pkg2 are the names of packages you want to install, e.g. python and jupyter).
  6. Repeat the environment creation step for as many environments as you need.
  7. To activate and use an environment: conda activate myproj. Code executed in an environment should find the specific package versions installed in that environment and not the versions installed in other environments.
  8. To share your environment with a collaborator, export the specification to a file: conda env export --name myproj > myproj.yaml
  9. (Optional and recommended) Add the .yaml file you created in the preceding step to the git repo you created for your project. This spec file can be useful if you want to integrate executable notebooks into your repo. For example, see the phonapps repo and its environment.yml file that defines an environment to be used on Binder.
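
Once an environment is active (step 7), a quick check from inside Python can confirm which interpreter and package versions your code actually sees. The following is a minimal sketch and not part of the lab's tooling: the function name environment_report and the package names passed to it are illustrative assumptions. It reads the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated.

```python
import os
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Describe the interpreter and selected packages visible to this process.

    `packages` is a list of distribution names to look up (the names used
    below are only examples). Packages that are not installed map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "prefix": sys.prefix,  # root directory of the running environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None outside conda
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; substitute the packages your project uses.
    report = environment_report(["numpy", "jupyter"])
    for key, value in report.items():
        print(f"{key}: {value}")
```

Running the same script after activating two different environments should report different prefixes and, typically, different package versions.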


Whenever possible, install packages using conda. Fall back to pip only in the event you can't find a conda package.

Printing

The lab printer is a Xerox Phaser 3250, located in room 50.

For troubleshooting see the printer manual.

The Berkeley Phonetics Machine

The Berkeley Phonetics Machine (BPM) is a virtual machine with phonetic software preinstalled.

Sample scripts and snippets

  • get_dur -- a very simple script for reading label durations from a Praat textgrid
  • output formatting in Python -- a Python snippet for creating readable and maintainable output format and header strings in your scripts
  • ffmpeg reference -- a reference page describing scriptable ways to use ffmpeg for creating video stimuli
  • multi_align -- a script for running pyalign on an audio file based on labelled regions of a textgrid. Pull it into the BPM with 'sudo bpm-update ucblingmisc'.
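
To give a flavor of the first two items, here is a minimal sketch that reads interval durations from a Praat textgrid and prints them with header and row format strings built from one column list. It is not the lab's get_dur script or its formatting snippet. The names label_durations, COLUMNS, and SAMPLE are illustrative assumptions, and the parser assumes a long-format ("full text") textgrid that has already been decoded to a Python string, and it reads interval tiers only.

```python
import re

# Matches one interval of a long-format Praat textgrid.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (?P<t1>[-+0-9.eE]+)\s*'
    r'xmax = (?P<t2>[-+0-9.eE]+)\s*'
    r'text = "(?P<text>(?:[^"]|"")*)"'
)


def label_durations(textgrid_text):
    """Return (label, start, duration) for every non-empty interval label."""
    rows = []
    for m in INTERVAL_RE.finditer(textgrid_text):
        label = m.group("text").replace('""', '"')  # Praat doubles quotes
        if label.strip():
            t1, t2 = float(m.group("t1")), float(m.group("t2"))
            rows.append((label, t1, t2 - t1))
    return rows


# One list drives both the header and the row format, so adding or
# reordering a column is a change in a single place.
COLUMNS = [("label", "s"), ("start", ".3f"), ("dur", ".3f")]
HEADER = "\t".join(name for name, _ in COLUMNS)
ROW_FMT = "\t".join("{:" + spec + "}" for _, spec in COLUMNS)

# A tiny made-up textgrid used only for demonstration.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "phone"
        xmin = 0
        xmax = 1.5
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 0.25
            text = ""
        intervals [2]:
            xmin = 0.25
            xmax = 0.75
            text = "a"
        intervals [3]:
            xmin = 0.75
            xmax = 1.5
            text = "b"
'''

if __name__ == "__main__":
    print(HEADER)
    for row in label_durations(SAMPLE):
        print(ROW_FMT.format(*row))
```

For real work, a dedicated library such as audiolabel is a more robust choice than a regular expression, since textgrids also come in short format and in several text encodings.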

Tools and libraries

The tools and libraries listed here are available on the department server. Some may also be available for other platforms.

Local tools

  • ifcformant -- a command line tool for extracting formant measurements, as described in Ueda, Yuichi; Hamakawa, Tomoya; Sakata, Tadashi; Hario, Syota; Watanabe, Akira (2007) A real-time formant tracker based on the inverse filter control method, Acoustical Science and Technology of the Acoustical Society of Japan 28(4), 271-4. We are grateful to Yuichi Ueda for providing the C code which implements the algorithm. The user interface is provided by a Python wrapper around the authors' C code and was written by Ronald Sprouse.

    Lab members can contact Ronald Sprouse for copies of ifcformant compiled for OS X, Windows, or Linux systems. Unfortunately we do not have permission to distribute the C code or compiled versions of this tool to the public.

    For detailed usage information, run: ifcformant --help
  • convertlabel -- a command line tool for converting between Praat textgrids, ESPS label files, and Wavesurfer label files. You can also scale or shift timepoints in the label file by a specified amount. Written by Ronald Sprouse.

    For detailed usage information, run: convertlabel --help
  • ultracomm -- a command line tool for configuring and acquiring ultrasound data from an Ultrasonix Tablet system.
  • ultrasession.py -- a Python script for running ultracomm and simultaneously acquiring audio and ultrasound synchronization signals.
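
The scaling and shifting of timepoints mentioned for convertlabel amounts to a linear transform of each label boundary. The sketch below illustrates the idea in plain Python; it is not convertlabel's implementation. The function name rescale_labels, the (t1, t2, text) tuple layout, and the choice to scale before shifting are all assumptions made for illustration.

```python
def rescale_labels(labels, scale=1.0, shift=0.0):
    """Map every timepoint t to t * scale + shift.

    `labels` is a list of (t1, t2, text) tuples; a new list is returned
    and the input is left unchanged.
    """
    return [
        (t1 * scale + shift, t2 * scale + shift, text)
        for t1, t2, text in labels
    ]


if __name__ == "__main__":
    # Labels made on an excerpt that begins 2 s into a longer recording:
    excerpt = [(0.0, 0.5, "a"), (0.5, 1.25, "b")]
    for t1, t2, text in rescale_labels(excerpt, shift=2.0):
        print(t1, t2, text)
```

A shift is useful when labels were made on an excerpt of a longer recording, and a scale factor is useful when converting between time units.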

Local libraries

  • audiolabel -- a Python library for reading and writing Praat textgrids, ESPS label files, Wavesurfer label files, and time-aligned tabular data. It provides special access methods for retrieving labels at specified times or by matching label content. Written by Ronald Sprouse and available on GitHub. See meas_formants for a sample script that uses this library. The audiolabel_demo walks you through many of the steps executed in meas_formants.
  • SoundLabel.pm -- a Perl library for reading and writing Praat textgrids, ESPS label files, and Wavesurfer label files, written by Ronald Sprouse. Its API is old and clunky, so you are encouraged to write scripts that use audiolabel instead.
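
To illustrate what retrieving labels at specified times or by matching label content means in practice, here is a minimal plain-Python sketch. It does not use audiolabel, and the class and method names (Tier, at_time, matching) are hypothetical stand-ins rather than that library's interface; consult the audiolabel documentation and audiolabel_demo for the real API.

```python
import bisect
import re


class Tier:
    """A minimal stand-in for a label tier of non-overlapping intervals."""

    def __init__(self, labels):
        # labels: iterable of (t1, t2, text) tuples
        self.labels = sorted(labels)
        self._starts = [t1 for t1, _, _ in self.labels]

    def at_time(self, time):
        """Return the (t1, t2, text) interval containing `time`, or None."""
        # Binary search for the last interval starting at or before `time`.
        i = bisect.bisect_right(self._starts, time) - 1
        if i >= 0 and self.labels[i][0] <= time < self.labels[i][1]:
            return self.labels[i]
        return None

    def matching(self, pattern):
        """Return all intervals whose text matches the regular expression."""
        regex = re.compile(pattern)
        return [lab for lab in self.labels if regex.search(lab[2])]


if __name__ == "__main__":
    tier = Tier([(0.0, 0.25, "sil"), (0.25, 0.75, "a"), (0.75, 1.5, "b")])
    print(tier.at_time(0.3))
    print(tier.matching(r"^[ab]$"))
```

Lookup by time is the operation you need when pairing labels with measurements taken at fixed time steps, such as formant or F0 tracks.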

Handy third-party tools

  • pyalign -- a command line tool for automatically aligning phones to an audio file based on an orthographic transcription of the audio.
  • reaper -- a command line tool for calculating F0 from an audio file