Guide to lab computing

From Phonlab

Revision as of 16:45, 11 April 2022

Reproducible Python environments

If you work with Python (or any programming language) over an extended period of time, you will find that your old projects no longer work in the same environment as your newer projects. The language itself evolves, and so do the library dependencies you import into your projects.

The solution to this problem is to create independent environments for your projects. Doing so helps to ensure the long-term reproducibility of your code, and it also makes it easier to collaborate with other researchers.

Here's how to get started with environments in Anaconda Python:

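  1. Install miniconda (https://docs.conda.io/en/latest/miniconda.html) instead of using the full Anaconda installer. This gives you conda and the minimal base environment it needs to run (including conda's own Python), and nothing else.
  2. (Optional but recommended) Make conda-forge your default package channel (https://conda-forge.org/docs/user/introduction.html#how-can-i-install-packages-from-conda-forge). This channel is a community-created source of many useful packages that tends to be a little more comprehensive and up-to-date than Anaconda's default package list.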
    1. Set conda-forge as the highest priority channel: conda config --add channels conda-forge
    2. Activate strict channel priority: conda config --set channel_priority strict
  3. Create an environment for your project. This is where you install Python, Jupyter, and whatever additional packages you need. It's best to keep the base environment minimal and do all of your work in a non-base environment. You might find that you can install everything you need cleanly in a single environment, but it's easy to create additional environments if you need to. Separate environments help you avoid package conflicts, and they make the workflow for a single project easy to repeat and share.
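
The steps above can be sketched as a short Python helper that assembles the conda command lines for a new project environment. This is only an illustration: the environment name myproject, the package list, and the Python version are assumptions, and the helper prints the commands rather than running them. The conda env export line is not part of the steps above; it is one common way to write out an environment specification that a collaborator can rebuild from.

```python
import shlex


def conda_env_commands(env_name, packages, python_version=None):
    """Return conda command lines (as argument lists) for a project environment.

    The commands are only assembled here, not executed.
    """
    # Pin Python if a version is given; otherwise let conda choose one.
    python_spec = f"python={python_version}" if python_version else "python"
    create = ["conda", "create", "--name", env_name, python_spec, *packages]
    activate = ["conda", "activate", env_name]
    # Prints the environment specification so that others can rebuild it.
    export = ["conda", "env", "export", "--name", env_name]
    return [create, activate, export]


# Hypothetical project: the name, packages, and version are placeholders.
commands = conda_env_commands("myproject", ["jupyter", "numpy"], python_version="3.10")
for cmd in commands:
    print(shlex.join(cmd))
```

Paste the printed lines into a terminal one at a time. Note that conda activate needs an initialized shell, so it will not work when launched as a subprocess from Python.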

Whenever possible, install packages using conda. If you need something that is not available in the default channel, the conda-forge channel is a good source of additional (and more up-to-date) scientific software. Fall back to pip only if you can't find a conda package. As a last resort you can download and install software by hand, e.g. audiolabel, which has no alternative installation method.

Printing

The Lab printer is a Xerox Phaser 3250 and is located in room 50.

For troubleshooting see the printer manual.

The Berkeley Phonetics Machine

The Berkeley Phonetics Machine is a virtual machine with phonetic software preinstalled.

Sample scripts and snippets

  • get_dur -- a very simple script for reading label durations from a Praat textgrid
  • output formatting in Python -- a Python snippet for creating readable and maintainable output format and header strings in your scripts
  • ffmpeg reference -- a reference page describing scriptable ways to use ffmpeg for creating video stimuli
  • multi_align -- a script for running pyalign on an audio file based on labelled regions of a textgrid. Pull it into the BPM with 'sudo bpm-update ucblingmisc'.
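
To give a flavor of the first two items, here is a minimal sketch that reads label durations from a Praat textgrid and prints them using named header and format strings. It is not the lab's get_dur script or the linked formatting snippet. The embedded sample textgrid, the regular expression, and the function name label_durations are all assumptions made for illustration, and the parser only handles interval tiers in Praat's long text format. For real work, a dedicated library such as audiolabel is a better choice.

```python
import re

# A tiny long-format TextGrid, embedded so the sketch is self-contained.
SAMPLE_TEXTGRID = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "phone"
        xmin = 0
        xmax = 1.5
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 0.25
            text = ""
        intervals [2]:
            xmin = 0.25
            xmax = 0.75
            text = "a"
        intervals [3]:
            xmin = 0.75
            xmax = 1.5
            text = "t"
'''

# Matches one interval entry: its start time, end time, and label text.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (?P<t1>[\d.eE+-]+)\s*'
    r'xmax = (?P<t2>[\d.eE+-]+)\s*'
    r'text = "(?P<text>(?:[^"]|"")*)"'
)


def label_durations(textgrid_text):
    """Return (label, duration) pairs for every non-empty interval label."""
    durations = []
    for match in INTERVAL_RE.finditer(textgrid_text):
        # Praat escapes a double quote inside a label by doubling it.
        label = match.group("text").replace('""', '"')
        if label:
            duration = float(match.group("t2")) - float(match.group("t1"))
            durations.append((label, duration))
    return durations


# Keeping the header and row format side by side makes the output columns
# easy to read and to change together.
HEADER = "label\tduration"
ROW_FMT = "{label}\t{duration:.3f}"

print(HEADER)
for label, duration in label_durations(SAMPLE_TEXTGRID):
    print(ROW_FMT.format(label=label, duration=duration))
```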

Tools and libraries

The tools and libraries listed here are available on the department server. Some may also be available for other platforms.

Local tools

  • ifcformant -- a command line tool for extracting formant measurements, as described in Ueda, Yuichi; Hamakawa, Tomoya; Sakata, Tadashi; Hario, Syota; Watanabe, Akira (2007) A real-time formant tracker based on the inverse filter control method, Acoustical Science and Technology (Acoustical Society of Japan) 28(4), 271-4. We are grateful to Yuichi Ueda for providing the C code which implements the algorithm. The user interface is provided by a Python wrapper around the authors' C code and was written by Ronald Sprouse.

    Lab members can contact Ronald Sprouse for copies of ifcformant compiled for OS X, Windows, or Linux systems. Unfortunately we do not have permission to distribute the C code or compiled versions of this tool to the public.

    For detailed usage information, run: ifcformant --help
  • convertlabel -- a command line tool for converting between Praat textgrids, ESPS label files, and Wavesurfer label files. You can also scale or shift timepoints in the label file by a specified amount. Written by Ronald Sprouse.

    For detailed usage information, run: convertlabel --help
  • ultracomm -- a command line tool for configuring and acquiring ultrasound data from an Ultrasonix Tablet system.
  • ultrasession.py -- a Python script for running ultracomm and simultaneously acquiring audio and ultrasound synchronization signals.
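
The scaling and shifting of timepoints mentioned for convertlabel can be pictured with a few lines of Python. This is a sketch of the idea only, not convertlabel's implementation. The function name, the (start, end, text) tuple layout, and the choice to scale before shifting are assumptions; run convertlabel --help for the tool's actual options and behavior.

```python
def transform_labels(labels, scale=1.0, shift=0.0):
    """Return new (start, end, text) tuples with times scaled, then shifted."""
    return [
        (start * scale + shift, end * scale + shift, text)
        for start, end, text in labels
    ]


# Hypothetical labels with times in seconds.
labels = [(0.0, 0.25, "a"), (0.25, 0.75, "t")]

# Move every label half a second later, e.g. after padding the audio.
shifted = transform_labels(labels, shift=0.5)

# Convert seconds to milliseconds.
in_ms = transform_labels(labels, scale=1000.0)

print(shifted)
print(in_ms)
```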

Local libraries

  • audiolabel -- a Python library for reading and writing Praat textgrids, ESPS label files, Wavesurfer label files, and time-aligned tabular data. It provides special access methods for retrieving labels at specified times or by matching label content. Written by Ronald Sprouse, and available on GitHub. See meas_formants for a sample script that uses this library. The audiolabel_demo walks you through many of the steps executed in meas_formants.
  • SoundLabel.pm -- a Perl library for reading and writing Praat textgrids, ESPS label files, and Wavesurfer label files, written by Ronald Sprouse. Its API is old and clunky, so you are encouraged to write scripts that use audiolabel instead.

Handy third-party tools

  • pyalign -- a command line tool for automatically aligning phones to an audio file based on an orthographic transcription of the audio.
  • reaper -- a command line tool for calculating F0 from an audio file