Difference between revisions of "Guide to lab computing"

== Managing reproducible Python environments ==
If you work with Python (or any programming language) over an extended period of time, you will find that your old projects no longer work in the same environment as your newer projects. The language itself evolves over time, as do the library dependencies you import into your projects. As you keep up with the latest changes, your older scripts tend to break.

The solution to this problem is to create independent environments for your projects. Doing so helps to ensure the long-term reproducibility of your code, and it also makes it easier to collaborate with other researchers. Keep your old environment definition around for your old project, and use a new environment with updated packages for your new project.

Here's how to get started with [https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html environments using Anaconda Python]:

# Install [https://docs.conda.io/en/latest/miniconda.html Miniconda] instead of using the full Anaconda installer. This creates a minimal base environment that includes the Anaconda base tools.
# Open a terminal window where you can run the <code>conda</code> command. On Macs this is usually just a normal Terminal window, and on Windows you'll find <code>Anaconda prompt</code> shortcuts in the 'Anaconda3' program group in your Start Menu.
# Ensure that you have access to the <code>git</code> command: <code>conda install git</code>
# (Optional and recommended) [https://conda-forge.org/docs/user/introduction.html#how-can-i-install-packages-from-conda-forge Make <code>conda-forge</code> your default package channel]. This channel is a community-created source of many useful packages that tends to be a little more comprehensive and up-to-date than Anaconda's default package list. You should see <code>(base)</code> as part of the prompt in your terminal when you run these commands:
## Set <code>conda-forge</code> as the highest priority channel: <code>conda config --add channels conda-forge</code>
## Activate <code>strict</code> channel priority: <code>conda config --set channel_priority strict</code>
# Create an environment for your project. This is where you install a specific version of Python, Jupyter, and whatever additional packages you need. It's best to keep a minimal base environment and do all of your work in a project-specific environment. Here are two ways to create your environment:
## Create your environment from an existing YAML specification file:
### Download or create the spec file. The [https://raw.githubusercontent.com/rsprouse/ucblingmisc/master/python/phonlabenv.yaml phonlabenv.yaml file] contains a good starting environment for phonetics. (The environment name defaults to <code>phonlab</code>.)
### Create the environment: <code>conda env create -f phonlabenv.yaml</code>
## Create your environment and install packages manually:
### Create your environment: <code>conda create --name myproj</code> (where <code>myproj</code> is the name of your environment).
### Install packages into the new environment: <code>conda install --name myproj pkg1 pkg2</code> (where <code>pkg1</code> and <code>pkg2</code> are the names of packages you want to install, e.g. <code>python</code> and <code>jupyter</code>).
# Repeat the environment creation step for as many environments as you need.
# To activate and use an environment: <code>conda activate myproj</code>. Code executed in an environment should find the specific package versions installed in that environment and not the versions installed in other environments.
# To share your environment with a collaborator, export the specification to a file: <code>conda env export --name myproj > myproj.yaml</code>
# (Optional and recommended) Add the <code>.yaml</code> file you created in the preceding step to the git repo you created for your project. This spec file can be useful if you want to integrate executable notebooks into your repo. For example, see the [https://github.com/rsprouse/phonapps phonapps repo] and its <code>environment.yml</code> file that defines an environment to be used on [https://mybinder.org Binder].
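
The spec file used in the steps above is plain YAML with a name, a list of channels, and a list of dependencies. As a rough illustration of that layout only, the following Python sketch writes out a small spec. The helper <code>render_env_spec</code>, the environment name <code>myproj</code>, and the package list are all hypothetical; none of them are taken from <code>phonlabenv.yaml</code>.

```python
def render_env_spec(name, channels, dependencies):
    """Return the text of a minimal conda environment spec (YAML)."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


# Hypothetical example values; substitute the packages your project needs.
spec_text = render_env_spec(
    name="myproj",
    channels=["conda-forge"],
    dependencies=["python=3.11", "jupyter", "numpy"],
)

print(spec_text)
```

Saving the printed text as <code>myproj.yaml</code> should give a file that <code>conda env create -f myproj.yaml</code> can read. Pinning a version, as in <code>python=3.11</code>, is what keeps an old project on the packages it was written for.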

Whenever possible, install packages using <code>conda</code>. Fall back to <code>pip</code> only if you can't find a <code>conda</code> package.
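
To confirm that a script or notebook is really running inside the environment you activated, it can report where its interpreter and packages come from. The sketch below is one way to do that using only the standard library. The helper names <code>describe_environment</code> and <code>package_version</code> are made up for this example, and it assumes Python 3.8 or later for <code>importlib.metadata</code>.

```python
import os
import sys
from importlib import metadata


def describe_environment():
    """Collect details that identify the environment running this code."""
    return {
        # Name of the active conda environment, if conda set the variable.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root directory of the environment that owns this interpreter.
        "prefix": sys.prefix,
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
        "executable": sys.executable,
    }


def package_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")

# Example package names only; list whatever your project depends on.
for pkg in ("numpy", "pandas"):
    print(f"{pkg}: {package_version(pkg)}")
```

If <code>prefix</code> points at the base installation rather than at your project environment, the environment was probably not activated in the terminal that launched Python.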

== Printing ==

The Lab printer is a Xerox Phaser 3250 and is located in room 50.

For troubleshooting, [https://corpus.linguistics.berkeley.edu/pdrive/Xerox_Phaser_3250_Guide_EN.pdf see the printer manual].

== The Berkeley Phonetics Machine ==

The [[Berkeley Phonetics Machine]] is a virtual machine with phonetic software preinstalled.

== Sample scripts and snippets ==

* [[get_dur]] -- a very simple script for reading label durations from a Praat textgrid
* Python notebooks for [https://github.com/rsprouse/audiolabel/blob/master/doc/working_with_phonetic_dataframes.ipynb reading Praat textgrids] and [https://github.com/rsprouse/phonlab/blob/master/doc/Automated%20formant%20measurements.ipynb performing formant analysis] on vowel tokens
* [[output formatting in Python]] -- a Python snippet for creating readable and maintainable output format and header strings in your scripts
* [[sox in phonetic research|<code>sox</code> cookbook for phonetics]] -- not exactly a script; a page describing scriptable ways to use <code>sox</code> that are useful for phoneticians
* [[ffmpeg reference|<code>ffmpeg</code> reference]] -- a reference page describing scriptable ways to use <code>ffmpeg</code> for creating video stimuli
* [https://raw.githubusercontent.com/rsprouse/ucblingmisc/master/opensesame/simplerec.osexp simplerec.osexp] -- a simple audio recording experiment for OpenSesame
* [https://raw.githubusercontent.com/rsprouse/ucblingmisc/master/python/multi_align multi_align] -- a script for running pyalign on an audio file based on labelled regions of a textgrid. Pull it into the BPM with <code>sudo bpm-update ucblingmisc</code>.
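
For a sense of what reading label durations from a textgrid involves, here is a minimal standard-library sketch. It is not the <code>get_dur</code> script linked above. It assumes the long ("full text") TextGrid format, handles interval tiers only, and ignores tier names; the function name <code>label_durations</code> and the sample data are invented for the example.

```python
import re

# Matches one interval of a long-format ("full text") Praat TextGrid.
# Praat escapes a double quote inside a label by doubling it.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (?P<xmin>[-+0-9.eE]+)\s*'
    r'xmax = (?P<xmax>[-+0-9.eE]+)\s*'
    r'text = "(?P<text>(?:[^"]|"")*)"'
)


def label_durations(textgrid_text):
    """Return (label, duration) pairs for every non-empty interval label."""
    durations = []
    for match in INTERVAL_RE.finditer(textgrid_text):
        label = match.group("text").replace('""', '"')
        if label.strip():
            duration = float(match.group("xmax")) - float(match.group("xmin"))
            durations.append((label, duration))
    return durations


# Hypothetical two-interval tier, trimmed to the lines the pattern needs.
SAMPLE = '''
        intervals: size = 2
        intervals [1]:
            xmin = 0.25
            xmax = 0.75
            text = "aa"
        intervals [2]:
            xmin = 0.75
            xmax = 1.0
            text = ""
'''

for label, duration in label_durations(SAMPLE):
    print(f"{label}\t{duration:.3f}")
```

For real analyses, a dedicated label-reading library is a better fit than a regular expression, since it also covers the short TextGrid format, point tiers, and other label file types.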

== Tools and libraries ==

The tools and libraries listed here are available on the department server. Some may also be available for other platforms.

=== Local tools ===

* '''[[IFC formant tracker|ifcformant]]''' -- a command line tool for extracting formant measurements, as described in Ueda, Yuichi; Hamakawa, Tomoya; Sakata, Tadashi; Hario, Syota; Watanabe, Akira (2007) A real-time formant tracker based on the inverse filter control method, Acoustical Science and Technology (Acoustical Society of Japan) 28(4), 271-274. We are grateful to Yuichi Ueda for providing the C code which implements the algorithm. The user interface is provided by a Python wrapper around the authors' C code and was written by Ronald Sprouse.<br /><br />Lab members can contact Ronald Sprouse for copies of ifcformant compiled for OS X, Windows, or Linux systems. Unfortunately we do not have permission to distribute the C code or compiled versions of this tool to the public.<br /><br />For detailed usage information, run: <code>ifcformant --help</code>
* '''convertlabel''' -- a command line tool for converting between Praat textgrids, ESPS label files, and Wavesurfer label files. You can also scale or shift timepoints in the label file by a specified amount. Written by Ronald Sprouse.<br /><br />For detailed usage information, run: <code>convertlabel --help</code>
* [[concat_pyalign_textgrids]] -- a command line tool for concatenating Praat TextGrids. Written by Keith Johnson.
* [[Klatt_Synthesizer|Klatt synthesizer]] -- a speech synthesizer originally written by Dennis Klatt.
* '''[[ultracomm|ultracomm]]''' -- a command line tool for configuring and acquiring ultrasound data from an Ultrasonix Tablet system.
* '''[[ultrasession.py|ultrasession.py]]''' -- a Python script for running '''ultracomm''' and simultaneously acquiring audio and ultrasound synchronization signals.

=== Local libraries ===

* [https://github.com/rsprouse/audiolabel '''audiolabel'''] -- a Python library for reading and writing Praat textgrids, ESPS label files, Wavesurfer label files, and time-aligned tabular data. It provides special access methods for retrieving labels at specified times or by matching label content. Written by Ronald Sprouse and available on [https://github.com/rsprouse/audiolabel GitHub]. See [[meas_formants]] for a sample script that uses this library. The [[audiolabel_demo]] walks you through many of the steps executed in <code>meas_formants</code>.
* '''SoundLabel.pm''' -- a Perl library for reading and writing Praat textgrids, ESPS label files, and Wavesurfer label files, written by Ronald Sprouse. Its API is old and clunky, so you are encouraged to write scripts that use audiolabel instead.

=== Handy third-party tools ===

* [[forced alignment|'''pyalign''']] -- a command line tool for automatically aligning phones to an audio file based on an orthographic transcription of the audio.
* '''ffmpeg''' -- a command line tool for transcoding video and audio. [[ffmpeg reference|See the ffmpeg reference]] page for tips on how to use it.
* [[reaper reference|'''reaper''']] -- a command line tool for calculating F0 from an audio file
* '''sox''' -- 'the Swiss Army knife of sound processing programs'; a command line tool for audio processing. [[sox in phonetic research|See the sox in phonetic research]] page for sample usages.

Latest revision as of 06:52, 21 April 2023
