Difference between revisions of "Guide to lab computing"

From Phonlab
 
(28 intermediate revisions by the same user not shown)
Line 1: Line 1:
  +
== Managing reproducible Python environments ==
== Storage ==
 
   
  +
If you work with Python (or any programming language) over an extended period of time you will find that your old projects no longer work in the same environment as your newer projects. The language itself evolves over time, as do the library dependencies you import into your projects. As you keep up with the latest changes your older scripts tend to break.
All Lab workstations automatically mount shared disk space as the guest user. This shared space is often referred to as the [https://corpus.linguistics.berkeley.edu/pdrive/pdrive.html PDrive] since it is mounted as drive letter P: on Windows. Files on the PDrive are backed up nightly as part of the server backup.
 
   
  +
The solution to this problem is to create independent environments for your projects. Doing so helps to ensure the long-term reproducibility of your code, and it also makes it easier to collaborate with other researchers. Keep your old environment definition around for your old project, and use a new environment with updated packages for your new project.
Space on the PDrive is limited, and it's not a great location for large files that are part of your active workflow (e.g. video files straight from the camcorder that you are editing). Once you are done with your editing you might copy your compressed presentation files (e.g. .mpeg) to the PDrive.
 
   
  +
Here's how to get started with [https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html environments using Anaconda Python]:
If you want to back up some of these large files that are not on the PDrive you can use [https://corpus.linguistics.berkeley.edu/pdrive/amazon_storage.html Amazon storage (AWS)].
 
   
  +
# Install [https://docs.conda.io/en/latest/miniconda.html miniconda] instead of using the full Anaconda installer. This creates a minimal base environment that includes the Anaconda base tools.
Notes on a preliminary exploration of [[Box]].
 
  +
# Open a terminal window where you can run the <code>conda</code> command. On Macs this is usually just a normal Terminal window, and on Windows you'll find <code>Anaconda prompt</code> shortcuts in the 'Anaconda3' program group in your Start Menu.
  +
# Ensure that you have access to the <code>git</code> command: <code>conda install git</code>
  +
# (Optional and recommended) [https://conda-forge.org/docs/user/introduction.html#how-can-i-install-packages-from-conda-forge Make <code>conda-forge</code> your default package channel]. This channel is a community-created source of many useful packages that tends to be a little more comprehensive and up-to-date than Anaconda's default package list. Before changing the configuration, check that <code>(base)</code> appears in your terminal prompt, which indicates that the base environment is active.
  +
## Set <code>conda-forge</code> as the highest priority channel: <code>conda config --add channels conda-forge</code>
  +
## Activate <code>strict</code> channel priority: <code>conda config --set channel_priority strict</code>
  +
# Create an environment for your project. This is where you install a specific version of Python, Jupyter, and whatever additional packages you need. It's best to keep a minimal base environment and do all of your work in a project-specific environment. Here are two ways to create your environment:
  +
## Create your environment from an existing YAML specification file:
  +
### Download or create the spec file. The [https://raw.githubusercontent.com/rsprouse/ucblingmisc/master/python/phonlabenv.yaml phonlabenv.yaml file] contains a good starting environment for phonetics. (The environment name defaults to <code>phonlab</code>.)
  +
### Create the environment: <code>conda env create -f phonlabenv.yaml</code>
  +
## Create your environment and install packages manually:
  +
### Create your environment: <code>conda create --name myproj</code> (where <code>myproj</code> is the name of your environment).
  +
### Install packages into the new environment: <code>conda install --name myproj pkg1 pkg2</code> (where <code>pkg1</code> and <code>pkg2</code> are the names of packages you want to install, e.g. <code>python</code> and <code>jupyter</code>).
  +
# Repeat the environment creation step for as many environments as you need.
  +
# To activate and use an environment: <code>conda activate myproj</code>. Code executed in an environment should find the specific package versions installed in that environment and not the versions installed in other environments.
  +
# To share your environment with a collaborator, export the specification to a file: <code>conda env export --name myproj > myproj.yaml</code>
  +
# (Optional and recommended) Add the <code>.yaml</code> file you created in the preceding step to the git repo you created for your project. This spec file can be useful if you want to integrate executable notebooks into your repo. For example, see the [https://github.com/rsprouse/phonapps phonapps repo] and its <code>environment.yml</code> file that defines an environment to be used on [https://mybinder.org Binder].
  +
  +
  +
Whenever possible, install packages using <code>conda</code>. Fall back to <code>pip</code> only in the event you can't find a <code>conda</code> package.
   
 
== Printing ==
 
   
  +
The Lab printer is a Xerox Phaser 3250 and is located in room 50.
The Lab printer is a Xerox Phaser 3250 and is located in room 50. Lab workstations are set up to print to this printer automatically. You can choose '2-sided' (default), '1-sided', or '2-sided 2-up' when you print. [https://corpus.linguistics.berkeley.edu/pdrive/lab_printer.html Follow the setup instructions] if you would like to print from your own computer.
 
   
 
For troubleshooting [https://corpus.linguistics.berkeley.edu/pdrive/Xerox_Phaser_3250_Guide_EN.pdf see the printer manual].
 
Line 23: Line 42:
 
* [[get_dur]] -- a very simple script for reading label durations from a Praat textgrid
 
   
* [[meas_formants]] -- a Python script for reading a Praat textgrid and performing formant analysis on vowel tokens
+
* Python notebooks for [https://github.com/rsprouse/audiolabel/blob/master/doc/working_with_phonetic_dataframes.ipynb reading Praat textgrids] and [https://github.com/rsprouse/phonlab/blob/master/doc/Automated%20formant%20measurements.ipynb performing formant analysis] on vowel tokens
   
 
* [[output formatting in Python]] -- a Python snippet for creating readable and maintainable output format and header strings in your scripts
 
Line 30: Line 49:
   
 
* [[ffmpeg reference|<code>ffmpeg</code> reference]] -- a reference page describing scriptable ways to use <code>ffmpeg</code> for creating video stimuli
 
  +
  +
* [https://raw.githubusercontent.com/rsprouse/ucblingmisc/master/opensesame/simplerec.osexp simplerec.osexp] -- a simple audio recording experiment for OpenSesame
  +
  +
* [https://raw.githubusercontent.com/rsprouse/ucblingmisc/master/python/multi_align multi_align] -- a script for running pyalign on an audio file based on labelled regions of a textgrid. Pull it into the BPM with 'sudo bpm-update ucblingmisc'.
   
 
== Tools and libraries ==
 
Line 60: Line 83:
   
 
* '''ffmpeg''' -- a command line tool for transcoding video and audio. [[ffmpeg reference|See the ffmpeg reference]] page for tips on how to use it.
 
  +
  +
* [[reaper reference|'''reaper''']] -- a command line tool for calculating F0 from an audio file
   
 
* '''sox''' -- 'the Swiss Army knife of sound processing programs'; a command line tool for audio processing. [[sox in phonetic research|See the sox in phonetic research]] page for sample usages.
 
 
== Running Matlab ==
 
 
This section describes various ways to run Matlab on the department server:
 
 
=== As a terminal-based application ===
 
 
To run matlab as a '''''terminal-based application''''' (also with no splash screen at startup), use:
 
 
<code>matlab -nodesktop -nosplash</code>
 
 
This runs the default matlab installation. Optionally, you can run a specific installed version:
 
 
<code>/opt/matlab/2010a/bin/matlab -nodesktop -nosplash</code>
 
 
=== As an X11 client ===
 
 
To run matlab as an '''''X11 client''''':
 
 
Mathworks does not officially support running Matlab as an X11 client from a remote machine, and currently this technique works only for the 2010a version.
 
 
# Start up your local X Server.
 
#* For Windows [http://software-central.berkeley.edu get and install Exceed]. Start the Exceed program.
 
#* For Mac, [http://developer.apple.com/opensource/tools/x11.html see Apple's documentation on how to install and run X11] for your version of OS X.
 
# Connect to the server with X tunneling enabled.
 
#* For Windows, use [http://www.chiark.greenend.org.uk/~sgtatham/putty/ putty]. When connecting with putty, under 'Putty configuration', make sure to select SSH, X11, Enable X11 forwarding.
 
#* For Mac, connect using ssh's -X switch, e.g.<br /><code>ssh -X username@linguistics.berkeley.edu</code>
 
# Check to make sure that X tunneling is enabled in your ssh session by checking the value of the <code>$DISPLAY</code> environment variable, which should return something like <code>localhost:10.0</code>. To check, give the command:<br /><code>echo $DISPLAY</code>
 
# Run matlab:<br /><code>matlab</code><br />Currently, the default matlab version isn't working correctly as an X11 client, and you may need to use the 2010a version, which still works:<br /><code>/opt/matlab/2010a/bin/matlab</code>
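
The <code>$DISPLAY</code> check described in the steps above can also be scripted, for example at the top of a wrapper script that launches an X11 program. The following is a minimal Python sketch, not part of the Lab's tooling; the function name <code>x11_display</code> is hypothetical, and the check only confirms that the variable is set, not that the tunnel actually works.

```python
import os


def x11_display(env=None):
    """Return the DISPLAY value if one is set, otherwise None.

    `env` is any mapping of environment variables; it defaults to the
    current process environment. (Hypothetical helper for illustration.)
    """
    env = os.environ if env is None else env
    display = env.get("DISPLAY", "").strip()
    # An unset or empty DISPLAY usually means X forwarding was not enabled.
    return display or None


if __name__ == "__main__":
    display = x11_display()
    if display is None:
        print("DISPLAY is not set; reconnect with X11 forwarding enabled.")
    else:
        print("X11 forwarding appears to be enabled, DISPLAY =", display)
```

A non-empty result such as <code>localhost:10.0</code> suggests that forwarding is enabled in the current ssh session.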
 

Latest revision as of 07:52, 21 April 2023

Managing reproducible Python environments

If you work with Python (or any programming language) over an extended period of time you will find that your old projects no longer work in the same environment as your newer projects. The language itself evolves over time, as do the library dependencies you import into your projects. As you keep up with the latest changes your older scripts tend to break.

The solution to this problem is to create independent environments for your projects. Doing so helps to ensure the long-term reproducibility of your code, and it also makes it easier to collaborate with other researchers. Keep your old environment definition around for your old project, and use a new environment with updated packages for your new project.

Here's how to get started with environments using Anaconda Python:

  1. Install miniconda instead of using the full Anaconda installer. This creates a minimal base environment that includes the Anaconda base tools.
  2. Open a terminal window where you can run the conda command. On Macs this is usually just a normal Terminal window, and on Windows you'll find Anaconda prompt shortcuts in the 'Anaconda3' program group in your Start Menu.
  3. Ensure that you have access to the git command: conda install git
  4. (Optional and recommended) Make conda-forge your default package channel. This channel is a community-created source of many useful packages that tends to be a little more comprehensive and up-to-date than Anaconda's default package list. Before changing the configuration, check that (base) appears in your terminal prompt, which indicates that the base environment is active.
    1. Set conda-forge as the highest priority channel: conda config --add channels conda-forge
    2. Activate strict channel priority: conda config --set channel_priority strict
  5. Create an environment for your project. This is where you install a specific version of Python, Jupyter, and whatever additional packages you need. It's best to keep a minimal base environment and do all of your work in a project-specific environment. Here are two ways to create your environment:
    1. Create your environment from an existing YAML specification file:
      1. Download or create the spec file. The phonlabenv.yaml file contains a good starting environment for phonetics. (The environment name defaults to phonlab.)
      2. Create the environment: conda env create -f phonlabenv.yaml
    2. Create your environment and install packages manually:
      1. Create your environment: conda create --name myproj (where myproj is the name of your environment).
      2. Install packages into the new environment: conda install --name myproj pkg1 pkg2 (where pkg1 and pkg2 are the names of packages you want to install, e.g. python and jupyter).
  6. Repeat the environment creation step for as many environments as you need.
  7. To activate and use an environment: conda activate myproj. Code executed in an environment should find the specific package versions installed in that environment and not the versions installed in other environments.
  8. To share your environment with a collaborator, export the specification to a file: conda env export --name myproj > myproj.yaml
  9. (Optional and recommended) Add the .yaml file you created in the preceding step to the git repo you created for your project. This spec file can be useful if you want to integrate executable notebooks into your repo. For example, see the phonapps repo and its environment.yml file that defines an environment to be used on Binder.


Whenever possible, install packages using conda. Fall back to pip only in the event you can't find a conda package.
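
After activating an environment, it can be useful to confirm from inside Python that your code is really running there and seeing the package versions you expect. The sketch below is one way to do that with the standard library only; the function name environment_report and the package names passed to it are illustrative assumptions, not part of the Lab's setup.

```python
import sys
from importlib import metadata


def environment_report(packages):
    """Describe the active environment and the versions of `packages`.

    Hypothetical helper for illustration. A version of None means the
    package is not installed in the environment running this code.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        # For a conda environment, sys.prefix is the environment's directory.
        "prefix": sys.prefix,
        "python": ".".join(str(n) for n in sys.version_info[:3]),
        "packages": versions,
    }


if __name__ == "__main__":
    # The package names here are examples only.
    report = environment_report(["numpy", "pandas"])
    print("environment prefix:", report["prefix"])
    print("python version:", report["python"])
    for name, version in report["packages"].items():
        print(name, version if version is not None else "(not installed)")
```

Running the script in two different environments should show different prefixes and, where the installed packages differ, different versions.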

Printing

The Lab printer is a Xerox Phaser 3250 and is located in room 50.

For troubleshooting see the printer manual.

The Berkeley Phonetics Machine

The Berkeley Phonetics Machine is a virtual machine with phonetic software preinstalled.

Sample scripts and snippets

  • get_dur -- a very simple script for reading label durations from a Praat textgrid
  • output formatting in Python -- a Python snippet for creating readable and maintainable output format and header strings in your scripts
  • ffmpeg reference -- a reference page describing scriptable ways to use ffmpeg for creating video stimuli
  • multi_align -- a script for running pyalign on an audio file based on labelled regions of a textgrid. Pull it into the BPM with 'sudo bpm-update ucblingmisc'.
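
As a rough illustration of the kind of task these snippets address, the sketch below prints label durations using header and row format strings that are both derived from a single column definition, so the two cannot drift apart. It is not the code from the get_dur or output formatting pages; the names COLUMNS, HEADER, ROW_FMT, and format_rows are hypothetical, and the labels are made-up values rather than data read from a textgrid.

```python
# Each column is (name, header format spec, row format spec).
# Keeping the specs side by side makes widths easy to keep in sync.
COLUMNS = [
    ("label", "<8", "<8"),
    ("t1", ">8", ">8.3f"),
    ("t2", ">8", ">8.3f"),
    ("dur", ">8", ">8.3f"),
]

# Build the header line and the row template from the same definition.
HEADER = " ".join(format(name, hspec) for name, hspec, _ in COLUMNS)
ROW_FMT = " ".join("{:" + rspec + "}" for _, _, rspec in COLUMNS)


def format_rows(labels):
    """Return output lines for (text, t1, t2) label tuples, header first."""
    lines = [HEADER]
    for text, t1, t2 in labels:
        # Duration is the difference between the label's end and start times.
        lines.append(ROW_FMT.format(text, t1, t2, t2 - t1))
    return lines


if __name__ == "__main__":
    # Made-up labels for illustration only.
    example_labels = [("aa", 0.10, 0.25), ("iy", 0.25, 0.40)]
    print("\n".join(format_rows(example_labels)))
```

Adding or widening a column then means editing one entry in COLUMNS rather than hunting through separate header and row strings.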

Tools and libraries

The tools and libraries listed here are available on the department server. Some may also be available for other platforms.

Local tools

  • ifcformant -- a command line tool for extracting formant measurements, as described in Ueda, Yuichi; Hamakawa, Tomoya; Sakata, Tadashi; Hario, Syota; Watanabe, Akira (2007) A real-time formant tracker based on the inverse filter control method, Acoustical Science and Technology of the Acoustical Society of Japan 28(4), 271-4. We are grateful to Yuichi Ueda for providing the C code which implements the algorithm. The user interface is provided by a Python wrapper around the authors' C code and was written by Ronald Sprouse.

    Lab members can contact Ronald Sprouse for copies of ifcformant compiled for OS X, Windows, or Linux systems. Unfortunately we do not have permission to distribute the C code or compiled versions of this tool to the public.

    For detailed usage information, run: ifcformant --help
  • convertlabel -- a command line tool for converting between Praat textgrids, ESPS label files, and Wavesurfer label files. You can also scale or shift timepoints in the label file by a specified amount. Written by Ronald Sprouse.

    For detailed usage information, run: convertlabel --help
  • ultracomm -- a command line tool for configuring and acquiring ultrasound data from an Ultrasonix Tablet system.
  • ultrasession.py -- a Python script for running ultracomm and simultaneously acquiring audio and ultrasound synchronization signals.

Local libraries

  • audiolabel -- a Python library for reading and writing Praat textgrids, ESPS label files, Wavesurfer label files, and time-aligned tabular data. Special access methods for retrieving labels at specified times or by matching label content. Written by Ronald Sprouse, and available on GitHub. See meas_formants for a sample script that uses this library. The audiolabel_demo walks you through many of the steps executed in meas_formants.
  • SoundLabel.pm -- a Perl library for reading and writing Praat textgrids, ESPS label files, and Wavesurfer label files, written by Ronald Sprouse. Old and clunky API. You are encouraged to write scripts that use audiolabel instead.

Handy third-party tools

  • pyalign -- a command line tool for automatically aligning phones to an audio file based on an orthographic transcription of the audio.
  • reaper -- a command line tool for calculating F0 from an audio file