Music 62 notes
Data Compression of audio
Convenience vs. Quality: in analog era, cassettes and 8-tracks vs. LPs. In digital era, compressed formats.
Linear PCM (or any linear coding of digital audio) is inefficient: data rate is the same regardless of whether there's signal or not, or the frequency range, or the dynamic range. Is there a way to compress it so we can throw away the "unimportant" parts and keep the "important" ones?
Compressed audio is different from dynamic compression: it doesn't change the dynamic range. Codec: coder/decoder, or compressor/decompressor.
Lossless vs. Lossy
Unlike StuffIt-style (LZW, Huffman) compression, lossy compression is not recoverable. Lossless AAC has a 2-to-1 size advantage and is truly lossless. Same for FLAC (Free Lossless Audio Codec, open source, 30-50% size reduction), Meridian Lossless Packing, and Shorten (SHN).
How to do it:
Lossy compression, using "perceptual coding", psychoacoustic masking effect. Softer sounds, sounds close in frequency to others, sounds close in time to others, can be masked. Redundant data in the two channels eliminated. Stereo separation can be reduced, especially in the lower frequencies. Variable rate encoding uses a slower bit rate when data is less complex.
Divide the signal into many frequency bands, and determine how to decrease the resolution in each band based on the amount of quantization noise that will be audible under the signal in each band--when there is no signal the resolution can be as low as zero. Combined with other techniques.
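The band-by-band idea can be sketched as a toy greedy bit-allocation loop (all numbers here are illustrative; real codecs like MP3 use standardized psychoacoustic models and filterbanks). Each bit spent on a band buys roughly 6 dB of quantization SNR, bits go to whichever band stands furthest above its masking threshold, and a fully masked band gets zero bits:

```python
def allocate_bits(levels_db, mask_db, bit_pool):
    """Greedy toy bit allocation: one bit ~ 6 dB of quantization SNR."""
    alloc = [0] * len(levels_db)
    for _ in range(bit_pool):
        # signal-to-mask ratio left in each band after the current allocation
        smr = [lvl - m - 6 * b for lvl, m, b in zip(levels_db, mask_db, alloc)]
        i = max(range(len(smr)), key=smr.__getitem__)
        if smr[i] <= 0:   # every band's noise is already below its mask: done
            break
        alloc[i] += 1
    return alloc

# Band 0 is 30 dB above its mask; band 1 is entirely masked (20 dB < 40 dB)
print(allocate_bits([60, 20], [30, 40], bit_pool=8))  # → [5, 0]
```

The masked band costs nothing to encode, which is where the data savings come from.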
MP3, short for MPEG-1 Layer 3: MPEG-1 was originally designed for compressing video; this is the audio part of the spec. Reasonable quality, compression ratio about 1:10, depending on bit rate. Most files use 44.1-kHz sampling, but others are possible. At bit rates above 256 kbps it's very hard to hear differences from the original.
Algorithm for making files is not necessarily free: it needs to be purchased in software whose maker has paid a license to the owner (the Fraunhofer Institute, a German firm). The license is cheap, so it caught on well. Also available is an open-source encoder: LAME ("LAME Ain't an MP3 Encoder"), used by AudioHijack, Peak, and others.
Some encoders work better at different bit rates: LAME is supposed to do a better job at higher bit rates.
Decoding algorithm is free. Portable MP3 players with hard disks or solid-state memory: iPod and phones with media players.
Apple iTunes uses AAC (Advanced Audio Coding), better algorithm, more efficient, can do multichannel. Part of MPEG-4 spec.
Streaming rates for different services:
Beats Music (RIP): standard quality 64 kbps; high quality 320 kbps
Spotify: normal quality 96 kbps; high quality 160 kbps; extreme quality 320 kbps
Google Play Music: three quality settings, with a maximum of 320 kbps
Pandora: maximum rate is 64 kbps (adjusts automatically depending on your connection)
Apple Music: 256 kbps AAC
iTunes Radio: 320 kbps at the top end, 64/128 kbps at the low end
Tidal: Premium ($10/month) 320 kbps AAC; HiFi ($20/month) 1411 kbps lossless FLAC
Audio in high definition (HD 720p or 1080p) video files = 192 kbps
Audio in standard definition (SD 480p or less) video = between 24 and 129 kbps
Soundcloud: Streams at 128 kbps, MP3. Higher rates (up to uncompressed WAV) only if file has downloads enabled.
Bluetooth audio: up to 320 kbps MP3 or AAC, typically less. Some devices use aptX, a newer codec capable of lossless compression, but both ends of the link have to have it or it doesn't help anything.
Mastering engineers: Like finish carpenters: Job is to make things sound as good as possible; last resort in the production chain. Use gain, eq, compression to make tracks sound their best on the many media that recordings are being distributed on, and to sound consistent or at least compatible from one track to the next.
Mastering engineers are highly trained and have the best ears in the business; mastering studios have the best listening rooms and equipment.
But they have been in the past pressured by labels to make things loud. They are also being replaced by home-mastering tools like Waves Ultramaximizer which can do the same job, but without the training, taste, or subtlety.
What's being done about it? Recommendations through broadcasting associations like the ITU, so that average loudness is consistent across all types of program material. Streaming services are actually doing a better job, since they have a loudness index they have to adhere to.
Mastering can be expensive (you pay per hour) but worth it for a commercial release.
Quadraphonic introduced early 1970s. Competing formats: QS, SQ (matrix encoded), CD-4 (subcarrier, like FM stereo, so rear channels are discrete). Only true discrete format: analog tape. Failed. Some bizarre recordings, placing the listener in the middle of the NY Philharmonic, with the orchestra lined up against the walls.
Surround took off in movie theaters, with speakers along the side walls, and then when home theaters started to get big, moved into domestic market. Delivery system: Video DVD with multitrack audio.
Most common: 5.1 Three front, two rear ("surround"), subwoofer (.1) for low-frequency effects (LFE). Low frequencies not perceived directionally, so you don't need discrete subwoofers. LFE must be good up to 125 Hz. Not really designed to be used for music, just for special effects in movies like Godzilla and Earthquake but now used in music too. Other possibilities, not standard: 6.1, 7.1, 10.2, etc.
Two major systems for film: Dolby Digital (also called AC-3), uses lossy compression and reduces bandwidth by 90%. DTS compresses by about 65%. Needs to be encoded at mastering end, decoded at consumer end. Encoding software is expensive ($1K+), since the technology needs to be licensed from Dolby. When mixing for those formats, you need to have an encoder and decoder in the studio so you can check to see how it translates.
Must mix in 6-channel environment, although delivery systems often encode into fewer channels. Best is to use five matched full-range speakers, can be small, + sub.
Dolby Atmos, adds two or four height speakers, and additional channels.
Dolby E for broadcasters: collapses 6 channels down to 2 (AES/EBU) since satellites, video recorders, and other broadcast chains are only set up for 2 channels. Lossy, but clean: Can often go through five or six generations before you notice any problems, according to Dolby.
For music, DVD-A (4.7 GB as opposed to 700 MB) uses lossless compression (Meridian Lossless Packing, about 50% data reduction) to allow audio on multiple discrete channels. Can also use higher sampling rates, but if you go too high, you sacrifice channels. Pretty much dead.
Super Audio CD (SACD) also high capacity (same basic format as DVD), uses Sony's Direct Stream Digital: 1-bit, 2.8MHz sampling rate. Often dual-layer "Hybrid" discs: SACD on top, normal CD underneath (different focal lengths for the lasers). Most of BMOP's recordings are released in this format.
Also DTS audio compressed onto standard CDs, but never caught on.
Mixing: Where do you put the listener? Where do you put the sounds? In film, dialog is in the center, music and effects on the sides, ambience and special effects in the rear. Sometimes put dialog somewhere else where you need a character to be moving or off-screen.
In music, not as clear: Is the rear just for reflected hall sound, or do you want the audience to feel in the middle of the orchestra/band? For electronic music, there's no objective reality. Can literally make the room spin.
If you mix a vocalist into the center, how much of that should also be in the L+R? "Focus" control in some mixers determines this. Some mixers don't use the center at all, but let the decoder create a reduced-level L+R there. Using center means sweet spot is bigger.
Music mixing often doesn't have an LFE track. Instead, most decoders have a "bass management" feature which filters out below 80 Hz and sends it to the subwoofer. Only LFE needed if you are doing the 1812 Overture or need really throbbing synth bass.
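The bass-management split can be sketched as a simple complementary filter pair (a one-pole lowpass here, purely illustrative; real bass management uses much steeper crossovers, and 80 Hz rather than the toy values below is just the typical corner):

```python
import math

def bass_split(samples, fs=48000, fc=80.0):
    """Split a signal into a sub feed (lowpass at fc) and a main feed (the rest)
    with a one-pole lowpass. Real crossovers are steeper (e.g. 24 dB/oct)."""
    a = math.exp(-2 * math.pi * fc / fs)   # one-pole filter coefficient
    lp, sub, mains = 0.0, [], []
    for x in samples:
        lp = (1 - a) * x + a * lp
        sub.append(lp)         # low frequencies go to the subwoofer
        mains.append(x - lp)   # complementary feed to the main speakers
    return sub, mains

# A constant (DC, the lowest possible "frequency") ends up almost
# entirely in the sub feed once the filter settles
sub, mains = bass_split([1.0] * 4000)
```

The point of the complementary split is that the two feeds sum back to the original signal, so nothing is lost, only redirected.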
Micing: no clear way to mic in surround. Use ambience mics, do multitrack recording.
Exception: Ambisonic: three-capsule "Soundfield" microphone, uses encoding into "B-format", which is four channels: W, X, Y, Z. Can then be decoded into any number of channels. Great for classical music in a good hall.
The W channel is the non-directional mono component of the signal, corresponding to the output of an omnidirectional microphone. The X, Y and Z channels are the directional components in three dimensions. They correspond to the outputs of three figure-of-eight microphones, facing forward, to the left, and upward respectively. (Note that the fact that B-format channels are analogous to microphone configurations does not mean that Ambisonic recordings can only be made with coincident microphone arrays.)
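A minimal sketch of horizontal-only first-order Ambisonic encoding and decoding (the 1/√2 scaling of W and the simple projection decode are one common convention; real decoders are optimized for the specific speaker layout):

```python
import math

def encode_bformat(azimuth_deg):
    """First-order horizontal B-format gains for a source at the given azimuth
    (0 deg = front, 90 deg = left). W scaled by 1/sqrt(2), a common convention."""
    a = math.radians(azimuth_deg)
    return (1 / math.sqrt(2), math.cos(a), math.sin(a))  # W, X, Y

def decode_to_speaker(w, x, y, speaker_azimuth_deg):
    """Simple (non-optimized) decode: project W/X/Y onto a speaker direction."""
    a = math.radians(speaker_azimuth_deg)
    return 0.5 * (math.sqrt(2) * w + math.cos(a) * x + math.sin(a) * y)

# A front-encoded source comes out loudest in the front speaker of a quad rig
w, x, y = encode_bformat(0)
gains = [decode_to_speaker(w, x, y, az) for az in (0, 90, 180, 270)]
# front gain ~1.0, sides ~0.5, rear ~0.0
```

Because the decode is just a projection, the same B-format recording can be re-decoded for any number of speakers, which is the format's main attraction.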
Other surround issues:
Have to be careful with vocal plosives and room rumble: they may not show up on your nearfield monitors, but will end up in the subwoofer.
Most reverbs are only stereo, so you need at least two to do surround, one for front and one for rear. True surround reverbs had been very expensive: now there are a few in the $300-$500 range, and getting cheaper.
Always check stereo compatibility: lots of people will be listening in stereo!
Melodyne exercise: Put all the tracks you want to use through a bus (send, 1-2). Create an aux track with that bus as the input. Insert Melodyne into the aux track. In the Melodyne window, click "transfer", and play the tracks. The tracks will appear in the Melodyne window, and you can edit and listen to them there. When you're done, either leave the window open or freeze the track.
What people do:
Sound engineers-their skills, what they do, what tools they use
microphone choice and placement, preamp choice
gain staging—making sure signals are sent and recorded at proper levels
editing or comping—putting pieces from various takes together
replacing—live instruments with sampled ones
syncing—MIDI-driven instruments with live tracks
mixing—keeping signals audible and out of each other's way by eq and compression
effects (DSP)—adding delays, reverb, pitch correction, phasing, tape saturation, etc.
Old model: record company hires producer to keep band in line. Decides which songs to record and order to record them; organizes sessions; hires arrangers, maybe other musicians to be in sessions; supervises mix; decides sequence on disc.
New model: band hires producers to help them be organized, to make musical decisions, and/or to act as extra ears, or to give them cachet and entry into higher-visibility markets.
Newest model: producer puts his/her own stamp on the record so that it will be accepted as part of his/her oeuvre. Artist contributes raw material for song, vocal, producer does the rest: beats, loops, arrangements, processing.
Other Careers in audio:
recording; film/TV; commercials; Web (Flash); libraries; games (loops, layers, transitions)
Sound design: film/TV/Web; theater; orchestrating/arranging
Assisting composers, esp. film/TV; assisting studios, producers, artists; music direction and playing for artists, theater; audio engineering: studio, books on tape/podcasts, audio for visuals, broadcast production
Algorithms for streaming services
Formatting and conforming audio for Web (Flash, etc.); ringtones; games: translation of music to MIDI and vice versa (Guitar Hero)
System design & installation: studios, project/home studios; advertising agencies; Web developers; game developers; broadcasters; theatres, cinemas;
houses of worship; industrial (PA, background music)
Product design: software synths, plug-ins, sequencers and performance programs, hardware controllers, pro and consumer audio hardware
Tech support: concert crew; theater; radio/TV/cable/Webcast; software companies (sequencers, instruments, plug-ins); hardware (instruments, audio components, computers)
Tech writing, documentation
Sales & marketing inside, retail
Education: college, high/middle school, trade school, manufacturer seminars, videos, online courses
EQ, two compressor/expander/gates, exciter, de-esser, transient processor, limiter. Can re-order modules on the "Graph" page.
EQ: Eight bands of bell, highpass, sharp highpass, lowpass, sharp lowpass, high shelf or low shelf filters, Baxandall (gentle and wide). Adjust frequency and gain with balls, and adjust Q with handles around each ball.
A spectrum analyzer operates in the background.
Hold down the option key to create a notch filter that sweeps the spectrum, Moulton-style
Dynamics: Digital or "vintage" simulation, hard or soft knee. Multiband option gives you separate compression/gating on three different bands with adjustable crossover frequencies.
Transients: emphasize or de-emphasize transients. individual gain for attack and sustain portion of signals. Adjustable timing of attack and sustain windows.
Exciter: adds odd and/or even harmonics in different balances. Makes sounds stick out more. Also useful when you have a sound with troublesome high frequencies--you can equalize them out, and more or less rebuild them with the exciter.
Limiter: soft or brickwall. Phase reverse one channel or both.
MIDI and Virtual instruments
Different flavors: VSTi, AU, proprietary, Rewire (just for Reason)
In ProTools: Set up Instrument track. Put instrument in the top Insert slot. Instrument track creates audio from virtual instrument, sends MIDI to it. Plug-ins can be used on Instrument track after instrument.
Reason is different: set up separate audio and MIDI tracks. Launch Pro Tools, then launch Reason. Create a new stereo audio track and insert Instruments>Reason; this is your audio path back from Reason. In the little "ReWire" panel that pops up, set the output to Mix L+R. Now create a MIDI track, with the input as the Impulse keyboard and the output on whichever Reason module you want to play (you can have as many modules in Reason as you want; Pro Tools will find all of them). Make sure the sync preferences in Reason are set so that there is NO keyboard input and NO MIDI channel assignments. While you're at it, make sure the audio output in Reason is in "ReWire Slave Mode." If it's not, quit Reason and relaunch.
MIDI sequencing: recording, overdubbing (merging--can’t do that with audio!), quantizing.
Record a performance, change its speed, pitch, articulation, sound independently.
Useful for scratch tracks, click tracks, also there are plug-ins that do drum replacement, if you have a good drummer but lousy drum sound. Isolate drums, replace with MIDI drums.
Why sync a computer sequencer to something else?
sync audio with video; sync multitrack tape with MIDI and hard-disk audio.
From the beginning of synthesis, musicians wanted to sync sequencers to tape recorders.
All devices must know:
1) what time it is, i.e., where we are in the program
2) when to start
3) how fast to go and in what direction.
SMPTE time code: what is it? Originally for video. Analog signal accompanies video signal on a separate track, with digitally encoded information about video timing.
Follows the video frame rate, 29.97002997 fps (not 30!). Describes the beginning of each frame with a number, an 80-bit word, with the first bit lined up with the beginning of the frame. Hours, minutes, seconds, frames. When a program refers to "subframes" it's talking about bit count, or 1/80 frame. One machine is the master, the others are slaves. A synchronizer compares numbers and controls the speed of the slave machines to conform. A change in speed is called "slew"; it needs to be inaudible.
Signal is more or less a 2400-Hz square wave, with 4800 Hz harmonic and “sub-harmonics” at 1200, 600, etc. Can easily be recorded on audio tape, or audio track of videotape. Also called Linear Time Code. Sounds awful and can blow your speakers.
Used in analog tape-to-tape sync, like multiple 24-tracks. Accuracy is close enough for analog (1/2400 sec=0.416 msec), but not for digital.
Two flavors, drop-frame and non-drop, just different ways of counting. Drop-frame is (almost) real-world accurate; non-drop uses continuous numbers. Normal timecode varies from "real world" time by 3.6 seconds/hour (0.1 %). Drop-frame skips the first two frame numbers (0,1) at the start of each minute, except at minutes 0, 10, 20, 30, 40, and 50: so 01:03:59:29 is followed by 01:04:00:02
2 frames × (60 − 6) minutes = 108 frames = 3.6036 seconds @ 29.97 fps. The error is reduced to 0.0036 seconds/hour, or 2.59 frames/day.
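The drop-frame counting rule above can be turned into a short conversion routine (a sketch of the standard arithmetic: 17982 frames per 10-minute block, 1798 per drop minute, then add back the skipped frame numbers):

```python
def frames_to_dropframe(frame_count):
    """Convert a running frame count to a 29.97 drop-frame timecode string.
    Each 10-minute block holds 17982 frames; each other minute holds 1798,
    because frame numbers 0 and 1 are skipped at the start of the minute."""
    tens, rem = divmod(frame_count, 17982)
    # add back the skipped numbers: 18 per 10-minute block, 2 per drop minute
    dropped = 18 * tens + 2 * max(0, (rem - 2) // 1798)
    n = frame_count + dropped
    ff = n % 30
    ss = (n // 30) % 60
    mm = (n // 1800) % 60
    hh = n // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"

print(frames_to_dropframe(115085))  # → 01:03:59;29
print(frames_to_dropframe(115086))  # → 01:04:00;02 (frames 00 and 01 skipped)
```

This reproduces the example above: 01:03:59;29 is followed directly by 01:04:00;02.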
Studios and broadcast stations reset timecode clocks daily.
Window burn, a/k/a visual time code or burn-in: visual representation of time code on screen, to make it easy to see the frame numbers. Not timecode itself.
QuickTime: a movie format that can be used inside Pro Tools. Frame rates are automatically calculated in the computer; Pro Tools displays the SMPTE number. Always use 29.97! 30-frame video does not exist, although software makers insist on including it.
iMovie and Premiere: can import audio and match frame rates to video.
Multitrack Session planning
The ideal: band is well-rehearsed, comfortable with headphones. You have a studio big enough and with enough isolated areas to accommodate them, and enough great mics and inputs to record them all at once, and some room mics too, and yet have good isolation between tracks.
Set a good balance, and record them. Afterwards, record fixes and overdubs as necessary.
Get the final balance in the mix.
The real: You can only do a few tracks at a time.
Recording to multiple tracks
For pop music, start with drums and bass and a guide track, since recording the rhythm section to an existing melody or even guitar part is very difficult (the guitarist has to have great timing).
Guide track can be vocal to get good feel in rhythm section. Record the vocal well—you may keep it.
Recording the first time:
Use the device, routing, and mixing templates in MOTU and ProTools. Don’t save over the templates!
Mute all the channels in ProTools, so you are monitoring through the MOTU. Use the headphone outputs of the snake, through the headphone boxes (connect with XLR cables). The fader positions will affect what you hear, but not what goes onto ProTools. Use the meters in ProTools and mic trims in MOTU to make sure you have a good signal.
To overdub: the mix of the recorded tracks comes from Pro Tools; send it to cue and/or monitors. You can adjust cue mixes independently using cue sends; otherwise use the MOTU faders. Live channels come through the MOTU (cues and/or Main); keep the mutes in Pro Tools engaged to prevent double hits.
Click track? Make an instrument track with Tracks>Click track, set tempo with Event> tempo operation. Most drummers have a hard time with a click track!
If you use a click track and know the tempo, you can use tempo scale to edit with—it's much easier to move things around in bars and beats than in minutes and seconds. But if tempo fluctuates at all, tempo scale will get in the way, and you'll need to construct a conductor track.
Latency: because you can't pull multiple tracks off the hard disk at exactly the same time, you need to create a buffer. It's also needed for some processing, like reverb. The buffer creates latency between input and output when there's live input, so when recording, set the buffer low. If it's too low, you will hear dropouts and/or get error messages.
Pro Tools keeps track of latency and adjusts newly recorded tracks so they sync correctly with previous tracks.
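The buffer-to-latency arithmetic is simple enough to check by hand (the 256- and 1024-sample sizes below are just common settings, not from any particular interface):

```python
def buffer_latency_ms(buffer_samples, sample_rate):
    """One buffer's worth of latency, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate

print(round(buffer_latency_ms(256, 44100), 1))   # → 5.8 ms: fine for tracking
print(round(buffer_latency_ms(1024, 44100), 1))  # → 23.2 ms: audible when overdubbing
```

Total round-trip latency is higher than one buffer (input and output each add a buffer, plus converter delays), which is why even a "low" setting can feel sluggish.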
Hints and suggestions:
Mark places in the room with tape on the floor and/or take pictures, so if you have to come back you can duplicate your setup.
Try to monitor as far away from the sound sources as possible. The headphone splitter boxes can be extended using standard mic cables.
Be careful handling the mics. Someone left an EV mic in a precarious position in the closet, someone else knocked it over, and now it's damaged, although still usable.
Get set up well before your talent shows up. Nothing is more frustrating to a musician than sitting around while the tech staff tries to solve some esoteric problem.
Do the drums first! These take the longest to set up. Have the drummer come in early, and get the drums tuned and set up, and get your mic positions figured out. Then maybe take a break before the rest of the band comes in.
If the musicians want to practice or noodle in the room, you can't concentrate on what you're doing. But give them time to warm up before you do an official take, and record the warmups if you can.
Take breaks. Give yourself and the talent time to breathe, so you don’t get so caught up you don’t realize when something is not right.
Balance/volumes, panning, reverb
EQ to bring out instruments or to carve space for them in the mix
Compression to control dynamics
Delay or reverb to create acoustic space
Chorus on some instruments to give them motion
Plug-ins usually in this order:
EQ > dynamics > phasing-chorus-flanging-distortion > reverb
Pro Tools: Loop recording
Multiple takes: use adjacent tracks, comp together by selecting regions and moving vertically (Hold shift key to lock in time).
Compression: automatic volume control; keeps instrument dynamic ranges limited. Also creative: increase sustain on some sounds like guitar, cymbals, drums, and bass. Brings loud sounds down AND, with make-up gain, brings soft sounds up.
Limiting: "brick-wall" compression with an infinite ratio (or greater than 10:1). Prevents the signal from going too high. Used in broadcasting, mastering, wherever signals absolutely can't exceed a threshold. Applied too thickly, it makes recordings sound louder without being so, but removes dynamic range. Different from normalizing, in which level is maximized but dynamic range is maintained.
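The static gain curve behind compressors and limiters can be sketched in a few lines (illustrative only; real units add attack/release smoothing and knee shaping):

```python
def output_level_db(in_db, threshold_db, ratio):
    """Static compressor curve: above threshold, output rises 1/ratio as fast
    as the input. ratio = inf gives a brick-wall limiter."""
    if in_db <= threshold_db:
        return in_db          # below threshold: unchanged
    return threshold_db + (in_db - threshold_db) / ratio

print(output_level_db(-6, -18, 3))             # 3:1 compressor → -14.0 dB
print(output_level_db(-6, -18, float("inf")))  # brick-wall limiter → -18.0 dB
```

With the limiter, nothing ever exceeds the threshold; with the 3:1 ratio, a 12 dB overshoot is reduced to 4 dB, and make-up gain can then raise the whole signal.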
Compression Side Chain
The gain element responds to an input signal that's not the main input. Uses: ducker, voice announcement systems, radio stations.
Expander: compressor with a negative ratio. Used to increase dynamic range by forcing lower sounds lower. Not used much in recording except as:
Gate: expander with an infinite negative ratio, used to remove low-level sounds completely, e.g. guitar amp noise. Used on stage for vocal mics, and in the studio to isolate drums.
Side-chain: Keyer. Used to make one sound follow another, like noise generator or ambience track following a drum beat.
Delay: single slaps, loops (w/feedback), static comb filter: some frequencies reinforce, some cancel. Feedback deepens notch.
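The notch positions of that static comb are easy to compute: with delay τ, cancellations fall at odd multiples of 1/(2τ) and reinforcements at multiples of 1/τ (a worked sketch, using a 1 ms delay as the example):

```python
def comb_notches(delay_ms, max_hz=5000):
    """Notch frequencies of y[n] = x[n] + x[n - delay]: at these frequencies
    the delayed copy arrives a half-cycle late and cancels."""
    tau = delay_ms / 1000.0          # delay in seconds
    f0 = 1.0 / (2.0 * tau)           # first notch
    notches, k = [], 0
    while (2 * k + 1) * f0 <= max_hz:
        notches.append((2 * k + 1) * f0)
        k += 1
    return notches

print(comb_notches(1.0))  # 1 ms delay → notches at 500, 1500, 2500, 3500, 4500 Hz
```

Note the notches are evenly spaced by 1/τ, which is what gives the effect its "comb" name.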
Phasing & flanging: Early phaser used all-pass filter, which changes phase shift as a function of frequency. The “corner frequency” is defined at the frequency where the phase shift is 90°. You can sweep this frequency with an LFO. Effect was subtle: usually needed to chain several of them--at least four--to create enough effect. Notches in the frequency spectrum were not related to each other.
Flangers. Got their name from early technique: needed four tape recorders: source deck, two processing decks, record deck. As signal went through processing decks, put a finger on the flange of one reel to slow it down slightly, would create a delay.
Now we use digital delays that change over time, with an LFO that sweeps between two delay times. Longer delays make combs with more teeth; feedback adds resonance, to the point where it can oscillate.
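A toy flanger along those lines (a sketch with assumed parameter values; it uses integer sample delays, where a real flanger would interpolate between samples for a smooth sweep):

```python
import math

def flange(samples, fs=44100, min_delay=1, max_delay=40,
           lfo_hz=0.5, feedback=0.5):
    """Toy flanger: a delay (in samples) swept between two values by a sine
    LFO, with the delayed signal mixed back in and fed back on itself."""
    buf = [0.0] * (max_delay + 1)          # circular delay line
    out = []
    for n, x in enumerate(samples):
        # LFO sweeps the delay between min_delay and max_delay
        lfo = 0.5 * (1 + math.sin(2 * math.pi * lfo_hz * n / fs))
        d = int(min_delay + (max_delay - min_delay) * lfo)
        delayed = buf[(n - d) % len(buf)]
        out.append(x + delayed)            # dry + delayed = moving comb filter
        buf[n % len(buf)] = x + feedback * delayed  # feedback deepens the notches
    return out
```

Feeding it an impulse shows the behavior: the impulse reappears after the current delay time, then again at half amplitude because of the 0.5 feedback, and so on toward self-oscillation as feedback approaches 1.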
Reverb: needed because dry recordings and synthetic sounds have no sense of space.
Standard model of reverb: Direct sound, pre-delay (first reflection: distance of source from nearest wall), early reflections, tail (RT60). To make vocals clear, use longer (>50ms) pre-delay so that reverb doesn’t overwhelm the track.
Original types: Chambers, Mechanical (plate, spring)
Sometimes use more than one reverb: special reverb on some instruments (e.g., drums), then overall reverb on everything to make it sound as if it’s all in the same space.
In ProTools: create Aux track with Bus 1-2 as input, insert Reverb plug-in, add Send (to Bus 1-2) on all tracks you want to apply reverb to. Sends allow different amounts of reverb to be applied to different tracks. Aux channel level is “return”: amount of overall reverb.
Analog recording history
Cylinder, wax: vertical “hill-and-dale” recording.
Lacquer/vinyl discs: lateral recording (mostly)
78s, 4-5 minutes on a side. Records came in multi-disc “albums” so you could have an entire symphony.
33 (Columbia) and 45 (RCA) came out at the same time, 45 ended up being used for singles.
Stereo discs, cutter head with two coils at 45° angle to vertical. Had to restrict dynamic range or playback stylus might jump out of groove. Had to put bass in the center of the stereo image.
Five-step process: master lacquer or acetate (positive), metal master or father or matrix (negative), mother (positive), stamper (negative), disk.
To maximize playing time, grooves could change distance between them (“pitch”) so softer passages could be closer together. When tape became the mastering medium, you could put a separate playback head to look ahead and determine pitch.
Tape recorder basics: tape formulas, heads, transport, bias, noise reduction
Tape records waveform voltages by aligning magnetic particles, or "domains", on the tape in step with the changing voltage. The "head" is the transducer between the AC voltage and the fluctuating magnetic field. Tape is iron oxide or similar on plastic. The working part of the head's magnet is called the "gap". As the voltage changes, the orientation of the particles changes. Bias (invented by the Germans; this was their big breakthrough) is a high-frequency signal that keeps the domains moving at all times, eliminating the inertia that causes distortion.
Dynamic response is limited: inherent noise caused by random orientation of particles ("hiss") means sounds cannot go much lower than noise level. Top end limited by "saturation": if magnetic particles are pushed too much, they resist, and the waveform will be distorted. Under controlled conditions, this can actually add to the "warmth" or immediacy of the sound, but often it just makes it sound nasty.
High-frequency response limited by size of gap and size of domains: smaller gap means more particles per inch of tape. Finer particles mean more particles per square inch of tape. Also, speed of tape big factor = number of particles per unit of time. Professional tape speeds: 15 and 30 inches per second on 1/4" tape. Consumer (old): 3-3/4 and 7-1/2 ips. Multitrack needed wider tape: 1/2", 1", 2". Fostex and Tascam bucked trend.
Analog cassette: 1-7/8 ips on 1/8" tape, 4 tracks (stereo, both directions).
Dolby noise reduction is a scheme for boosting certain frequencies on record and reducing them on playback to lower noise and increase dynamic range. Best we can ask for in analog tape is about 70 dB of dynamic range. As tape widths and speeds went down, needed it more. Four types: A, B (consumer), C (better consumer), SR. New Dolby formats refer to film and transmission codecs.
Competing system: dbx. Still used in stereo broadcast TV.
In post, single-ended: SoundSoap, RX. Work by sampling noise and removing it from recording. More later!
Mics for second project:
2 Røde NT1 cardioid
1 Røde NT2 multi
2 AudioTechnica 2050 multi
2 Nady Ribbon bi-directional
1 AKG D112 cardioid
2 Shure SM58 cardioid
2 Shure SM57 cardioid
Make sure to tie cables after you wrap them!
Clocking in digital audio
When using multiple digital sources, they must have a common clock, or else there will be clicks where the clocks drift out of sync and samples are dropped. So there must always be one master.
Word clock signal can be generated by one device, and fed through the others, or fanned out to the others.
Or, if all devices are capable of syncing to incoming digital audio stream, you can daisy-chain them.
Master clock when you're recording should be the device that is doing the analog-to-digital conversion.
Jitter is slight variation--at the nanosecond level--in the clock rate. Result is noise similar to quantization noise, but in the frequency domain as opposed to the amplitude domain. Very hard to find a digital clock that has audible jitter! You can spend a lot of money on dedicated clocks, but listening tests show that they make no difference.
Only reason to use an external clock: if you’re using multiple A-to-D converters.
Higher sampling rates: advantages (possible frequency response and dynamic range increases? Better marketing!); disadvantages (more storage; more bandwidth through multitrack interfaces; more CPU power needed, therefore harder on the DAW).
Multiple mic setups
Single mics on most instruments. First, place the instruments in the room so that they sound good, and the musicians feel good and can hear each other. Then place the mics where they sound good.
Observe the 3-to-1 rule: a mic must be at least three times further from any source it's not supposed to pick up than from the source it is picking up. Sometimes the phase switch (press the Ø/insert/delay button to bring up the phase switches on screen) can help with leakage problems.
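The arithmetic behind the rule (a sketch assuming simple inverse-square falloff in a free field; real rooms add reflections): tripling the distance puts the leaked signal about 9.5 dB below the intended one, enough that comb-filter cancellations when the mics are mixed stay mild.

```python
import math

def leakage_db(direct_dist, leak_dist):
    """Level difference (dB) between a source at direct_dist and the same
    source leaking into a mic at leak_dist, assuming inverse-square falloff."""
    return 20 * math.log10(leak_dist / direct_dist)

# A mic 1 ft from its own instrument and 3 ft from a neighbor's:
print(round(leakage_db(1, 3), 1))  # → 9.5 dB down: the 3-to-1 rule
```

The units cancel, so the rule holds in feet or meters; only the ratio matters.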
Use stereo micing (pattern of your choice) for piano, drums, marimba, large instruments. Also consider using stereo room mics as well.
Musicians should balance themselves before you set balances.
All channels except kick or bass: Use high-pass filter on mic or mixer. Eliminates room noise.
Live mixing: Pan individual instruments where you want them. They should be more or less in the same place as they are in the room. If that's not possible (instruments are against two walls or in a circle around the mics) then adjust intelligently.
Soloing: in place (meaning in their pan position). Use to analyze individual mics. Only affects monitor and headphone outputs, doesn't affect stereo bus, so you can do it during a take.
Transmitting digital audio
AES/EBU = AES Type I Balanced – 3-conductor, 110-ohm twisted pair cabling with an XLR connector, 5 volt signal level
S/PDIF = AES Type II Unbalanced – 2-conductor, 75-ohm coaxial cable with an RCA connector, used in consumer composite video, 0.5v
The two data sets are almost identical. You can easily convert from one to the other with a simple voltage gain or drop.
TOSLINK = AES Type II Optical – optical fiber, usually plastic but occasionally glass, with an F05 connector.
ADAT Lightpipe, 8 channels on optical fiber. Same cables as TOSLINK, but not compatible
Tascam TDIF, 8 channels on DB25 connector, same as original SCSI spec.
MADI: coaxial (BNC connector) or optical (wider than TOSLINK), 48 or more channels; used in older multitrack decks and high-end installations. Making a bit of a comeback with multichannel digital consoles.
Newest format: over Ethernet. Right now mostly for live sound: allows single cable between stage and mix position. Several formats:
Ethersound, CobraNet, Aviom, MOTU AVB
Dante (owned by Audinate) is the most promising: it can use ordinary network switches. About 200 manufacturers now have licenses.
Power amplifiers: matching to speakers, impedance (= resistance at audio frequencies, in ohms).
damping factor: ratio of speaker impedance to source impedance. How well it controls mechanical resonances: high damping factor acts as a "brake" on the cone; low damping factor means it can ring. So you want output impedance low (typically 0-1Ω), speaker impedance high (8Ω down to 2Ω).
Many amplifier manufacturers state power levels going into a low-impedance load, makes them look more powerful.
Sampling: take a sample of the signal voltage and write it down as a number.
Issues: how often (sample rate), how accurate is the number (word length), how accurate is the sample clock (jitter).
A-D converter does this.
Nyquist theorem: the highest frequency that can be sampled is 1/2 the sampling rate (Nyquist frequency = sampling rate / 2). If the input frequency goes above that, you get aliasing.
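Aliasing is predictable: frequencies above Nyquist fold back down into the audible band (a worked sketch, assuming an ideal sampler with no anti-aliasing filter):

```python
def alias_of(freq, sample_rate):
    """Frequency heard after sampling: content above Nyquist folds back."""
    f = freq % sample_rate
    return min(f, sample_rate - f)

print(alias_of(30000, 44100))  # 30 kHz sampled at 44.1 kHz aliases to 14100 Hz
print(alias_of(10000, 44100))  # below Nyquist: passes through unchanged
```

This is why A-D converters put a sharp lowpass filter ahead of the sampler: once a 30 kHz component has folded down to 14.1 kHz, nothing can separate it from real 14.1 kHz content.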
Word length: the number is binary, so the number of bits determines the range. With 10 bits, you get 0-1023. With 16, 0-65535. The difference between the analog input and the digitized signal is called quantization noise.
Dynamic range = highest possible level / quantization noise level = 6.02 × (number of bits) + 1.76 dB.
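Plugging the common word lengths into that formula as a quick check:

```python
def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM: ~6.02 dB per bit + 1.76 dB."""
    return 6.02 * bits + 1.76

print(2**10 - 1)                        # 10 bits count 0-1023
print(2**16 - 1)                        # 16 bits count 0-65535
print(round(dynamic_range_db(16), 2))   # → 98.08 dB: CD-quality audio
print(round(dynamic_range_db(24), 2))   # → 146.24 dB: 24-bit audio
```

Each added bit doubles the number of levels and buys about 6 dB, which is why going from 16 to 24 bits adds roughly 48 dB of theoretical dynamic range.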
D-A converter: creates signal voltages from samples; uses sharp reconstruction filters to smooth off the edges of the waveforms.
Digital Recording formats:
Sony PCM-F1, PCM-1610, JVC, etc: used video tape, either 1/2" or 1/4", so
could only edit on frame boundaries i.e., 33.3 msec resolution.
DAT: was killed in the consumer market by RIAA lobbying for law requiring SCMS chip in consumer units, so no one made any.
Multitrack: Mitsubishi, Studer, Tascam, Sony PCM-3324 and -3348. Most of them long gone. Replaced by ADAT, and to some extent by Tascam DA-88 (didn't do as well: price point was higher and introduction was a few months later).
ADATs could easily be combined, and controlled by a single controller which acted as if it was a 32-track deck.
Digital disc vs. digital tape
Tape is sequential, disc is random access.
With tape there is a direct correlation between the number of physical inputs, the number of tracks, and the number of physical outputs.
With disc, there is no correlation: inputs and outputs are determined by the audio interfaces, and tracks can be much higher: determined by the speed of the CPU and the throughput of the disc.
Disc systems can do processing on the fly, non-destructive editing
In early days when space was scarce, sometimes used destructive editing. Now "constructive": do a file edit, and it creates a new file.
Using speakers in the lab: not when anyone else is in the lab!
Using Audio-technica headphones: Get key from keysafe for tall cabinet. Take a short cable from the box on the second shelf--the two ends are different sizes, and the smaller one goes into the headphones. Make sure you return them, lock the cabinet, and scramble the keysafe combination when you're done.
Damage: causing woofer cone to go too far can tear it or pull it off its mount. Sending high-frequency distortion products to tweeter can damage it.
How to use speakers in practical situations? Get used to them! Listen to music that you know on them, so your ear can make comparisons.
Headphones: open (foam), closed (Koss), semi-closed (lighter plastic), noise-cancelling (Bose), bass heavy (Beats).
Can be more accurate, move much less air so elements are lighter, no room effects.
Usually one element for entire frequency range. Problem: interaural bleed is gone, so stereo image is very different from speakers. Processors beginning to appear that simulate speakers in headphones.
Ear buds: Getting better! Watch out for exaggerated LF response. Watch SPL!!
In-ear monitors: isolated; advantage is less sound on stage getting into the FOH system. For bass players and drummers, often combined with speakers or throne drivers. Expensive but worth it for professionals.
“Buttkicker” for drummers so monitor levels don't have to be so high on stage.
Of all the components in an audio system, these have by far the worst frequency response and distortion. Physics of moving air is difficult. The perfect speaker would weigh nothing and have infinite rigidity. The spider which holds the cone against the magnet would weigh nothing and have infinite flexibility. The space inside the cabinet would be infinite so that nothing impedes the movement of the cone.
Break up the spectrum into components that work best over a limited range: woofers, tweeters, midrange, sub-woofers.
Directivity: low frequencies spread out more, high frequencies are localized, “beamed”.
Crossovers: filters to divide the spectrum between the elements.
Distortion: harmonic, intermodulation
Time-aligned: tweeter is delayed or set back to compensate for depth of woofer cone. Theory says this preserves transients, prevents phase interference between drivers at overlapping frequencies. Concentric drivers sometimes used for time/space alignment (e.g., Tannoys).
Passive vs. active speakers: Crossover goes after amp, or before amps. Bi-/tri-amplification.
Sensitivity: output SPL at a given distance, per 1 watt input.
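Sensitivity specs are often given as dB SPL at 1 watt / 1 meter. A hypothetical Python sketch of estimating level at the listening position (assumes a point source in free space, so real rooms will differ):

```python
import math

def spl_at(sensitivity_db, watts, meters):
    # +10*log10(power) for amplifier power above 1 W,
    # -20*log10(distance) for inverse-square spreading beyond 1 m.
    return sensitivity_db + 10 * math.log10(watts) - 20 * math.log10(meters)

# 88 dB/1W/1m speaker, 100 W amp, listener 2 m away:
print(round(spl_at(88, 100, 2), 1))  # → 102.0 dB SPL
```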
Other specs: freq response, THD, maximum power; often misleading.
Near-field: small speakers up close to minimize room effects.
In a studio, use multiple speakers to monitor recording and especially mix: high-end and low-end. Auratones, Yamaha NS-10s popular for simulating home hi-fi, television, car. NS-10s very bright, engineers often used tissue paper to calm them down.
ProTools: After recording automation, use drop-down in Edit window to show movements, edit. Select, delete, scale up or down, or pencil edit.
Grouping stereo or multiple tracks for editing and/or mixing. (Using pencil editor on automation on one track in a group will not change the others.)
Bouncing to AIFF/WAV: File>Bounce to Disk. 16-bit, 44.1 kHz, interleaved. Mix will be stored in Bounces folder of session unless you specify otherwise.
In the studio: four different types of rooms
Control room—very tight, flat
Live room—may have different areas with different acoustics
Drum room—also may be variable
Iso booth—for voices, amps, usually dead
Three studio paradigms:
Digital hard-disk ("in the box")
Studio wiring: input panels, monitor outputs, patch bays, cue systems
1st Projects Due Oct 4: 2 mics to Pro Tools.
On the cart:
MOTU Stage B16 is mixer for inputs, outputs, and cue mixes, also is audio interface and A/D D/A converter.
Mac Mini has external drive. Operate with wireless keyboard and trackpad or wired keyboard and mouse.
Speakers for monitoring, also headphones from B16 output.
Remove speakers and snake. Use snake for mic inputs.
Open mixer controls in Safari only! Look for MOTU icon in upper right. Mic trims, panning--for this project, hard left and right. Use phantom power if necessary. Get a "green" signal without a red light. Level should top out between -12 and -18 dB. Fader positions don't matter.
ProTools 12 on the recording cart:
When you launch ProTools, double-click “Stereo Template” from the Template Group window, and then name your session and save it on the External hard drive. Now all the files should be in the right place inside your session folder.
Recording into ProTools: 44.1 kHz, WAV, 16-bit or 24-bit.
After recording, use a flash or portable drive to move entire folder onto lab computer to edit and/or mix or use network access to the server from the desktop.
Two Rode multipattern and two Electro-Voice n/d267a dynamic cardioids with stands and cables, plus two headphones, AC extension cord, are in the milk carton on the cart.
Use pop filter on all vocals!
Cable wrapping! https://www.youtube.com/watch?v=ktI0mLAoSTc Please use velcro or ties to secure your cables after you wrap them!
Choosing your mic pattern: use the results of our experiment. How much “center” do you need? Is tonal balance or spatial placement more important? If recording instruments of very different volumes, not necessary to stick with a stereo pattern, just make it sound good in stereo.
If you use M-S, when you are playing back, duplicate the Side channel onto another track. Phase-reverse it using the Trim plug-in, and then group both Side faders. Level of the Side faders determines the width of the stereo image.
Edit if you need to. Goal is to make something that sounds realistic, and good. Try different instrument and mic positions.
Smart tool for trimming, selecting region (command-E), fading in or out, cross-fading between adjacent regions.
Editing modes: Slip: move freely. Grid: move in quantized intervals. Shuffle: move a region and other regions jump around to fill in.
To automate fader movements: put track in auto “write”, not Record! To play back, set to “Read”.
Grouping stereo or multiple tracks for editing and/or mixing: select tracks, command-G
First project teams:
B. Berke Imren
John Morgan Keane*
PRIORITIES IN A RECORDING SESSION/STUDIO (in descending order, according to Prof. Lehrman)
A/D converter — Mic preamp
Analog mixer/channel strip
Master clock (except when you have multiple A/D converters)
Plug-ins / Outboard (unless you have very specialized needs)
Microphone techniques: respect the historical use of instruments!
• What's the instrument?
• What's the performance style?
• Is the room sound good? Is it quiet?
• Are there other instruments playing at the same time?
• How much room sound do you want?
• What mics do you have?
• Do you want stereo or mono? How much stereo?
Good positioning is always better than trying to eq later. Good positioning means phasing is favorable: hard to fix with eq!
Mics need to be closer than our ears, since we don't have the visual cues to tell us what to look for, and mics can't distinguish between direct and reflected sound--we always want more direct sound in the recording. Can add reflections (echo/reverb) later, but impossible to remove them!
Listening to the instruments in the space: finding the right spot to record. Get the room balance in your ear, then take two steps forward and put the mic there.
3-to-1 rule: when using multiple microphones, mics need to be at least three times as far away from each other as they are from their individual sources.
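Why 3-to-1 works: by the inverse-square law, sound arriving at the second mic from three times the distance is roughly 9.5 dB down, enough to keep comb filtering mild when the mics are mixed. A quick sketch:

```python
import math

def spill_attenuation_db(distance_ratio):
    # Level drop for a point source at `distance_ratio` times the distance
    # (inverse-square law, free field).
    return 20 * math.log10(distance_ratio)

print(round(spill_attenuation_db(3), 1))  # → 9.5 dB down at 3x the distance
```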
Winds & Strings: not on top of the bridge. Too close, loses resonance and high frequencies (data from Michigan Tech, using DPA 4011 cardioid).
At least 3 ft away from source, if possible, except when it would violate 3-to-1 rule! String sections: mic in stereo as an ensemble, not meant to be a bunch of soloists. Horn sections, can go either way: mic individually or if there is enough isolation from other instruments, as section.
Guitar: exception since we are used to hearing close-miked guitars. But there is no one good spot on the guitar, since sound comes from all over the instrument: soundhole (too boomy by itself), body, top, neck, headstock. Best to use 2 mics, or if room is quiet, from a distance.
Vocals: Always use pop filters
Piano: exception since pianists like the sound of the instrument close up--doesn’t really need the room to expand. Different philosophies for pop and classical. 3:1 rule on soundboard, or even better, 5:1 since reflections are very loud and phase relationships very complex. Can use spaced cardioids, spaced omnis, or coincident cardioids, in which case you want to reposition them for the best balance within the instrument (bass/treble). Stereo image? Performer or audience perspective?
Drums: first of all, make them sound good! Tune them, dampen rattles, dampen heads so they don’t ring as much (blanket in kick drum).
Three philosophies--Choice will depend on spill, room sound, and how much power and immediacy you want in the drums.
1) stereo pair overhead (cardioid or omni); good for jazz, if you don’t mind some spill, or if they’re in a good-sounding isolation room.
2) add kick (dynamic or high-level condenser) and snare mics for extra punch and flexibility
3) add mics to everything. Complicates things because of spill, may have to add noise gates later.
Glyn Johns technique --
Transducer = converts one type of energy to another
Microphone = converts sound waves in air to alternating current (AC) voltages. Dynamic microphone has a magnetic metal diaphragm mounted inside a coil of wire. Diaphragm vibrates with sound waves, induces current into the coil, which is an analog (stress the term!) of the sound wave. This travels down a wire as an alternating current: positive voltage with compression, negative voltage with rarefaction.
Dynamic/moving coil (pressure-gradient mic)
Condenser/capacitor = charged plate, uncharged plate, acts as a capacitor: one plate moves, capacitance changes.
Charge comes from battery, or permanently-charged plate (electret), or dedicated power supply (old tube mics), or phantom power: 48v DC provided by mixer (doesn’t get into signal, because input transformer or blocking capacitor removes it).
Ribbon (velocity mic)
Metal ribbon is suspended between strong magnets, as it vibrates it generates a small current. High sensitivity, good freq response, a little delicate, figure-8 pattern.
Boundary (pressure zone)
Owned by Crown. Mic element is very close to wall. Hemispherical pickup, reflections off of wall are very short, essentially non-existent, prevents comb-filtering caused by usual reflections, even frequency response. Not good for singing, but good for grand piano (against soundboard), conference rooms, theatrical (put on the stage, pad against foot noises).
Polar patterns/phase relationships.
Standard configurations. Pickup pattern design. Off-axis response, proximity effect. Pop filters on vocals and flute.
Cables: Balanced vs. Unbalanced:
Balanced = two conductors and a surrounding shield or ground. The two conductors carry the signal in electrical opposition: when one has positive voltage the other has negative. At the receiving end, one leg is flipped in polarity (also called phase) and the two are added. Any noise induced along the cable hits both conductors identically, so the flip-and-add cancels the noise (flip any signal and add it to itself and the result is zero) while the wanted signal adds. This means little noise over long cable runs. Best for microphones, which have low signal levels, but also for long runs at line level.
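A toy Python model of the flip-and-add trick (signal and noise values are made up):

```python
# The same noise is induced on both conductors; the receiver flips one
# leg and sums. Signal doubles, noise cancels.
signal = [0.5, -0.2, 0.8, -0.9]
noise  = [0.1,  0.3, -0.2, 0.05]

hot  = [ s + n for s, n in zip(signal, noise)]  # carries +signal
cold = [-s + n for s, n in zip(signal, noise)]  # carries -signal

received = [h - c for h, c in zip(hot, cold)]   # flip cold leg, then add
print([round(x, 6) for x in received])  # → [1.0, -0.4, 1.6, -1.8]: 2x signal, zero noise
```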
Unbalanced = single conductor and shield. Cheaper and easier to wire, but open to noise as well as signal loss over long length, particularly high frequencies due to capacitance (of interest to EEs only). Okay for line-level signals over short distances (like hi-fi rigs or electronic instruments), or microphones over very short distances (cheap recorders and PA systems).
Connectors: Balanced: XLR (as on microphone cable), 1/4” tip-ring-sleeve.
Unbalanced: RCA (“phono”), 1/4” (“phone”), mini (cassette deck or computer).
Mini comes in stereo version also (tip-ring-sleeve), for computers and Walkman headphones (both channels share a common ground). 1/4” TRS is also used as a stereo cable for headphones = two unbalanced channels with a common ground.
Waveforms = simple and complex (show)
Simple waveform is a sine wave, has just the fundamental frequency. Other forms have harmonics, which are integer multiples of the fundamental. Fourier analysis theory says that any complex waveform can be broken down into a series of sine waves.
Saw: each harmonic at level 1/n. Square: only odd harmonics at 1/n. Triangle: only odd harmonics at 1/n².
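A Python sketch of building a square wave from its Fourier recipe (odd harmonics at 1/n); with enough harmonics the sum converges toward a square wave:

```python
import math

def square_from_harmonics(t, f0, max_harmonic):
    # Sum odd harmonics n = 1, 3, 5, ... each at amplitude 1/n.
    total = 0.0
    for n in range(1, max_harmonic + 1, 2):
        total += math.sin(2 * math.pi * n * f0 * t) / n
    return total

# At the peak of a 100 Hz wave, the sum approaches pi/4 (~0.785):
peak = square_from_harmonics(0.0025, 100, 999)
print(abs(peak - math.pi / 4) < 0.01)  # → True
```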
If there are lots of non-harmonic components, we hear it as noise.
White noise: equal energy per Hz (linear frequency scale)
Pink noise: equal energy per octave (logarithmic scale, better suited to the ear)
Timbre = complexity of waveform, number and strength of harmonics. We can change timbre with filters or equalizers.
Stereo = since we have two ears. Simplest and best high-fidelity system is walking around with two mics clipped to your ears, and then listening over headphones: this is called binaural. Binaural recordings are commercially available: they use a dummy head with microphones in the earholes.
Systems with speakers are an approximation of stereo. The stereo field is the area between the speakers, and the “image” is what appears between the two speakers. If you sit too far from the center, you won’t hear a stereo image.
Multi-channel surround can do more to simulate "real" environments. Quad, 5.1 (.1=LFE since low frequencies are heard less directionally), 7.1, 10.1, etc. Will do a little with it in this course.
Position in the stereo or surround field = L/R, F/B, U/D. Determined by relative amplitude, arrival time, and phase.
Fidelity: what is it and what can get in the way? What goes in = what goes out.
Ideal amplifier=A straight wire with gain (signal is louder)
Coloration: Frequency response is limited.
Function of frequency response curve is not linear.
Distortion is introduced, certain extra harmonics are produced, either even or odd.
• Distortion caused by clipping or non-linearity: adds odd harmonics, particularly nasty (show in Reason)=harmonic distortion
• Crossover distortion= certain types of amplifiers, where different power supplies work on the negative and positive parts of the signal (“push-pull”). If they’re not balanced perfectly, you get a glitch when the signal swings from + to - and vice versa.
• Intermodulation distortion=frequencies interacting with each other.
• Aliasing, a by-product of digital conversion.
Noise, hum, extraneous signals, electromagnetic interference (static, RFI)
Frequency sensitivity changes at different loudness levels: at low levels, we hear low frequencies poorly, and high frequencies too, although the effect isn’t as dramatic. Fletcher-Munson curve: ear is more sensitive to midrange frequencies at low levels, less sensitive to lows and extreme highs. In other words, the frequency response of the ear changes depending on the volume or intensity of the sound. When you monitor a recording loud, it sounds different (better?) than when soft.
Loudness sensitivity: Just Noticeable Difference (JND)--about 1 dB--changes with frequency and loudness level. We can often hear much smaller differences under some conditions, and not hear larger ones under different conditions.
Also, JND changes with duration--short sounds (<a few tenths of a second) seem softer than long sounds of the same intensity
Basic audio principles:
Nature of Sound waves = pressure waves through a medium = compression (more molecules per cubic inch) and rarefaction (fewer molecules per cubic inch) of air. A vibrating object sets the waves in motion, your ear decodes them. Sound also travels through other media, like water and metal. No sound in a vacuum, because there’s nothing to carry it.
Speed of sound in air: about 1100 feet per second. That's why you count seconds after a lightning strike to see how far away the lightning is: 5 seconds = one mile. Conversely, 1 millisecond = about 1 foot.
Sound travels a little faster in warmer air, about 0.1% per degree F, and in a more solid medium: in water, 4000-5000+ fps, in metal, 9500-16000 fps.
When we turn sound into electricity, the electrical waveform represents the pressure wave in the form of alternating current. The electrical waveform is therefore an analog of the sound wave. Electricity travels at close to the speed of light, much faster than sound, so transmission of audio in electrical form is effectively instantaneous.
Characteristics of a sound:
Frequency = pitch http://www.psbspeakers.com/Images/Audiotopics/fChart.gif.
How many vibrations or changes in pressure per second.
Expressed in cycles per second, or Hertz (Hz).
The mathematical basis of the musical scale: go up an octave = 2x the frequency.
Each half-step is the twelfth root of 2 higher than the one below it = approx. 1.0595.
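A quick check of the math in Python:

```python
# Equal temperament: each half-step multiplies frequency by the 12th root of 2.
semitone = 2 ** (1 / 12)
print(round(semitone, 4))          # → 1.0595

a4 = 440.0                         # A above middle C
print(round(a4 * semitone ** 12))  # → 880: twelve half-steps = one octave, 2x frequency
```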
The limits of human hearing = approximately 20 Hz to 20,000 Hz or 20 k(ilo)Hz.
Fundamentals vs. harmonics = fundamental pitch is predominant pitch, harmonics are multiples (sometimes not exactly even) of the fundamental, that give the sound character, or timbre.
Period = 1/frequency
Wavelength = velocity of sound in units per second/frequency
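Both formulas in a quick Python sketch, using the approximate 1100 ft/sec figure above:

```python
SPEED_OF_SOUND = 1100.0  # ft/sec in air, approximate

def period_sec(freq_hz):
    return 1 / freq_hz               # seconds per cycle

def wavelength_ft(freq_hz):
    return SPEED_OF_SOUND / freq_hz  # feet

print(round(period_sec(440) * 1000, 2))  # → 2.27 ms per cycle of A440
print(wavelength_ft(20))                 # → 55.0 ft: low-frequency waves are huge
print(round(wavelength_ft(20000), 3))    # → 0.055 ft, well under an inch
```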
Loudness (volume, amplitude) = how much air is displaced by the pressure wave. Measured in decibels (dB) above the threshold of audibility (look at chart). The decibel is actually a ratio, not an absolute, so when you use it to state an absolute value, you need a reference. “dB SPL” (as in the chart in the course pack) is referenced to the perception threshold of human hearing. That threshold is obviously subjective, so it is set at 0.0002 dyne/cm², or 0.00002 newtons/m². That is called 0 dB SPL. By contrast, atmospheric pressure is 100,000 newtons/m².
dB often used to denote a change in level. A minimum perceptible change in loudness (Just Noticeable Difference) is about 1 dB. Something we hear as being twice as loud is about 10 dB louder. So we talk about “3 dB higher level on the drums” in a mix, or a “96 dB signal-to noise-ratio” as being the difference between the highest volume a system is capable of and the residual noise it generates.
“dBV” is referenced to something, so it is an absolute measurement. “0 dBV” means a signal referenced to a specific electrical voltage in a wire, which is 1 volt. “0 dBu” is referenced to 0.775 volts, but it also specifies an impedance of 600 ohms. We’ll deal with impedance later. Common signal levels in audio are referenced to that: -10 dBV (consumer gear), +4 dBu (pro gear)
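The two reference levels can be compared in a short Python sketch (voltage ratios use 20 * log10):

```python
import math

def db_from_voltage(v, ref):
    return 20 * math.log10(v / ref)  # voltage ratios use 20*log10

# Convert the two common operating levels to volts:
consumer = 10 ** (-10 / 20) * 1.0    # -10 dBV, reference = 1 V
pro      = 10 ** (4 / 20) * 0.775    # +4 dBu, reference = 0.775 V

print(round(consumer, 3))                        # → 0.316 V
print(round(pro, 3))                             # → 1.228 V
print(round(db_from_voltage(pro, consumer), 1))  # → 11.8 dB hotter
```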
The threshold of pain is about 130 dB SPL, so the total volume or “dynamic” range of human hearing is about 130 dB.
Characteristics of the ear as transducer.
Ear converts sound waves to nerve impulses.
Each hair or cilium responds to a certain frequency, like a tuning fork. Frequencies in between get interpolated. As we get older, hairs stiffen, break off, and high-frequency sensitivity goes down. Also can be broken by prolonged or repeated exposure to loud sound.