Smartphone Sensor Data on Planes

In case you haven’t heard, the FAA recently announced that airline flyers can now use phones, tablets, and other devices during takeoff and landing, in addition to mid-flight when the seatbelt sign is off. Full press release details available here.

This is good news for robotics and sensor-hacking enthusiasts (like me) who have been scheming to use collected sensor data during this flying interval for some time. It also opens up the possibility of developing new applications that leverage this newly allowed capability to ensure a better flying experience. Maybe some of the functionality of the traditional black box can be crowd-sourced to help ensure flight safety (and provide secondary data in case of hardware failure). Maybe sensor data could serve as recourse for passengers against a pilot who provides a turbulent ride full of crazy maneuvers (sensor agreement between users could support a joint class-action case!). The application possibilities are wide open.

I hacked up a quick Android application about a year ago to log sensor data (barometer, accelerometer, magnetometer, etc.). While there are a variety of apps that showcase sensor data, it was challenging to find one that logged reliably with fine-grained control over sensors and activity context switching. So if anyone wants an Android application optimized for sensor data collection, let me know.

For the past year, I have logged data on nearly every flight I’ve been on with my Samsung Galaxy S3. Here are some hacks of the sensor data from a particular flight from Pittsburgh to Boston this past summer.

Barometric Analysis

The Android barometer reports pressure in millibars. However, there is a simple formula for converting millibar pressure to expected altitude, available here.

The Matlab code is:

function altitude = convToAltitude(mbar_readings)
% Convert raw barometer readings (millibars) to altitude (meters).
% mbar to ft (standard-atmosphere model)
altitude = (1 - (mbar_readings ./ 1013.25) .^ 0.190284) .* 145366.45;
% ft to m
altitude = 0.3048 .* altitude;
end

Technically, you can get a better model of altitude by taking into account other factors such as ambient temperature and weather conditions. For instance, air pressure is strongly dependent on temperature. The S4 contains additional thermometer and humidity sensors that might lead to a better inference of altitude. However, we will ignore such complexities here and use the simpler model that is just a function of atmospheric pressure.
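For anyone who prefers Python over Matlab, here is a sketch of the same standard-atmosphere conversion, plus a rough temperature-corrected variant based on the hypsometric equation. The constants (5.257 exponent, 0.0065 K/m lapse rate) are standard-atmosphere values; treat this as an illustration, not a calibrated model:

```python
import math

def pressure_to_altitude_m(mbar):
    """Same standard-atmosphere model as the Matlab function above."""
    feet = (1 - (mbar / 1013.25) ** 0.190284) * 145366.45
    return feet * 0.3048  # ft -> m

def hypsometric_altitude_m(mbar, temp_c, sea_level_mbar=1013.25):
    """Rough temperature-corrected variant via the hypsometric equation."""
    temp_k = temp_c + 273.15
    return ((sea_level_mbar / mbar) ** (1 / 5.257) - 1) * temp_k / 0.0065

print(round(pressure_to_altitude_m(1013.25), 1))  # -> 0.0 (sea level)
```

At 15 °C the two models agree to within a few tens of meters near the ground; the gap grows as conditions diverge from the standard atmosphere.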

[Figures: raw barometer pressure (top) and converted altitude (bottom) over the flight]

The top plot shows the raw barometer pressure reading in millibars, while the bottom plot converts the sensor reading time series to altitude using the above formula.

The first thing to notice is that, after sensor calibration, the estimated altitude is not far off from the ground truth. Here is the table of comparisons with ground truth taken from Google Maps.

Airport                                 Estimated Altitude (m)   Google Maps Ground Truth (m)   Error (m)
Pittsburgh International Airport        378.15                   367.00                         11.15
Boston’s Logan International Airport    4.43                     5.80                           1.37

The second thing to notice is that the cabin pressurization limit shows up clearly in the plot. Generally, the cabin is pressurized (source: Wikipedia on “Cabin Pressurization”) to an equivalent altitude of about 2,400 m above sea level. We see that the altitude estimated by the in-plane barometer increases during ascent, flattens out around that range, and then decreases during descent.

The fact that the cabin pressurizes at a high enough altitude means that the altitude of the plane cannot be estimated well at all times by a barometer inside the plane. However, when the plane is taking off or landing, the cabin isn’t as pressurized, and altitude can be estimated well. Since plane troubles are most likely to happen during ascent or descent, this is perhaps the most interesting time to log and crowd-source barometer data collection.

Accelerometer Analysis

Here are the x,y,z axes of accelerometer data collected during the Pittsburgh-Boston flight.

[Figures: accelerometer x-, y-, and z-axis readings over the flight]

The accelerometer can be used, to a first approximation, to identify periods of turbulence during flight. A straightforward way to do this is to use a k-sigma filter to find the data points more than k standard deviations away from the mean in each dimension. This is a really easy way of finding global outliers, as shown below.
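A minimal pure-Python sketch of the k-sigma filter on a single axis. The simulated data and the injected spike at index 500 are made up for illustration; the real logs would be one array per accelerometer axis:

```python
import random
from statistics import mean, stdev

def k_sigma_outliers(samples, k=3.0):
    """Return indices of samples more than k standard deviations from the mean."""
    mu = mean(samples)
    sigma = stdev(samples)
    return [i for i, x in enumerate(samples) if abs(x - mu) > k * sigma]

random.seed(0)
accel = [random.gauss(0.0, 0.1) for _ in range(1000)]  # simulated quiet axis
accel[500] = 2.0                                       # injected "turbulence" spike
spikes = k_sigma_outliers(accel)
print(500 in spikes)  # -> True
```

Running the filter per axis and intersecting (or unioning) the flagged indices gives candidate turbulence windows.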


This allows us to separate the anomalous jerky behavior in the data from the normal noisy/oscillatory behavior. A more advanced way to process this data would be to filter out systematic oscillatory behavior explicitly in the frequency domain using an FFT and band-pass filtering (the subject of future work). Certainly, if there is drift in the sensor, more advanced techniques need to be used.
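As a sketch of what that frequency-domain filtering could look like, here is a naive DFT-based band-pass; a real implementation would use an FFT library instead of this O(n²) transform, and the 4 Hz / 20 Hz test tones are purely illustrative:

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def band_pass(x, fs, lo, hi):
    """Zero out DFT bins outside [lo, hi] Hz (mirrored for negative frequencies)."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        f = k * fs / n if k <= n // 2 else (k - n) * fs / n
        if not (lo <= abs(f) <= hi):
            X[k] = 0
    return idft(X)

fs = 64  # Hz, illustrative sampling rate
t = [i / fs for i in range(64)]
sig = [math.sin(2 * math.pi * 4 * s) + 0.5 * math.sin(2 * math.pi * 20 * s) for s in t]
filtered = band_pass(sig, fs, 2, 8)  # keep the 4 Hz oscillation, drop the 20 Hz one
print(max(abs(f - math.sin(2 * math.pi * 4 * s)) for f, s in zip(filtered, t)) < 1e-6)  # -> True
```

The same idea, applied in reverse (rejecting a band), could strip out systematic vibration from the aircraft before running the outlier detector.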

Obviously, accelerometer data on phones is extremely noisy, and my erratic fidgeting and moving around the cabin to go to the bathroom probably didn’t help things much. However, it is still neat to see systematic periods of anomalous jitter in the data. Given ground truth for actual air turbulence, it would be neat to train a supervised classifier to delineate between cases of actual air turbulence and other instances of noise. There is certainly a mountain of previous work on human behavior and activity recognition from motion sensors in homes, urban areas, etc., that could be useful towards this new frontier of on-plane behavior.


This post hopefully shows you the promise of using cell phone sensors on planes. This is just the tip of the iceberg of what is possible with AI/robotics today. This is very low-hanging fruit as far as intelligent robotics goes, and there is certainly much more in the pipeline being developed. Stay tuned!

Home Aquarium Monitor


I have a bunch of webcams around, so I decided to try something “useful” with them while shamelessly playing around with some visual processing techniques in the process. I’ve been trying to build a monitoring system for my home aquarium. It’s doing pretty decently for a first take.

This is a four-webcam setup around a 1.5 gallon aquarium. Two cameras are on the top of the aquarium and two are mounted on the side. The four rows on the screen show the output of the different cameras as well as some visually processed results. The screenshots from left to right are:

(1) The raw output of the webcam.
(2) Basic motion detection in the scene. Essentially a diff between the current frame and the last two frames.
(3) Edge detection – looks for differences in color between neighboring pixels.
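The frame-diff in (2) can be sketched on tiny grayscale frames like this; a real pipeline would run the same logic on full camera images (e.g. with OpenCV), and the threshold of 25 is an arbitrary illustrative choice:

```python
def motion_mask(prev2, prev1, curr, threshold=25):
    """Mark pixels that changed vs BOTH of the last two frames (reduces flicker)."""
    h, w = len(curr), len(curr[0])
    return [[1 if (abs(curr[y][x] - prev1[y][x]) > threshold and
                   abs(curr[y][x] - prev2[y][x]) > threshold) else 0
             for x in range(w)] for y in range(h)]

# Tiny 3x3 grayscale example: one bright "fish" pixel moves one step right.
f0 = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
f1 = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
f2 = [[10, 10, 10], [10, 10, 200], [10, 10, 10]]
print(motion_mask(f0, f1, f2))  # -> [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
```

Requiring a change against both previous frames suppresses single-frame sensor noise while still catching the fish’s departure and arrival pixels.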

Alas, some of the cameras are doing a lot better than others at isolating the fish. Watch camera 2 and its motion detection screen and you can see it’s doing a pretty decent job of isolating the fish in the scene.

I hope to eventually not only improve the fish motion tracking (perhaps using more of the temporal and edge data) but also get the cameras to collaborate to figure out where each fish is likely to be in the tank at each time step.

Intelligent Guitar Software Suite Now Available

My GuitarSuite project is now open source, free for anyone to use and develop!


The current build supports riff extraction and melodic classification. You can import guitar riffs from a Guitar Pro (*.gp4) tab file, build a database of melodies to recognize, and then have them recognized in real time as you perform them on your guitar.

—-Hardware Requirements—-

You must have an electric guitar connected to your computer via some external device that allows recording of the guitar’s sound. This could be a mixer, an effects pedal, or another device that interfaces with your computer.

If you don’t have an electric guitar, you could always hum. You better hum well though!

For interfacing my guitar, I use a Digitech RP150.

I plug my guitar into the pedal using the line in and connect the Digitech pedal to my computer via USB. The Digitech pedal replaces my laptop’s microphone as the recording device.

—-System Requirements—-

-Java 1.5 or higher
-Matlab 2008 or higher
-Matlab Signal Processing Toolbox (comes with Matlab 2008 or higher)

To download from svn, run command:
svn checkout GuitarSuite

Matlab Setup:

1. Open Matlab
2. Do File –> Set Path –> Add with Subfolders
3. Go to your GuitarSuite/drivers folder and add both “kts-matlab-midi-d0b8bfd” and “MATLAB-Chroma-Toolbox” to your matlab path, WITH SUBFOLDERS.

To Build:

The project comes with build.xml, an Ant build file. You can use your favorite IDE to build it. I recommend Eclipse.

To Run:

The build file contains two runnable targets:
DBLoader.gui and DBRecognizer.gui.

DBLoader.gui brings up the user interface for building your melody database. You can import *.gp4 files [NOTE: only *.gp4 is currently supported!] into the program. The program will extract the riffs from these files. You can then select which of these riffs to add to your melody database – the database of melodies that can be recognized.

DBRecognizer.gui brings up the recognizer user interface. The recognizer should load up matlab at startup.

Once matlab starts up, start performing!

The recognizer will record clips of audio from your guitar every 10 seconds. Following each 10-second recording, the recognizer will attempt to label what you’ve just played. It will use the database of melodies you created and try to figure out the best match for what you’ve played.

Have fun!

Melody Recognizer v1.0

Update: All code for this article is free and open source. It can be downloaded at

Melody Recognition Software – Towards An Advanced Improv System

Imagine a computational music system that allows a human guitar player to jam with their computer. As the guitar player plays, the software would generate accompaniment for their playing. The computer agent and guitar player’s produced sounds could be piped to a central speaker which would allow the overall “jam session” to be heard.

There are several technical subproblems to be solved in constructing such a software system. The system must accomplish both Melody Recognition (figuring out what the human player is playing) and Accompaniment Generation (determining what music must be generated to accompany the human player). There is a lot of complexity here, as each subproblem can be broken down into further subproblems. Many of these sub-subproblems have their own research community, literature base, and associated conference.

The creation of a Melodic Classification component is itself a fascinating endeavor. A melodic classification component would take as input a recording of a human guitar player playing a melody, along with a known library of melodies. It would output the label of the melody that is the “closest match” among the melodies in the database. The melodic classification problem, then, is to determine which melody a person has played from a known library of melodies. This is the first major functionality I hope to provide with my system.

An associated problem with melodic classification is the population of the melody database to facilitate classification with. There are two key feasible ways to populate a melody database – either a human could prerecord melodies for the database or the database could be generated from some other source of musical data.

One dominant data source is guitar tablature. For those inexperienced with the magic of guitar tabs, guitar tablature is a common notation for writing down guitar notes – it’s the sheet-music equivalent that guitar players like to use. There are giant databases of guitar tabs online, where users create their own tabs and post them. One can find tabs for free for virtually any major guitar composition. Guitar Pro is popular software for writing tabs, and its *.GTP format is a popular format for tabs. The second key functionality that I hope to demonstrate is the automated discovery and extraction of key melodies in a Guitar Pro tab file.

I. Melody Classification Approach

Let’s start with the solution to problem 1 – recognizing a melody that a guitar player has played.

A melody, fundamentally, is a sequence of notes and rests, each with a variable duration. A simplifying assumption is that the melody is monophonic (composed of single notes rather than chords). This isn’t a gigantic assumption, as many melodies are monophonic. Polyphonic music is much harder to process, but even so, I believe my system would work for polyphonic music even though polyphonic support isn’t built in.

My approach is to record an audio clip, use various signal processing techniques to extract the sequence of played notes in the audio clip, and then apply pattern-matching algorithms to compare against melodies in the melody database.

A. Signal Processing Methods

The base-level components are four Note Transcribers – components that take in a recording of a sound and extract the fundamental pitches or notes from the sound file. My implementation consists of four transcribers based on four different signal processing techniques.

1. Simple FFT Transcriber with band-pass filtering – The idea is to take the FFT over the whole audio file and apply band-pass filtering to the signal, allowing only certain frequencies to pass. While the algorithm does tend to extract the correct fundamental notes, there is no time information in this approach. Thus, while the approach does spit out notes, it provides no information as to when those notes are played.

2. Spectrogram Peaks with band-pass filtering – Spectrograms utilize the short-time Fourier transform (STFT) to generate a time-frequency graph. Taking the peaks of the spectrogram gives the most strongly concentrated frequencies per unit time. These key frequencies are then mapped onto notes. This is an improvement over #1, as time information for pitches is now provided. Unfortunately, it turns out that spectrogram peaks are not always the fundamental frequencies of a note. Thus, while spectrogram peaks might give the general vicinity of a note, they aren’t that precise with octave information.

3. Pitch-Tracking F0 Estimation with a Note Model – Notes played by instruments don’t necessarily have a constant frequency. Rather, notes tend to follow a “lump” shape: they start off with a low frequency, come up to the frequency they stay at (generally the frequency of the note), and then go down again once the note is finished. There are various signal processing algorithms that can track the pitch of a note given such a model. I’ve found that an autocorrelation-based algorithm works best. An autocorrelation algorithm compares parts of a signal to itself to extract frequencies in the time domain. This algorithm gives both octave and fundamental note information.

4. Chroma Signal Processing – The idea behind chroma processing is to map pitches onto the 12 pitch classes. Doing this yields algorithms that are highly robust to noise and distortion; chroma-based methods underlie most commercial music information retrieval systems. However, chroma-based methods map onto only 12 bins. Thus, while this algorithm works extremely well for determining which note was played, it provides no octave information.
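To make the transcription step concrete, here is a toy sketch in the spirit of transcriber #1: pick the strongest frequency bin of a (naive) Fourier transform and map it onto an equal-tempered note name. Everything here – the sampling rate, the naive DFT standing in for a real FFT, the single synthetic tone – is illustrative, not my actual Matlab implementation:

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq):
    """Map a frequency in Hz to the nearest equal-tempered note name + octave."""
    midi = round(69 + 12 * math.log2(freq / 440.0))  # MIDI 69 = A4 = 440 Hz
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

def dominant_freq(samples, fs):
    """Pick the strongest positive-frequency DFT bin (crude FFT-peak transcriber)."""
    n = len(samples)
    mags = [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(1, n // 2)]
    k = mags.index(max(mags)) + 1
    return k * fs / n

fs = 4400          # illustrative sampling rate
n = 440            # 0.1 s of audio -> 10 Hz bin resolution
tone = [math.sin(2 * math.pi * 440 * t / fs) for t in range(n)]
print(freq_to_note(dominant_freq(tone, fs)))  # -> A4
```

As described above, this gives the note but no timing; per-window versions of the same idea recover the timing at the cost of precision.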

Each of the four algorithms has pros and cons. How can one get the best of all worlds? This can be cast as a sensor fusion problem!

Sensor fusion is the combining of sensory data, or data derived from sensory data, from disparate sources such that the resulting information is in some sense better than would be possible if these sources were used individually.

The sensor fusion algorithm is a work in progress, but the idea is to weight the output of each transcriber according to its strengths. For example, the chroma-based transcriber is really good at extracting notes but has no octave information, so it could identify the note while my autocorrelation F0 estimator provides the octave.

The output of the sensor fusion algorithm is a transcription of the notes present in the audio clip.

B. Melody Classification Methods

Melody Classification is the problem of identifying a melody based on a known library of melodies. A melody, once again, is a sequence of notes and rests.

Humans can recognize key riffs in a piece even if the musician plays them a little differently. Melody notes can have slight perturbations, and notes can have different lengths, as can rests. Furthermore, humans can identify melodies in a scale-invariant sense: the same melody can be played in the key of G or the key of C, and a human could generally classify it correctly.

My melodic classification algorithm aims to classify a newly played melody under these conditions. For this task, a distance-based classifier approach has been useful. Two distance metrics – Levenshtein distance and melodic transposition distance – have been especially useful.

1. Levenshtein Distance

A Levenshtein distance (let’s call it L-distance for short) algorithm takes as input two strings S1 and S2 and computes the minimum number of operations necessary to convert one string into the other, using only operations that insert a character, delete a character, or substitute a character. There is an efficient dynamic programming algorithm for this problem.

For example, the L-distance between “ABC” and “ABD” is 1 because it takes one substitution to convert “C” to “D.” The L-distance between “ABCD” and “ABE” is 2 because it requires deleting a character (either C or D) and substituting the remaining character with E.
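A compact sketch of the dynamic programming algorithm (the two-row variant), reproducing the examples above:

```python
def levenshtein(s1, s2):
    """Minimum insertions, deletions, and substitutions to turn s1 into s2."""
    prev = list(range(len(s2) + 1))  # distance from "" to each prefix of s2
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,                 # delete c1
                            curr[j - 1] + 1,             # insert c2
                            prev[j - 1] + (c1 != c2)))   # substitute (or match)
        prev = curr
    return prev[-1]

print(levenshtein("ABC", "ABD"))   # -> 1
print(levenshtein("ABCD", "ABE"))  # -> 2
```

For melodies, the “characters” would be note symbols rather than letters, but the algorithm is identical.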

L-distance makes sense intuitively and performs well: two different melodies tend to have a high Levenshtein distance while similar melodies have a low distance. However, the L-distance isn’t very musical. For example, “ABC” vs. “ABD” and “ABC” vs. “ABF” have the same L-distance of 1, yet there is a huge auditory difference between the “B-C” interval and the “B-F” interval. One is a musical second whereas the other is a (diminished) fifth.

The second metric I’ve been experimenting with is the transposition distance between melodies. The transposition distance between two notes N1 and N2 is the number of musical half steps between them. The transposition distance between two melodies is the sum of such differences over all notes in the melody.
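A minimal sketch of this metric, assuming notes are represented as MIDI numbers and the two melodies have already been aligned to equal length (a real implementation would need to handle alignment and length mismatches first):

```python
def transposition_distance(m1, m2):
    """Sum of half-step gaps between aligned notes (MIDI numbers, equal length)."""
    assert len(m1) == len(m2), "simplifying assumption: pre-aligned melodies"
    return sum(abs(a - b) for a, b in zip(m1, m2))

# C4-E4-G4 vs C4-F4-G4: E and F differ by one half step
print(transposition_distance([60, 64, 67], [60, 65, 67]))  # -> 1
```

Unlike L-distance, this metric penalizes “B to F” far more than “B to C”, which matches how different the two mistakes sound.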

So far I’ve had mixed results with each metric, and performance is very melody-dependent. I am working on a fusion technique for this as well. I’ve found that L-distance works as a good baseline.

The above method tends to ensure time and note invariance in melody recognition. For example, if the musician elongates a note, plays a different note here and there, and otherwise makes not-too-terrible mistakes, the classifier still successfully applies the correct melody tag.

Extending the distance classifier to accommodate pitch invariance is surprisingly easy. Transposition formulas can be used to transpose a melody from one key to another, so the melody database can simply be populated with all transpositions of each melody. Thus, the distance classifier can recognize a melody in any key!

II. Melody Discovery and Extraction

The Melody Discovery and Extraction problem is to recognize and segment out the key melodies or “riffs” in a guitar tab. Riffs tend to be the “catchy” parts of a song that people like to hum. Depending on the song, riffs tend to be 3-to-15-note melodies (on average) that are repeated. A piece can thus be segmented into several note sequences that are its “riffs.”

I have been experimenting with two key approaches to melody extraction: Rest-Based segmentation (segmenting based on the presence of rests) and Sequence-Based segmentation (segmenting based on note sequences).

A. Rest-Based Segmentation

Rest-Based segmentation involves breaking up a piece into note sequences based on the rests in the piece. This is a rather simple metric but works surprisingly well for many pieces. It accords well with how many musicians write music. The underlying design concept is to play a certain theme, give the listener some time to process and reflect upon it, and then present the next musical idea.

The performance of the Rest-Based segmentation algorithm is hit and miss. If the piece is designed with the above design concept in mind, the algorithm works. Otherwise, it overshoots or undershoots the length of a riff in a piece.

An extension to improve this algorithm is to utilize note sequence content.

B. Sequence-Based Segmentation

Sequence-Based segmentation looks at the sequence of notes in a piece. The idea is that there are patterns in the underlying sequences of notes that have been designed in by the musician. Sequences can be repeated several times for emphasis.

Indeed, much of rock or jazz music is well-structured, revolving around a couple of key chord changes and phrasing on those chord changes. Musicians tend to describe a piece as having structure “AABA” if there is a chord or melody that is played three times and another chord or melody played once.

To extract riffs, n-gram techniques can be used on the notes of melodies.

An n-gram is a list of the n consecutive elements from a source list or string. Thus, the 2-grams of “ABCD” are {“AB”, “BC”, “CD”}, the 3-grams are {“ABC”, “BCD”} and so forth.

The 3- to 15-grams of the notes of a melody can be extracted. That can be a LOT of n-grams, depending on the piece. However, riffs tend to be repeated – a riff can occur several times in a piece – so high-count n-grams should reflect some intended structure. We can thus figure out the key melodies in a piece by looking at the highest-count n-grams.
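A minimal sketch of the n-gram extraction and counting, with melodies treated as strings of note symbols; the note string and the 3-to-5 range here are illustrative:

```python
from collections import Counter

def ngrams(seq, n):
    """All length-n consecutive slices of seq."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def riff_candidates(notes, lo=3, hi=15):
    """Count every n-gram from lo to hi notes; high counts hint at riffs."""
    counts = Counter()
    for n in range(lo, min(hi, len(notes)) + 1):
        counts.update(ngrams(notes, n))
    return counts

print(ngrams("ABCD", 2))  # -> ['AB', 'BC', 'CD']
counts = riff_candidates("ABCABCABCXY", lo=3, hi=5)
print(counts["ABC"])      # -> 3
```

Sorting `counts` by frequency surfaces the repeated fragments; the next problem is deciding which of the overlapping winners is the “real” riff.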

This method, unfortunately, isn’t without its problems. One problem is that n-grams overlap at different scales. Let’s say we have “ABCABCD”. A high-ranking 2-gram is “AB”, which appears twice. The high-ranking 3-gram is “ABC”, which occurs twice as well. So is the riff “ABC” or “AB” globally?

I’ve seen this hierarchical tagging problem several times before in other computational areas requiring the tagging of sequences of characters. It’s actually a fundamental problem in several areas. Natural Language Processing deals with the problem of extracting collocations – sequences of words that form a phrase that is more than the sum of its parts. For example, a “hot dog” is neither “hot” nor a “dog.” DNA sequencing deals with the problem of extracting key coding sequences.

After looking at some problem-solving techniques in these areas, a possible heuristic came to mind: if a lower-order n-gram occurs with the exact same frequency as a higher-order n-gram, there is reason to believe that the higher-order n-gram is the true design of the musician. For example, if “A” occurs 37 times and “BA” occurs 37 times, then every time an A appears, a B precedes it. It’s likely that the musician designed it that way because “BA” is their real unit. If “BACDEF” occurs 37 times too, then maybe that’s the real underlying “riff” in the piece.
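Here is a sketch of that pruning heuristic, run on the “ABCABCD” example from earlier; the exact tie-breaking rules in my implementation may differ:

```python
from collections import Counter

def prune_subsumed(counts):
    """Drop an n-gram when a strictly longer n-gram contains it with the same count."""
    kept = []
    for g in sorted(counts, key=len, reverse=True):  # consider longest grams first
        subsumed = any(len(longer) > len(g) and g in longer and
                       counts[longer] == counts[g]
                       for longer in kept)
        if not subsumed:
            kept.append(g)
    return {g: counts[g] for g in kept}

# 2- and 3-gram counts for the notes "ABCABCD"
notes = "ABCABCD"
counts = Counter(notes[i:i + n] for n in (2, 3)
                 for i in range(len(notes) - n + 1))
pruned = prune_subsumed(counts)
print(pruned)  # "AB" and "BC" (count 2) collapse into "ABC" (count 2)
```

In this example, “AB” and “BC” each occur exactly as often as “ABC”, so the heuristic keeps only “ABC” as the riff candidate.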

The results of this heuristic are fantastic. For some of the pieces I’ve tried, I’ve seen the n-gram count drop from 235,234 to around 67.

III. Current Implementation and Future Plan

My system is currently completely software-based. It is mostly written in Java, with Matlab used as a giant calculator for efficiently computing the signal processing math. There is a GUI component that allows a user to import tabs and add the extracted melodies from those tabs to their own customized melody database. The melody database file can then be read and used by a real-time recognizer component that records 5-10 second clips of guitar playing and outputs melody labels for those clips based on the database.

The cool thing to do would be to build an actual hardware pedal using my recognition algorithm. For this implementation, the user could construct their melody database on their computer, use either usb or wifi to send the melody database to the pedal, and then use the pedal in a real-time performance system. The pedal could be used with a DJ or other accompaniment board to generate automated accompaniment based on recognized melody.

This could be quite an endeavor depending on the chosen hardware. One simple implementation could be with Java Sun SPOTs. This would involve converting the Matlab signal processing algorithms to Java, which isn’t too hard given that many of the underlying toolboxes are open source. I believe the algorithms are industrial strength, so most of the work would be the embedded programming.

I shall have screenshots and a downloadable demo here soon.