Removing an extra column in Vim

So I load a CSV in Vim and realize I screwed up: I wrote out 2 probabilities instead of one.

100004,0.732524934269992,0.267475065730008
100015,0.502454568719362,0.497545431280638
100026,0.426691576440626,0.573308423559374
100030,0.796236053493448,0.203763946506552
100047,0.745809954575253,0.254190045424747
100052,0.980561973536192,0.0194380264638075
100056,0.86757763360163,0.13242236639837
100061,0.580641527965135,0.419358472034865
100067,0.487600362330103,0.512399637669897
100068,0.336393531503461,0.663606468496539
100082,0.582289239857109,0.417710760142891
100084,0.234077152161437,0.765922847838563
100097,0.252967375901838,0.747032624098162
100098,0.121602233424238,0.878397766575762

In Vim, to get rid of that extra column, you just type:

:1,$s/,[^,]*$//

This says: “from the first line to the last line, replace a comma followed by a bunch of not-commas at the end of the line with nothing.” So it just gets rid of the last column. (This will not work for fancy CSVs whose last field is a quoted string with embedded commas. So sad.)

Posted in tech | 1 Comment

Why I like Domino for my Cloud Data Science

I am writing a longer blog post about using my Toshiba Chromebook as the primary computer in the Coursera Johns Hopkins Data Science Specialization, but I really felt this was a valuable nugget of information to get on the web.

Using my $10/month Digital Ocean droplet to train an R randomForest model using caret:

real    3m31.409s
user    3m28.461s
sys     0m1.875s

Pretty good for $10/month! However, I’m a big fan of Domino Data Labs for cloud-based data science. They provide access to super fast servers in the cloud. They are also the first folks to package “data science best practices” into a product. The system snapshots all of your analytic runs, along with the program and the data, in one consistent view. That makes it nice and easy to go back, see what changed, and collaborate, and, most importantly, it helps make your research reproducible. But here’s why I really use them:

real    1m42.314s
user    0m4.257s
sys     0m0.251s

The same exact code, run on Domino’s servers, ran in roughly half the time. This is using their absolute cheapest instance! This run was billed at $0.0093 a minute. Seriously, less than 1 penny a minute to have access to a computer three times as fast as my Digital Ocean droplet, with integrated version control and built-in collaboration tools. Even crazier, of that 1m42s, they only charged me for 1m12s, because the rest of the time was spent transferring my code and data.
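Working the numbers from the two `time` outputs and the 1m12s billed duration, assuming billing is a simple per-minute rate with no minimum charge or rounding (an assumption; the actual billing rules aren't stated here):

```latex
\begin{aligned}
\text{wall-clock speedup} &= \frac{211.409\ \text{s}}{102.314\ \text{s}} \approx 2.07 \\
\text{droplet time vs. billed time} &= \frac{211.409\ \text{s}}{72\ \text{s}} \approx 2.94 \\
\text{cost} &= 1.2\ \text{min} \times \$0.0093/\text{min} \approx \$0.0112
\end{aligned}
```

So “roughly half the time” describes the wall clock, “three times as fast” lines up with comparing the droplet's run against only the billed compute time, and the run comes to about 1.1 cents.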

I paid 1 freaking cent for the privilege of running my code on fast hardware on the cloud. Check them out!

Note: After being a client of Domino’s and pushing their systems and processes to the limits, they asked me to help advise them. I am not a completely unbiased source, but I came to this position pretty darned honestly 🙂 Ask me any questions!


Posted in tech | Leave a comment

Unpopular Opinion

One of the goals in moving from Brentwood, TN to Santa Monica, CA was broadening the opportunity for entertainment that was available to us. We live right down the road from a fantastic comedy venue, the West Side Comedy Theater. It’s a spectacular venue in a weird little alley behind the 3rd Street Promenade. In particular, I’ve been going to see a show every 1st and 3rd Tuesday of the month, the Unpopular Opinion show put on by the talented duo of Adam Tod Brown and Ben Blanchard. This show has always made me laugh, but it turns out getting a predictable audience on a Tuesday night is *hard*.

I’ve been at the show where it was completely sold out and the bar was packed. I’ve been at the show where there were six people in the audience. We were outnumbered by comedians. I’ve met friends who are steadfast supporters of live comedy, and I’ve made friendships with some of the performers. But more than anything I’ve been entertained, for the bargain basement price of $5, with no drink minimum. I’ve seen fantastic comics representing almost every way to be a comic at this theater.

Tonight, however, was a special treat. Tonight, the audience was spectacularly weird. Somehow, through some bizarre confluence of events, tonight the audience was almost entirely random, disconnected tourists. Some from Sweden, some from Brussels, one from Perth, and me… a Spaniard raised in the Deep South. Tonight, the audience took in the jokes, and smiled, and enjoyed it, but was fabulously, fantastically weird. The comics all had to dig deep and find new ways to ply their craft. Nothing worked as it should, and yet it came together into a beautiful 1.5 hours of comedy.

I watched Ryan Clauson and Julian McCullough perform fantastic, winning sets. Jokes that should have killed. But jokes that required a cultural context. I laughed at jokes ranging from relatable guy stuff all the way to steroids. These were great jokes, but the kind of jokes that need an audience who has shared even an iota of the American experience. Tonight was simply not going to be that kind of night.

I saw Beth Stelling win over the audience through absolute force of character and a superb, well-oiled delivery. She owned the audience and transported all of us to her Ohio-shirted reality. I saw Emily Maya Mills playfully lead the audience and keep us on our toes, taking us from her San Francisco roots to her garden pioneer aesthetic. Annie Lederman called me out for my terrible tattoos… relics of an era when I thought I could buy a personality, and she was hilarious and yet gentle. She could have crushed me when she found out I didn’t remember what the Chinese characters stood for; instead she made it light and airy and funny and kind. These were funny, masterful performances that, through sheer force of performance, took this foreign audience into the comics’ own reality and made them laugh.

And then there was Damon Wayans. For the second time in my life, I had the fortune of sitting in the front row for Damon Wayans playing a small room. Mr. Wayans is clearly at another level in the mastery of his craft. He joked effortlessly about topics ranging from airline accidents to slavery and racism in front of this all-European, all-weird, 100% not-right audience… and he succeeded in connecting with them. From my vantage point, it was a fabulous thing to watch.

The show closed, and I was privileged that the cozy, intimate nature of the venue was kind to me. I had the genuine pleasure, as a fan, of meeting and conversing with some of the comics themselves. I got to tell them the story of how I met my beautiful wife, I got to tell them how much I appreciate their comedy, and I got to shake their hands. This was a big awesome deal to me.

It must take a different kind of person to be a comedian. To walk into a room full of random European hostel residents, a dude from Perth, and a redneck Spaniard, and decide “even though we don’t know each other, get ready ’cause you’re going to laugh.” I don’t understand it, but I appreciate it. Here’s what I don’t understand, though… I got all of this tonight, and have gotten it on the 1st and 3rd Tuesday of every month for the last year (along with my buddy @siouxhavens), and yet there were empty seats in the audience. Not one, not two, but like half. Maybe not half, but close. Residents of Santa Monica, I was home by 10:30 pm! It was 5 dollars and a 3 dollar Uber ride. LA, I hope some day I am unable to go to this show because it’s sold out, because y’all are missing out.

Posted in tech | Leave a comment

One week in review of the Toshiba CB35-A3120

I wish I had more depth to this review. I wish I could give you point-by-point instructions about what’s great about it and what’s not so great about it. I wish I had detailed pictures to show you. Instead, I have a very short, simple point.

Given a choice, for non-work tasks, I now reach for my Chromebook over my MacBook Pro.

I’m going to let that sink in for a second. I have a new MacBook Pro with an SSD and 16 gigabytes of RAM. It’s super nice. I run whatever software I want on it and have it configured exactly like I want to have it configured. So why do I reach for the Chromebook? Because it’s easy. Because when I open it, it’s ready to go immediately. Because I never have to worry about having too much stuff running on it and it getting slow. Because I know for a fact it will still have a bunch of battery life. Because it’s solid. Because it feels well built.

I don’t know what else to say. If you need a computer that lets you internet and ssh, you really couldn’t do better.

Posted in tech | Leave a comment

First day review of Toshiba CB35-A3120 13.3-Inch Chromebook


No seriously. Guys.

This Toshiba CB35 is a really nice laptop. I wasn’t expecting it. I was expecting an OK laptop. But it isn’t an OK laptop. It’s a really nice laptop. If you read this blog or know me, you know that I’m super duper hard on computers. I use every single stinking core, I max out the RAM, I make computers hurt. I bought this laptop because I wanted an easy, disposable computer for random hacking. Well, golly, it’s won me over.

So, I received it yesterday. What’s the first thing I did?

  1. Boot it into developer mode
  2. Install Ubuntu on it
  3. Have a Linux laptop

It was incredible. It was a solid Linux laptop. But here’s the crazy part. You know what I did immediately after doing that?

  1. Restore it to stock Chrome OS

Why?!?! Why would I do such a thing? Because I can always hit Ctrl-Alt-T and get a terminal. Turns out, all I really want out of a laptop is a super fast web browser and the ability to ssh to computers where I do real work. I don’t want to keep my code or my data on my laptop, no matter how nice it is. Why? Because laptops break. Because laptops get stolen. Because laptops.

I am typing this blog post on this laptop right now. It’s fast. It feels solid. It’s got a delightful keyboard. It’s got a great screen. It was 279 freaking dollars. For 300 bucks delivered next day to my house, I have a solid laptop, running a UNIX OS, with a webcam, with a real solid feel, that I can use to hit RStudio Server on my development machine. I can then ssh to the Linux server and type “domino run foo.R” to have that same analytics code run in the cloud. I am not limited.

Everything that is old is new again. This is a VT100 for 2014. I do all my serious work on servers managed by true deep wizards, and I have my own little disposable fun device.

If you are at all on the fence, buy one. If you are a programmer who has access to a shell account on a real server somewhere, buy one. If you want a computer you can carry without fear of getting it stolen or having it walk away? Buy one. If you want a laptop with 9 freaking hours of battery life? Buy one.

On day 1, I can’t say enough nice things about this Chromebook. Good job, good effort.

Posted in tech | 3 Comments

Stop using regular top.

Seriously, I know you’re old like I am. I get that, but it turns out there have been ridiculous advancements in top technology. There are two programs I just learned about today that make me feel like I’ve clearly missed out on some advancements on the command line.


Atop is an ASCII full-screen performance monitor that is capable of reporting the activity of all processes (even if processes have finished during the interval), daily logging of system and process activity for long-term analysis, highlighting overloaded system resources by using colors, etc. At regular intervals, it shows system-level activity related to the CPU, memory, swap, disks and network layers, and for every process (and thread) it shows a.o. the CPU utilization, memory growth, disk utilization, priority, username, state, and exit code.

Oh, and by the way, it’s freaking beautiful:


However, if what you want is more bars and widgets and viewable doohickeys, I must admit, my heart lies with htop:


This is htop, an interactive process viewer for Linux. It is a text-mode application (for console or X terminals) and requires ncurses.


But that doesn’t really do it justice. Seriously, check it out, it freaking provides real time visual bars of every single resource like CPU and ram so at a glance you can see exactly how taxed out your freaking computer is. My god, things have gotten really quite advanced since freaking truss and top, haven’t they? Have I missed any “must have” terminal utilities from the last decade, wherein I was clearly sitting around with my head up my ass?

Posted in tech | Leave a comment

Speeding up R code

So I’ve been writing a bunch of R using the fantastic RTextTools and topicmodels packages for LDA. The math in those packages is fantastic, but the string manipulation in R is not great. So what’s a guy to do but ask a Stack Overflow question about how to do the task at hand?

The task is:

Given a string input and two vectors (start and end), such that start[i] and end[i] contain a pair of offsets to *remove* from input, return the resulting, shorter string.

In Perl, the answer is just “loop, call substr, and go on to the next problem.” In R, it’s not so easy. We ended up with:

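s <- "some source text…"
# text (defined elsewhere) holds the offsets to remove, one row per (start, end) pair
cutpoints <- data.frame(start=text$start, end=text$end)
# Pieces to keep: before the first cut, between cuts, and after the last cut
keeps <- data.frame(start=c(1, cutpoints$end+1), end=c(cutpoints$start-1, nchar(s)))
pieces <- apply(keeps, 1, function(x) substr(s, x[1], x[2]))
sliced_string <- paste(pieces, collapse="")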

Rather than removing the slices directly, this computes the parts you want to keep and then iterates through them. It’s not a bad implementation, but it’s incredibly slow and memory-inefficient (it keeps N copies of the document in memory, where N is the total number of slice pairs). After talking to my R guru, I learned about the fantastic Rcpp package. The same task can be coded in “modern” C++ (using std) like this:

#include <Rcpp.h>
#include <string>
#include <vector>
using namespace Rcpp;

// Remove each [start[i], end[i]] range (1-based, inclusive, like R's substr)
// from input. Assumes ranges are sorted by start and do not overlap; they are
// erased last to first so each erase leaves the earlier offsets valid.
// [[Rcpp::export]]
std::string string_slicer(std::string input, std::vector<int> start, std::vector<int> end) {
  std::string working_copy = input;
  for (std::vector<int>::size_type i = start.size(); i-- > 0; ) {
    working_copy.erase(start[i] - 1, (end[i] - start[i]) + 1);
  }
  return working_copy;
}

Which can then be called in R like:

sliced_string <- string_slicer(s, cutpoints$start, cutpoints$end)

The output is identical, and the performance of the Rcpp version is literally 25x better. Here are two graphs showing the different performance profiles*. In the native R version, the call stack is *12* layers deep in R code, and 10,000 iterations took over 40 seconds to run.


By comparison, the C++ version took only 1.5 seconds for the same 10,000 iterations! Mind you, the profiler hides the actual C++ call graph, and lord only knows what is happening under the hood of std and the STL. However, it’s super fast, and I don’t really have to care too much.
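A side note on the C++ version above: each `erase()` shifts everything after the cut, so with many cut pairs the work grows roughly with (number of cuts) × (string length). Below is a minimal sketch of a one-pass alternative. It is a swapped-in technique, not the version benchmarked above: it uses the same “compute the pieces to keep” idea as the R code, but appends the kept pieces into one preallocated buffer instead of building a list of substrings. It's plain C++ (no Rcpp header) so it can be compiled and tested on its own; the name `string_slicer_onepass` is hypothetical, and adding `#include <Rcpp.h>` plus the `// [[Rcpp::export]]` attribute should expose it to R the same way. It assumes sorted, non-overlapping, 1-based inclusive byte offsets that lie within the string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical one-pass slicer: copy only the kept segments of input.
// Assumes start/end are 1-based inclusive byte offsets (as in R's substr),
// sorted by start, non-overlapping, and within the string.
std::string string_slicer_onepass(const std::string& input,
                                  const std::vector<int>& start,
                                  const std::vector<int>& end) {
    std::string out;
    out.reserve(input.size());
    std::size_t keep_from = 0;  // 0-based index of the next byte to keep
    for (std::size_t i = 0; i < start.size(); ++i) {
        const std::size_t cut_begin = static_cast<std::size_t>(start[i] - 1);
        const std::size_t cut_end = static_cast<std::size_t>(end[i]);  // one past the last removed byte
        out.append(input, keep_from, cut_begin - keep_from);  // segment before this cut
        keep_from = cut_end;
    }
    out.append(input, keep_from, std::string::npos);  // tail after the last cut
    return out;
}
```

For example, removing [1, 1] and [7, 11] from "hello world" leaves "ello ".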


As a caveat, there *was* a difficulty in this original task: the string encoding was not passed cleanly between R and C++, so the length() values returned in R and in C++ were completely different. That can be chalked up to nothing ever working right in Unicode, ever. I ended up “fixing” it by following this Stack Overflow answer (it worked for my purposes, but I would never suggest that this is the actual way to handle it).
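One plausible cause of that length mismatch (an assumption, not something verified against the original data): `std::string::length()` counts bytes, while R's `nchar()` counts characters by default, so every multi-byte UTF-8 character pushes the two apart. Here's a minimal sketch of counting code points and mapping a 0-based character index to a byte offset, assuming valid UTF-8 input; both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a UTF-8 string. Every byte
// except a continuation byte (bit pattern 10xxxxxx) starts a new code point.
// Assumes the input is valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: convert a 0-based code-point index into a byte
// offset into s. Returns s.size() if the index is at or past the end.
std::size_t utf8_byte_offset(const std::string& s, std::size_t char_index) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) {  // start of a code point
            if (seen == char_index) {
                return i;
            }
            ++seen;
        }
    }
    return s.size();
}
```

With helpers like these, character offsets coming from R could be converted to byte offsets before calling `erase()`; R's offsets are 1-based, so subtract 1 first.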

The good news is that this really saves a lot of runtime and memory in the document-preparation step of my LDA task, and it has opened my eyes to powerful ways of dealing with performance-critical R code. For my next trick, I can’t wait to tie in Dan Bloomberg’s fantastic Leptonica library. Being able to do morphological operations on an image in C and have the resulting image exported directly into R as a numeric matrix is going to make a whole bunch of tasks possible that were previously computationally infeasible.

* As an aside, I don’t think I’ve ever learned more about a programming language and environment than when learning how to do performance profiling. I don’t know if that’s just me, though. Those cheesy graphs were made with Hadley Wickham’s fantastic profr package.

Posted in tech | Leave a comment

Current Plan for Twitter

These are the new rules:

  1. At the beginning of every month, my list of twitter followers will be culled based on:
    1. If you are a lifelong friend who is active on twitter, you get a pass (active on twitter is the key.) This will be a manual list. It doesn’t mean I don’t love you if you’re not on this list, you may simply not be active enough on twitter.
    2. If you’re one of the top 20 people I’ve replied to
    3. If you’re one of the top 20 people I’ve retweeted
  2. I can add followers throughout the month however I want, but at the beginning of every month, step #1 applies again.
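The monthly rules above are mechanical enough to write down as code. Here's a minimal sketch, assuming the reply and retweet counts per account have already been gathered somehow (fetching them from Twitter is out of scope here) and that ties at the top-20 cutoff are broken alphabetically, which is an arbitrary choice; `top_n` and `monthly_cull` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the n accounts with the highest counts.
// Ties at the cutoff are broken alphabetically (an arbitrary choice).
std::set<std::string> top_n(const std::map<std::string, int>& counts, std::size_t n) {
    std::vector<std::pair<std::string, int>> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<std::string, int>& a, const std::pair<std::string, int>& b) {
                  if (a.second != b.second) {
                      return a.second > b.second;  // higher count first
                  }
                  return a.first < b.first;        // then alphabetical
              });
    std::set<std::string> result;
    for (std::size_t i = 0; i < ranked.size() && i < n; ++i) {
        result.insert(ranked[i].first);
    }
    return result;
}

// Hypothetical monthly cull: keep an account if it is on the manual list of
// active friends (rule 1.1), in the top 20 by replies (rule 1.2), or in the
// top 20 by retweets (rule 1.3). Everyone else is dropped.
std::set<std::string> monthly_cull(const std::set<std::string>& current,
                                   const std::set<std::string>& active_friends,
                                   const std::map<std::string, int>& replies,
                                   const std::map<std::string, int>& retweets) {
    const std::set<std::string> top_replied = top_n(replies, 20);
    const std::set<std::string> top_retweeted = top_n(retweets, 20);
    std::set<std::string> kept;
    for (const std::string& account : current) {
        if (active_friends.count(account) > 0 || top_replied.count(account) > 0 ||
            top_retweeted.count(account) > 0) {
            kept.insert(account);
        }
    }
    return kept;
}
```

Rule 2 (adding accounts mid-month) needs no code: whatever has been added simply shows up in `current` at the next cull.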
Posted in lifehacks | 1 Comment

3 Strikes. How I went from early adopter to disinterested in Amazon Fresh

I moved to Santa Monica in the last year. This has afforded me all kinds of awesome opportunities to really start eating clean. I can go to my friendly local co-op and get grass-fed, organic, free-range potato chips if I want 🙂 But still, the hustle and bustle of the big city means I don’t always have the time I need to go do some shopping. I was ecstatic when LA (and therefore SaMo) was introduced as an Amazon Fresh test market! That excitement lasted through 3 relatively miserable experiences that reminded me of why Webvan failed, and why, sadly, Amazon is probably going to fail the same way in this market.

Episode One – A New Client

One of the things that has happened recently is that I’ve become “mostly” vegetarian (I like the term Flexitarian. It allows me breakfast bacon, which I count as a sacrament more than a food.) We have gotten some pretty swell vegetarian cookbooks, and so I decided to test out Amazon Fresh with an order for all the stuff I needed for the Palak Paneer recipe out of The Indian Slow Cooker.


Palak Paneer looks like it would just be a bunch of ground-up spinach, but it’s a shocking amount of vegetables! Look at what it looks like before it’s been slow cookered for a jillion hours:


There’s onion and tomato and pepper and all kinds of nonsense up in that recipe. Spices, garlic, the works. So, that’s what I ordered from Amazon Fresh, excited to have all of my ingredients show up first thing in the morning, so that I could effortlessly cut them, put them in the slow cooker, and enjoy the convenience of having my organic groceries delivered to my doorstep. The next morning, when I woke up and checked my email, I received the following text in a friendly email from Amazon:

Pretty helpful, right? They delivered 1 out of the 4… uh… 99 cent things? And 0 out of the 3… uh… 0 cent things? It turns out that for this famous spinach dish I require, among other things, spinach. They had delivered 1/4 of the spinach required (I never figured out the other component). What does this mean? Instead of getting to simply put everything in the slow cooker and enjoy my day, I was left with ingredients that were utterly and completely useless to me. I had to get out of my house, go to the grocery store, and pick up 3 bunches of spinach. I’ll let you in on a little secret: if you can’t deliver all of the components of a recipe, you are of no use to me. If I have to go to a grocery store to pick up one of the things I ordered because you didn’t deliver it, I may as well pick them all up.

It is important to note, they didn’t tell me that they would be out of spinach during the ordering process, or even a few hours after ordering the groceries. I found out the next morning (probably after they had gone to pack my order and had discovered that they didn’t have anything.) So I went to the grocery store and felt a little bitter.

Episode Two – A Jaded Customer

So, I decide I’m going to make a lovely vegetarian chili! This should be a pretty easy thing for Amazon Fresh to deliver to me; after all, chili is made of staples. It’s beans and peppers and onions. So, we do the same exact dance: I go on the website, put everything I need for the recipe in my cart, order it, and go on my happy way. I’m going to guess that you can guess how the experience goes… I go to sleep and wake up in the morning to see my Amazon Fresh Delivery Notification. It provides me no joy:

The dreaded “Unable to Fulfill” message. My chili recipe required 4 peppers: 2 green peppers and 2 red peppers. You know, for color! Well, once again, without giving me the option to say “you know what, just go ahead and ship 2 extra green peppers, I’ll deal with it,” or any other option at all, they sent me 9/10ths of a recipe’s worth of food. Once again, I had to take time out of my schedule to go to a store to buy ingredients for a recipe. At this point, Amazon Fresh was costing me 2x the time: shopping on their website was not a painless experience, and on top of that, I had to do the shopping twice. I was unhappy but willing to forgive. Real-time inventory is sometimes hard to do, and I’d rather get no red peppers than some really crappy red peppers, or so I figured.

Episode Three – In Which I Tell Amazon Fresh “Thanks, but No Thanks”

This is the one that really sealed the deal. I decided I was going to make some vegetarian lasagna in the crockpot. Pretty exciting stuff. I had a really busy day ahead, as I was going to be in all-day meetings with a German roboticist to discuss stuff that actually mattered, so getting to set up the meal before the day started, and getting to “forget it,” was splendid. I woke up that morning and was ecstatic to see that every single item had shipped from Amazon! Finally I was going to have a good experience. In my order, there were eggs and spinach… Well, guess how they showed up:

Of my dozen eggs, most of them were crushed. Simply crushed. They also were not inside of a little plastic bag, they had clearly been simply jammed into the Amazon Fresh container with no regard to their ability to survive the transport. This means that they also leaked all into my spinach:

My spinach was now full of crushed-up egg goo. I wasn’t going to use all the spinach for the recipe; I was actually pretty happy at the idea of getting to use the rest of the spinach to make some lovely squash blossom quesadillas or something like that. You, unfortunately, can’t keep spinach that’s covered in egg goo in the fridge; that’s just not safe. So, by overstuffing the bag and rushing the shipment, Amazon Fresh had delivered everything I had asked for in spirit, but not in fact. It was clear now that any time I might save by ordering from Amazon Fresh would be spent dealing with Amazon’s logistics.


I am very fortunate to live in a walker’s paradise. I am less than 1/2 mile from multiple grocery stores, ranging from a Vons to a locally owned co-op. I loved my Amazon Prime subscription, and the way it changed my shopping, so much that I was willing to give Amazon Fresh a shot. It turns out it’s less convenient, not any cheaper, and I end up having to shop twice. I wish them luck, but they’re going to have to do this without me.

Posted in lifehacks | Leave a comment

C# and Mono vs Perl. Oh my god, who am I.

So I’ve spent the last few days diving deep into C#/.net/Mono. When I first looked at it a few years ago, I saw a big clunky language and an unwieldy runtime. Having just spent the last 6 months looking at Perl/Moose, I have to admit, I’m a bit embarrassed. Seriously y’all, what the hell happened.

I find myself enjoying C# *more* than Perl. I want you to think about what I just said… From a pure pleasure standpoint, from a pure “getting things done” standpoint… C# is *smashing* Perl. Killing it. Crushing it. I’m beside myself with grief, and I’m beside myself with excitement.

I’ve spent years writing event-driven code in Perl. *Years*. Writing state machines that control physical hardware, state machines that drive terabytes of content, state machines that drive entire clusters of machines to do image processing, and every time I reached for Perl. Sometimes I reached for POE, sometimes I reached for AnyEvent, sometimes I just threw some shit together, but I always reached for Perl.

I can’t believe I’m about to write this.

Event-driven architectures are way easier to express, more performant, easier to debug, and more scalable in C#/.net/Mono. And this is after only a week and a half of effort on my part.

I was not expecting this. I’m beside myself.

Oh, and don’t get me started on how much easier .Net code is to deploy. Don’t fucking get me started. Sure, perlbrew/local-lib have made life infinitely easier (thank you, @miyagawa), and it’s not *quite* as easy as Go’s model, but it’s comically easier. Comically.

Posted in tech | 3 Comments