Category: Computers

Motivation, Boredom, and Problem Solving

shatteredIn the New York Times Stone column, James Blachowicz of Loyola challenges the assumption that the scientific method is uniquely distinguishable from other ways of thinking and problem solving we regularly employ. In his example, he lays out how writing poetry involves some kind of alignment of words that conform to the requirements of the poem. Whether actively aware of the process or not, the poet is solving constraint satisfaction problems concerning formal requirements like meter and structure, linguistic problems like parts-of-speech and grammar, semantic problems concerning meaning, and pragmatic problems like referential extension and symbolism. Scientists do the same kinds of things in fitting a theory to data. And, in Blachowicz’s analysis, there is no special distinction between scientific method and other creative methods like the composition of poetry.

We can easily see how this extends to ideas like musical composition and, indeed, extends with even more constraints that range from formal through to possibly the neuropsychology of sound. I say “possibly” because there remains uncertainty on how much nurture versus nature is involved in the brain’s reaction to sounds and music.

In terms of a computational model of this creative process, if we presume that there is an objective function that governs possible fits to the given problem constraints, then we can clearly optimize towards a maximum fit. For many of the constraints there are, however, discrete parameterizations (which part of speech? which word?) that are not like curve fitting to scientific data. In fairness, discrete parameters occur there, too, especially in meta-analyses of broad theoretical possibilities (Quantum loop gravity vs. string theory? What will we tell the children?) The discrete parameterizations blow up the search space with their combinatorics, demonstrating on the one hand why we are so damned amazing, and on the other hand why a controlled randomization method like evolutionary epistemology’s blind search and selective retention gives us potential traction in the face of this curse of dimensionality. The blind search is likely weakened for active human engagement, though. Certainly the poet or the scientist would agree; they are using learned skills, maybe some intellectual talent of unknown origin, and experience on how to traverse the wells of improbability in finding the best fit for the problem. This certainly resembles pre-training in deep learning, though on a much more pervasive scale, including feedback from categorical model optimization into the generative basis model.

But does this extend outwards to other ways in which we form ideas? We certainly know that motivated reasoning is involved in key aspects of our belief formation, which plays strongly into how we solve these constraint problems. We tend to actively look for confirmations and avoid disconfirmations of fit. We positively bias recency of information, or repeated exposures, and tend to only reconsider in much slower cycles.

Also, as the constraints of certain problem domains become, in turn, extensions that can result in change—where there is a dynamic interplay between belief and success—the fixity of the search space itself is no longer guaranteed. Broad human goals like the search for meaning are an example of that. In come complex human factors, like how boredom correlates with motivation and ideological extremism (overview, here, journal article, here).

This latter data point concerning boredom crosses from mere bias that might preclude certain parts of a search space into motivation that focuses it, and that optimizes for novelty seeking and other behaviors.

Local Minima and Coatimundi

CoatimundiEven given the basic conundrum of how deep learning neural networks might cope with temporal presentations or linear sequences, there is another oddity to deep learning that only seems obvious in hindsight. One of the main enhancements to traditional artificial neural networks is a phase of supervised pre-training that forces each layer to try to create a generative model of the input pattern. The deep learning networks then learn a discriminant model after the initial pre-training is done, focusing on the error relative to classification versus simply recognizing the phrase or image per se.

Why this makes a difference has been the subject of some investigation. In general, there is an interplay between the smoothness of the error function and the ability of the optimization algorithms to cope with local minima. Visualize it this way: for any machine learning problem that needs to be solved, there are answers and better answers. Take visual classification. If the system (or you) gets shown an image of a coatimundi and a label that says coatimundi (heh, I’m running in New Mexico right now…), learning that image-label association involves adjusting weights assigned to different pixels in the presentation image down through multiple layers of the network that provide increasing abstractions about the features that define a coatimundi. And, importantly, that define a coatimundi versus all the other animals and non-animals.,

These weight choices define an error function that is the optimization target for the network as a whole, and this error function can have many local minima. That is, by enhancing the weights supporting a coati versus a dog or a raccoon, the algorithm inadvertently leans towards a non-optimal assignment for all of them by focusing instead on a balance between them that is predestined by the previous dog and raccoon classifications (or, in general, the order of presentation).

Improvements require “escaping” these local optima in favor of a global solution that accords the best overall outcome to all the animals and a minimization of the global error. And pre-training seems to do that. It likely moves each discriminative category closer to the global possibilities because those global possibilities are initially encoded by the pre-training phase.

This has the added benefit of regularizing or smoothing out the noise that is inherent in any real data set. Indeed, the two approaches appear to be closely allied in their impact on the overall machine learning process.

New Behaviorism and New Cognitivism

lstm_memorycellDeep Learning now dominates discussions of intelligent systems in Silicon Valley. Jeff Dean’s discussion of its role in the Alphabet product lines and initiatives shows the dominance of the methodology. Pushing the limits of what Artificial Neural Networks have been able to do has been driven by certain algorithmic enhancements and the ability to process weight training algorithms at much higher speeds and over much larger data sets. Google even developed specialized hardware to assist.

Broadly, though, we see mostly pattern recognition problems like image classification and automatic speech recognition being impacted by these advances. Natural language parsing has also recently had some improvements from Fernando Pereira’s team. The incremental improvements using these methods should not be minimized but, at the same time, the methods don’t emulate key aspects of what we observe in human cognition. For instance, the networks train incrementally and lack the kinds of rapid transitions that we observe in human learning and thinking.

In a strong sense, the models that Deep Learning uses can be considered Behaviorist in that they rely almost exclusively on feature presentation with a reward signal. The internal details of how modularity or specialization arise within the network layers are interesting but secondary to the broad use of back-propagation or Gibb’s sampling combined with autoencoding. This is a critique that goes back to the early days of connectionism, of course, and why it was somewhat sidelined after an initial heyday in the late eighties. Then came statistical NLP, then came hybrid methods, then a resurgence of corpus methods, all the while with image processing getting more and more into the hand-crafted modular space.

But we can see some interesting developments that start to stir more Cognitivism into this stew. Recurrent Neural Networks provided interesting temporal behavior that might be lacking in some feedforward NNs, and Long-Short-Term Memory (LSTM) NNs help to overcome some specific limitations of  recurrent NNs like the disconnection between temporally-distant signals and the reward patterns.

Still, the modularity and rapid learning transitions elude us. While these methods are enhancing the ability to learn the contexts around specific events (and even the unique variability of contexts), that learning still requires many exposures to get right. We might consider our language or vision modules to be learned over evolutionary history and so not expect learning within a lifetime from scratch to result in similarly structured modules, but the differences remain not merely quantitative but significantly qualitative. A New Cognitivism requires more work to rise from this New Behaviorism.

Evolving Visions of Chaotic Futures

FlutterbysMost artificial intelligence researchers think unlikely the notion that a robot apocalypse or some kind of technological singularity is coming anytime soon. I’ve said as much, too. Guessing about the likelihood of distant futures is fraught with uncertainty; current trends are almost impossible to extrapolate.

But if we must, what are the best ways for guessing about the future? In the late 1950s the Delphi method was developed. Get a group of experts on a given topic and have them answer questions anonymously. Then iteratively publish back the group results and ask for feedback and revisions. Similar methods have been developed for face-to-face group decision making, like Kevin O’Connor’s approach to generating ideas in The Map of Innovation: generate ideas and give participants votes equaling a third of the number of unique ideas. Keep iterating until there is a consensus. More broadly, such methods are called “nominal group techniques.”

Most recently, the notion of prediction markets has been applied to internal and external decision making. In prediction markets,  a similar voting strategy is used but based on either fake or real money, forcing participants towards a risk-averse allocation of assets.

Interestingly, we know that optimal inference based on past experience can be codified using algorithmic information theory, but the fundamental problem with any kind of probabilistic argument is that much change that we observe in society is non-linear with respect to its underlying drivers and that the signals needed are imperfect. As the mildly misanthropic Nassim Taleb pointed out in The Black Swan, the only place where prediction takes on smooth statistical regularity is in Las Vegas, which is why one shouldn’t bother to gamble. Taleb’s approach is to look instead at minimizing the impact of shocks (or hedging them in financial markets).

But maybe we can learn something from philosophical circles. For instance, Evolutionary Epistemology (EE), as formulated by Donald Campbell, Sir Karl Popper, and others, posits that central to knowledge formation is blind variation and selective retention. Combined with optimal induction, this leads to random processes being injected into any kind of predictive optimization. We do this in evolutionary algorithms like Genetic Algorithms, Evolutionary Programming, Genetic Programming, and Evolutionary Strategies, as well as in related approaches like Simulated Annealing. But EE also suggests that there are several levels of learning by variation/retention, from the phylogenetic learning of species through to the mental processes of higher organisms. We speculate and trial-and-error continuously, repeating loops of what-ifs in our minds in an effort to optimize our responses in the future. It’s confounding as hell but we do remarkable things that machines can’t yet do like folding towels or learning to bake bread.

This noosgeny-recapitulates-ontogeny-recapitulates-phylogeny (just made that up) can be exploited in a variety of ways for abductive inference about the future. We can, for instance, use evolutionary optimization with a penalty for complexity that simulates the informational trade-off of AIT-style inductive optimality. Further, the noosgeny component (by which I mean the internalized mental trial-and-error) can reduce phylogenetic waste in simulations by providing speculative modeling that retains the “parental” position on the fitness landscape before committing to a next generation of potential solutions, allowing for further probing of complex adaptive landscapes.

The Linguistics of Hate

keep-calm-and-hate-corpus-linguisticsRight-wing authoritarianism (RWA) and Social dominance orientation (SDO) are measures of personality traits and tendencies. To measure them, you ask people to rate statements like:

Superior groups should dominate inferior groups

The withdrawal from tradition will turn out to be a fatal fault one day

People rate their opinions on these questions using a 1 to 5 scale from Definitely Disagree to Strongly Agree. These scales have their detractors but they also demonstrate some useful and stable reliability across cultures.

Note that while both of these measures tend to be higher in American self-described “conservatives,” they also can be higher for leftist authoritarians and they may even pop up for subsets of attitudes among Western social liberals about certain topics like religion. Haters abound.

I used the R packages twitterR, textminer, wordcloud, SnowballC, and a few others and grabbed a few thousand tweets that contained the #DonaldJTrump hashtag. A quick scan of them showed the standard properties of tweets like repetition through retweeting, heavy use of hashtags, and, of course, the use of the #DonaldJTrump as part of anti-Trump sentiments (something about a cocaine-use video). But, filtering them down, there were definite standouts that seemed to support a RWA/SDO orientation. Here are some examples:

The last great leader of the White Race was #trump #trump2016 #donaldjtrump #DonaldTrump2016 #donaldtrump”

Just a wuss who cant handle the defeat so he cries to GOP for brokered Convention. # Trump #DonaldJTrump

I am a PROUD Supporter of #DonaldJTrump for the Highest Office in the land. If you don’t like it, LEAVE!

#trump army it’s time, we stand up for family, they threaten trumps family they threaten us, lock and load, push the vote…

Not surprising, but the density of them shows a real aggressiveness that somewhat shocked me. So let’s assume that Republicans make up around 29% of the US population, and that Trump is getting around 40% of their votes in the primary season, then we have an angry RWA/SDO-focused subpopulation of around 12% of the US population.

That seems to fit with results from an online survey of RWA, reported here. An interesting open question is whether there is a spectrum of personality types that is genetically predisposed, or whether childhood exposures to ideas and modes of childrearing are more likely the cause of these patterns (and their cross-cultural applicability).

Here are some interesting additional resources:

Bilewicz, Michal, et al. “When Authoritarians Confront Prejudice. Differential Effects of SDO and RWA on Support for Hate‐Speech Prohibition.” Political Psychology (2015).

Sylwester K, Purver M (2015) Twitter Language Use Reflects Psychological Differences between Democrats and Republicans. PLoS ONE 10(9): e0137422. doi:10.1371/journal.pone.0137422

The latter has a particularly good overview of RWA/SDO, other measures like openness, etc., and Twitter as an analytics tool.

Finally, below is some R code for Twitter analytics that I am developing. It is derivative of sample code like here and here, but reorients the function structure and adds deletion of Twitter hashtags to focus on the supporting language. There are some other enhancements like codeset normalization. All uses and reuses are welcome. I am starting to play with building classifiers and using Singular Value Decomposition to pull apart various dominating factors and relationships in the term structure. Ultimately, however, human intervention is needed to identify pro vs. anti tweets, as well as phrasal patterns that are more indicative of RWA/SDO than bags-of-words can indicate.

Also, here are wordclouds generated for #hillaryclinton and #DonaldJTrump, respectively. The Trump wordcloud was distorted by some kind of repetitive robotweeting that dominated the tweets.





 #Grab the tweets
 djtTweets <- searchTwitter(searchTerm, num)

 #Use a handy helper function to put the tweets into a dataframe 

 RemoveDots <- function(tweet) {
 gsub("[\\.\\,\\;]+", " ", tweet)

 RemoveLinks <- function(tweet) {
 gsub("http:[^ $]+", "", tweet)
 gsub("https:[^ $]+", "", tweet)

 RemoveAtPeople <- function(tweet) {
 gsub("@\\w+", "", tweet) 

 RemoveHashtags <- function(tweet) {
 gsub("#\\w+", "", tweet) 

 FixCharacters <- function(tweet){

 CleanTweets <- function(tweet){
 s1 <- RemoveLinks(tweet)
 s2 <- RemoveAtPeople(s1)
 s3 <- RemoveDots(s2) 
 s4 <- RemoveHashtags(s3)
 s5 <- FixCharacters(s4)

 tweets <- as.vector(sapply(tw.df$text, CleanTweets))
 if (verbose) print(tweets)

 generateCorpus= function(df,pstopwords){
 tw.corpus= Corpus(VectorSource(df))
 tw.corpus = tm_map(tw.corpus, content_transformer(removePunctuation))
 tw.corpus = tm_map(tw.corpus, content_transformer(tolower))
 tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
 tw.corpus = tm_map(tw.corpus, removeWords, pstopwords)

 corpus = generateCorpus(tweets)

 doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
 dm = as.matrix(doc.m)
 # calculate the frequency of words
 v = sort(rowSums(dm), decreasing=TRUE)

 d = data.frame(word=names(v), freq=v)
 #Generate the wordcloud
 wc=wordcloud(d$word, d$freq, scale=c(4,0.3), min.freq=min.freq, colors = brewer.pal(8, "Paired"))

djttweets = tweets.grabber("#DonaldJTrump", 2000, verbose=TRUE)
djtcorpus = corpus.stats(djttweets)
wordcloud.generate(djtcorpus, 3)

The Retiring Mind, Part III: Autonomy

Retiring Mind IIIRobert Gordon’s book on the end of industrial revolutions recently came out. I’ve been arguing for a while that the coming robot apocalypse might be Industrial Revolution IV. But the Dismal Science continues to point out uncomfortable facts in opposition to my suggestion.

So I had to test the beginning of the end (or the beginning of the beginning?) when my Tesla P90D with autosteer, summon mode, automatic parking, and ludicrous mode arrived to take the place of my three-year-old P85:

The Goldilocks Complexity Zone

FractalSince my time in the early 90s at Santa Fe Institute, I’ve been fascinated by the informational physics of complex systems. What are the requirements of an abstract system that is capable of complex behavior? How do our intuitions about complex behavior or form match up with mathematical approaches to describing complexity? For instance, we might consider a snowflake complex, but it is also regular in it’s structure, driven by an interaction between crystal growth and the surrounding air. The classic examples of coastlines and fractal self-symmetry also seem complex but are not capable of complex behavior.

So what is a good way of thinking about complexity? There is actually a good range of ideas about how to characterize complexity. Seth Lloyd rounds up many of them, here. The intuition that drives many of them is that complexity seems to be associated with distributions of relationships and objects that are somehow juxtapositioned between a single state and a uniformly random set of states. Complex things, be they living organisms or computers running algorithms, should exist in a Goldilocks zone when each part is examined and those parts are somehow summed up to a single measure.

We can easily construct a complexity measure that captures some of these intuitions. Let’s look at three strings of characters:

x = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

y = menlqphsfyjubaoitwzrvcgxdkbwohqyxplerz

z = the fox met the hare and the fox saw the hare

Now we would likely all agree that y and z are more complex than x, and I suspect most would agree that y looks like gibberish compared with z. Of course, y could be a sequence of weirdly coded measurements or something, or encrypted such that the message appears random. Let’s ignore those possibilities for our initial attempt at defining a complexity measure. We can see right away that an approach using basic information theory doesn’t help much. Algorithmic informational complexity will be highest for y, as will entropy:

for each sequence composed out of an alphabet with counts, s. So we get: H(x) = 0, H(y) = 3.199809, and H(z) = 2.3281. Here’s some sample R code using the “entropy” package if you want to calculate yourself:

> z = "the fox met the hare and the fox saw the hare"
> zt = table(strsplit(z, '')[[1]])
> entropy(zt, method="ML")

Note that the alphabet of each string is slightly different, but the missing characters between them don’t matter since their probabilities are 0.

We can just arbitrarily scale entropy by the maximum entropy possible for the same length string like this:

This is somewhat like channel efficiency in communications theory, I think. And then just turn this into a parabolically-scaled measure that centers at 0.5:

where is an arbitrary non-zero scaling parameter.

But this calculation is only considering the individual character frequencies, not the composition of the characters into groupings. So we can consider pairs of characters in this same calculation, or triples, etc. And also, just looking at these n-gram sequences doesn’t capture potentially longer range repetitious structures. So we can gradually ladle on grammars as the counting mechanism. Now, if our measure of complexity is really going to capture what we intuitively consider to be complex, all of these different levels of connections within the string or other organized piece of information must be present.

This general program is present in every one of Seth Lloyd’s complexity metrics in various ways and even comes into play in discussions of consciousness, though many use mutual information rather than entropy per se. Here’s Max Tegmark using a variation on Giulio Tinoni’s Phi concept from Integrated Information Theory to demonstrate that integration is a key component of consciousness and how that might be calculated for general physical systems.

Entanglement and Information

shannons-formula-smallResearch can flow into interesting little eddies that cohere into larger circulations that become transformative phase shifts. That happened to me this morning between a morning drive in the Northern California hills and departing for lunch at one of our favorite restaurants in Danville.

The topic I’ve been working on since my retirement is whether there are preferential representations for optimal automated inference methods. We have this grab-bag of machine learning techniques that use differing data structures but that all implement some variation on fitting functions to data exemplars; at the most general they all look like some kind of gradient descent on an error surface. Getting the right mix of parameters, nodes, etc. falls to some kind of statistical regularization or bottlenecking for the algorithms. Or maybe you perform a grid search in the hyperparameter space, narrowing down the right mix. Or you can throw up your hands and try to evolve your way to a solution, suspecting that there may be local optima that are distracting the algorithms from global success.

Yet, algorithmic information theory (AIT) gives us, via Solomonoff, a framework for balancing parameterization of an inference algorithm against the error rate on the training set. But, first, it’s all uncomputable and, second, the AIT framework just uses strings of binary as the coded Turing machines, so I would have to flip 2^N bits and test each representation to get anywhere with the theory. Yet, I and many others have had incremental success at using variations on this framework, whether via Minimum Description Length (MDL) principles, it’s first cousin Minimum Message Length (MML), and other statistical regularization approaches that are somewhat proxies for these techniques. But we almost always choose a model (ANNs, compression lexicons, etc.) and then optimize the parameters around that framework. Can we do better? Is there a preferential model for time series versus static data? How about for discrete versus continuous?

So while researching model selection in this framework, I come upon a mention of Shannon’s information theory and its application to quantum decoherence. Of course I had to investigate. And here is the most interesting thing I’ve seen in months from the always interesting Max Tegmark at MIT:

Particles entangle and then quantum decoherence causes them to shed entropy into one another during interaction. But, most interesting, is the quantum Bayes’ theory section around 00:35:00 where Shannon entropy as a classical measure of improbability gets applied to the quantum indeterminacy through this decoherence process.

I’m pretty sure it sheds no particular light on the problem of model selection but when cosmology and machine learning issues converge it gives me mild shivers of joy.

A Soliloquy for Volcanoes and Nearest Neighbors

Tongariro National Park: Emerald Lake
Tongariro National Park: Emerald Lake

A German kid caught me talking to myself yesterday. It was my fault, really. I was trying to break a hypnotic trance-like repetition of exactly what I was going to say to the tramper’s hut warden about two hours away. OK, more specifically, I had left the Waihohonu camp site in Tongariro National Park at 7:30AM and was planning to walk out that day. To put this into perspective, it’s 28.8 km (17.9 miles) with elevation changes of around 900m, including a ridiculous final assault above red crater at something like 60 degrees along a stinking volcanic ridge line. And, to make things extra lovely, there was hail, then snow, then torrential downpours punctuated by hail again—a lovely tramp in the New Zealand summer—all in a full pack.

But anyway, enough bragging about my questionable judgement. I was driven by thoughts of a hot shower and the duck l’orange at Chateau Tongariro while my hands numbed to unfeeling arresting myself with trekking poles down through muddy canyons. I was talking to myself. I was trying to stop repeating to myself why I didn’t want my campsite for the night that I had reserved. This is the opposite of glorious runner’s high. This is when all the extra blood from one’s brain is obsessed with either making leg muscles go or watching how the feet will fall. I also had the hood of my rain fly up over my little Marmot ball cap. I was in full regalia, too, with the shifting rub of my Gortex rain pants a constant presence throughout the day.  I didn’t notice him easing up on me as I carried on about one-shot learning as some kind of trance-breaking ritual.

We exchanged pleasantries and he meandered on. With his tiny little day pack it was clear he had just come up from the car park at Mangatepopo for a little jaunt. Eurowimp. I caught up with him later slathering some kind of meat product on white bread trailside and pushed by, waiting on my own lunch of jerky, chili-tuna, crackers, and glorious spring water, gulp after gulp, an hour onward. He didn’t bring up the glossolalic soliloquy incident.

My mantra was simple: artificial neural networks, including deep learning approaches, require massive learning cycles and huge numbers of exemplars to learn. In a classic test, scores of handwritten digit images (0 to 9) are categorized as to which number they are. Deep learning systems have gotten to 99% accuracy on that problem, actually besting average human performance. Yet they require a huge training corpus to pull this off, combined with many CPU hours to optimize the models on that corpus. We humans can do much better than that with our neural systems.

So we get this recently lauded effort, One-Shot Learning of Visual Concepts, that uses an extremely complicated Bayesian mixture modeling approach that combines stroke exemplars together for trying to classify foreign and never-before-seen characters (like Bengali or Ethiopic) after only one exposure to the stimulus. In other words, if I show you some weird character with some curves and arcs and a vertical bar in it, you can find similar ones in a test set quite handily, but machines really can’t. A deep learning model could be trained on every possible example known in a long, laborious process, but when exposed to a new script like Amharic or a Cherokee syllabary, the generalizations break down. A simple comparison approach is to use a nearest neighbor match or vote. That is, simply create vectors of the image pixels starting at the top left and compare the distance between the new image vector and the example using an inner vector product. Similar things look the same and have similar pixel patterns, right? Well, except they are rotated. They are shifted. They are enlarged and shrunken.

And then it hit me that the crazy-complex stroke model could be simplified quite radically by simply building a similar collection of stroke primitives as splines and then looking at the K nearest neighbors in the stroke space. So a T is two strokes drawn from the primitives collection with a central junction and the horizontal laying atop the vertical. This builds on the stroke-based intuition of the paper’s authors (basically, all written scripts have strokes as a central feature and we as writers and readers understand the line-ness of them from experience with our own script).

I may have to try this out. I should note, also in critique of this antithesis of runner’s high (tramping doldrums?), that I was also deeply concerned that there were so many damn contending voices and thoughts racing around my head in the face of such incredible scenery. Why did I feel the need to distract my mind from it’s obsessions over something so humanly trivial? At least, I suppose, the distraction was interesting enough that it was worth the effort.

The Retiring Mind, Part 1: Clouds

goghcloudsI’m setting my LinkedIn and Facebook status to retired on 11/30 (a month later than planned, alas). Retired isn’t completely accurate since I will be in the earliest stage of a new startup in cognitive computing, but I want to bask ever-so-briefly in the sense that I am retired, disconnected from the circuits of organizations, and able to do absolutely nothing from day-to-day if I so desire.

(I’ve spent some serious recent cycles trying to combine Samuel Barber’s “Adagio for Strings” as an intro to the Grateful Dead’s “Terrapin Station”…on my Line6 Variax. Modulate B-flat to C, then D, then E. If there is anything more engaging for a retiring mind, I can’t think of it.)

I recently pulled the original server off a shelf in my garage because I had a random Kindle Digital Publisher account that I couldn’t find the credentials for and, in a new millennium catch-22, I couldn’t ask for a password reset because it had to go to that old email address. I swapped hard drives between a few Linux pizza-box servers and messed around with old BIOS and boot settings, and was finally able to get the full mail archive off the drive. In the process I had to rediscover all the arcane bits of Dovecot and mail.rc and SMTP configurations, and a host of other complexities. After not finding what I needed there, alas, I compressed the mail collection and put it on Dropbox.

I also retired a Mac Mini, shipping it off to a buy-back place for a few hundred bucks in Amazon credit. It had been a Subversion server that followed-up for, holding more than ten years of intellectual property in stasis. And I mean everything: business records, PowerPoints, source code, release packages, artwork, manuscripts, music. The archives were recorded to a USB drive and then tarred and dropped into Dropbox. A few of the more personal archive collections were transformed into Git repositories and stored on a OneDrive account.

And the new startup will exclusively use Microsoft Office 365 for email, calendaring, and productivity (yes, I tried Google Docs). Yammer will help with internal knowledge management. Atlassian’s Confluence, JIRA, and Bitbucket will support code development. Lync and Skype are collaboration tools. Products will launch in Amazon EC2 instances. Financials, HR, and talent acquisition will go to WorkDay. Then we have LegalZoom for legalities, for trademarks and patents, GoDaddy for domain registration, iPage for WordPress hosting, and so on. And the absolutely critical 1Password for keeping all these credentials straight across dozens of web properties and online systems (I have 178 separate logins stored in 1Password!), with the 1Password archive encrypted in Dropbox and accessible from phones or laptops as needed.

What an incredible change. In just a few years we have erased or reduced to a trickle the frictional costs of doing a modern software business. Even travel has become easier with TripIt Pro. I just forward any itinerary I get from any airline or online booking service and it gets incorporated into a master itinerary. I check in for flights online and the boarding passes appear on my Apple Watch for scanning. I’m taking off for two weeks of backpacking and trail running in New Zealand as some kind of psychological commitment to the concept of retirement so travel optimization is weighing on me right now.

Cord-cutting for cable and landline (except broadband) is coming soon. Television is bad enough that surfing it should not be an option. Also, one of the interesting consequences of cloud everything (including installed software assets in the Apple App Store, Steam, music, movies, etc.) is that the compute platform can be swapped as needed. I keep disposing of compute platforms and I’m now down to just an iPhone 6 and 2015 Macbook with a curved 34” LG 4K display. The Macbook might get swapped for a next-gen Air within a year, or something else (I’ve tried every gen of iPads and also forced myself to live with a Microsoft Surface Pro 3, but ended up selling each because of non-use). If and when I swap platforms, it just takes a day or so to get everything synced up and working again.

The flexibility of the operations back-end of this new startup world demonstrates an odd fact about Silicon Valley: we are getting close to being able to turn ideas directly into tangible products with little or no capital investment. Our OPEXs become predictable and manageable ($12/month per user, for instance). We have no CAPEX. With Obamacare even the mind-numbing opaqueness of the health insurance market breaks open for independent contractors and contributors.

It’s feeling very warm and comfortable in the clouds.