Fish eating fish eating fish

October 28, 2012 Mark DavisLeave a comment

Decompressing in NorCal following a vibrant Hadoop World. More press mentions:

· Big Data, Big News: 10 Things To See At Hadoop World, CRN, October 23, 2012 – (Circulation 53,397)

· Quest Software Announces Hadoop-Centric Software Analytics, CloudNewsDaily, October 23, 2012-coverage of Hadoop product announcements.

· Quest Launches New Analytics Software for Hadoop, SiliconANGLE, October 23, 2012- coverage of Hadoop Product.

· Continuing its M&A software push, Dell moves into ‘big data’ analytics via Kitenga buy, 451 Research

· Cisco Updates Schedule to Automate Hadoop Big Data Analysis Systems, Eweek, October 24, 2012- mention of Kitenga product announcement at Hadoop. (Circulation 196,157)

· Quest Launches New Analytics Software for Hadoop, DABBC, October 24, 2012

And what about fish? Dell == Big Fish, Quest == Medium Fish, Kitenga == Happy Minnow.… Read the rest

Dell Acquires Kitenga

October 24, 2012October 24, 2012 Mark DavisLeave a comment

Dell Inc. : Quest Software Expands Its Big Data Solution with New Hadoop-Centric Software Capabilities for Business Analytics

10/23/2012| 08:05am US/Eastern

Complete solution includes application development, data replication, and data analysis

Hadoop World 2012-Quest Software, Inc. (now part of Dell) announced three significant product releases today aimed at helping customers more quickly adopt Hadoop and exploit their Big Data. When used together, the three products offer a complete solution that addresses the greatest challenge with Hadoop: the shortage of technical and analytical skills needed to gain meaningful business insight from massive volumes of captured data. Quest builds on its long history in data and database management to open the world of Big Data to more than just the data scientist.

News Facts:

Kitenga Analytics: Based on the recent acquisition of Kitenga, Quest Software now enables customers to analyze structured, semi-structured and unstructured data stored in Hadoop. Available immediately, Kitenga Analytics delivers sophisticated capabilities, including text search, machine learning, and advanced visualizations, all from an easy-to-use interface that does not require understanding of complex programming or the Hadoop stack itself. With Kitenga Analytics and the Quest Toad^®Business Intelligence Suite, an organization has a complete self-service analysis environment that empowers business and systems analysts across a variety of backgrounds and job roles.

http://www.4-traders.com/DELL-INC-4867/news/Dell-Inc-Quest-Software-Expands-Its-Big-Data-Solution-with-New-Hadoop-Centric-Software-Capabiliti-15415359/… Read the rest

Hadoop World NYC

October 13, 2012October 13, 2012 Mark DavisLeave a comment

Catch me at Hadoop World 2012 in NYC, October 23-25th. I’ll be wearing a new shirt and logo.… Read the rest

An Exit to a New Beginning

September 26, 2012 Mark DavisLeave a comment

I am thrilled to note that my business partner and I sold our Big Data analytics startup to a large corporation yesterday. I am currently unemployed but start anew doing the same work on Monday.

Thrilled is almost too tame a word. Ecstatic does better describing the mood around here and the excitement we have over having triumphed in Sili Valley. There are many war stories that we’ve been swapping over the last 24 hours, including how we nearly shut down/rebooted at the start of 2012. But now it is over and we have just a bit of cleanup work left to dissolve the existing business structures and a short vacation to attend to.… Read the rest

Semantic Zooming Demo

June 16, 2012 Mark DavisLeave a comment

Semantic Zooming over Hadoop Distributed File Systems (with lenses) from Hadoop Summit 2012:

… Read the rest

Semantic Zooming

June 11, 2012 Mark DavisLeave a comment

I’ve been pushing hard for a demo at Hadoop Summit this week, waking unexpectedly at 5 AM this morning with spherical trigonometry percolating through my head. The topic is “semantic zooming” and it is not a complicated concept to understand because we have a common example that many of us use daily: Google Maps. All the modern, online mapping systems do semantic zooming to a degree when they change the types of information that are displayed on the map depending on the zoom level. Thus, the “semantics” or “meaning” of the displayed information changes with zooming, revealing states, then rivers, then major roads, then minor roads, and then all the way down to local businesses. The goal of semantic zooming is to manage information overload by managing semantics.

In my case, I’m using a semantic zooming interface to apply different types of information visualizations to data resources in a distributed file system (a file system that spans many disk drives in many computers) related to the “big data” technology, Hadoop. A distributed file system can have many data types (numerical data, text, PDFs, log files from web servers, scientific data) and the only way to interact with the data is through a command-line or through fairly simple web-based user interfaces that act like crude file system browsers. Making use of the data in the system, analyzing it, requires running analysis processes on it, then pulling the data out and importing it into other technologies like Excel or business intelligence systems to bind charting and visualization tools to it. With semantic zooming operating directly on the data, however, the structure of the data can be probed directly and the required background processes launch automatically to create new aggregate views of the data.… Read the rest

Randomness and Meaning

June 4, 2012June 4, 2012 Mark Davis1 Comment

The impossibility of the Chinese Room has implications across the board for understanding what meaning means. Mark Walker’s paper “On the Intertranslatability of all Natural Languages” describes how the translation of words and phrases may be achieved:

Through a simple correspondence scheme (word for word)
Through “syntactic” expansion of the languages to accommodate concepts that have no obvious equivalence (“optometrist” => “doctor for eye problems”, etc.)
Through incorporation of foreign words and phrases as “loan words”
Through “semantic” expansion where the foreign word is defined through its coherence within a larger knowledge network.

An example for (4) is the word “lepton” where many languages do not have a corresponding concept and, in fact, the concept is dependent on a bulwark of advanced concepts from particle physics. There may be no way to create a superposition of the meanings of other words using (2) to adequately handle “lepton.”

These problems present again for trying to understand how children acquire meaning in learning a language. As Walker points out, language learning for a second language must involve the same kinds of steps as learning translations, so any simple correspondence theory has to be supplemented.

So how do we make adequate judgments about meanings and so rapidly learn words, often initially with a course granularity but later with increasingly sharp levels of focus? What procedure is required for expanding correspondence theories to operate in larger networks? Methods like Latent Semantic Analysis and Random Indexing show how this can be achieved in ways that are illuminating about human cognition. In each case, the methods provide insights into how relatively simple transformations of terms and their occurrence contexts can be viewed as providing a form of “triangulation” about the meaning of words.… Read the rest

The Unreasonable Success of Reason

May 21, 2012May 21, 2012 Mark DavisLeave a comment

Math and natural philosophy were discovered several times in human history: Classical Greece, Medieval Islam, Renaissance Europe. Arguably, the latter two were strongly influenced by the former, but even so they built additional explanatory frameworks. Moreover, the explosion that arose from Europe became the Enlightenment and the modern edifice of science and technology

So, on the eve of an eclipse that sufficiently darkened the skies of Northern California, it is worth noting the unreasonable success of reason. The gods are not angry. The spirits are not threatening us over a failure to properly propitiate their symbolic requirements. Instead, the mathematics worked predictively and perfectly to explain a wholly natural phenomenon.

But why should the mathematics work so exceptionally well? It could be otherwise, as Eugene Wigner’s marvelous 1960 paper, The Unreasonable Effectiveness of Mathematics in the Natural Sciences, points out:

All the laws of nature are conditional statements which permit a prediction of some future events on the basis of the knowledge of the present, except that some aspects of the present state of the world, in practice the overwhelming majority of the determinants of the present state of the world, are irrelevant from the point of view of the prediction.

…

A possible explanation of the physicist’s use of mathematics to formulate his laws of nature is that he is a somewhat irresponsible person. As a result, when he finds a connection between two quantities which resembles a connection well-known from mathematics, he will jump at the conclusion that the connection is that discussed in mathematics simply because he does not know of any other similar connection.

Galileo’s rocks fall at the same rates but only provided that they are not unduly flat and light.… Read the rest

Upcoming Conference Presentations

April 17, 2012 Mark DavisLeave a comment

See me talk at:

Lucene Revolution 2012, Boston, May 9th. Topic: Big Data Search

Hadoop Summit 2012, San Jose, June 13-14. Topic: Semantic Zooming Interfaces for Big Data

And don’t be late…… Read the rest

Experimental Psychohistory

February 1, 2012February 12, 2012 Mark DavisLeave a comment

Kalev Leetaru at UIUC highlights the use of sentiment analysis to retrospectively predict the Arab Spring using Big Data in this paper. Dr. Leetaru took English transcriptions of Egyptian press sources and looked at aggregate measures of positive and negative sentiment terminology. Sentiment terminology is fairly simple in this case, consisting of positive and negative adjectives primarily, but could be more discriminating by checking for negative modifiers (“not happy,” “less than happy,” etc.). Leetaru points out some of the other follies that can arise from semi-intelligent broad measures like this one applied too liberally:

It is important to note that computer–based tone scores capture only the overall language used in a news article, which is a combination of both factual events and their framing by the reporter. A classic example of this is a college football game: the hometown papers of both teams will report the same facts about the game, but the winning team’s paper will likely cast the game as a positive outcome, while the losing team’s paper will have a more negative take on the game, yielding insight into their respective views towards it.

This is an old issue in computational linguistics. In the “pragmatics” of automatic machine translation, for example, the classic example is how do you translate fighters in a rebellion. They could be anything from “terrorists” to “freedom fighters,” depending on the perspective of the translator and the original writer.

In Leetaru’s work, the end result was an unusually high churn of negative-going sentiment as the events of the Egyptian revolution unfolded.

But is it repeatable or generalizable? I’m skeptical. The rise of social media, enhanced government suppression of the media, spamming, disinformation, rapid technological change, distributed availability of technology, and the evolving government understanding of social dynamics can all significantly smear-out the priors associated with the positive signal relative to the indeterminacy of the messaging.… Read the rest