Angryman Challenge Problem

On the 8th of August 1900, David Hilbert presented a set of ten problems at a conference at the Sorbonne in Paris. While anyone can throw out ten unsolved problems, Hilbert’s problems influenced much of 20th-century mathematics, spurring entire new fields of work: mathematical logic and Gödel’s incompleteness theorems, to name two. The interesting aspect is that one man (Hilbert) was so in tune with mathematics that his problem set drove much of the mathematical work of the century. [There were 23 problems in all, but only ten were presented at the Sorbonne.]

A century later, in 2000, the Clay Mathematics Institute initiated the Millennium Prize, which, unlike Hilbert’s problems, came with a $1MM award for each solution. Interestingly, the Millennium Prize problems still include the Riemann Hypothesis, one of Hilbert’s original questions.

As a member of the Angry Man technorati, I am continually impressed by the readership of our humble blog, as well as by the stream of email from my fellow Angry Men. While I would not characterize myself as being in the same class as David Hilbert, I have been involved with technology for much of my life, and as such I feel inclined to throw out a few problems of my own. So, all you slashdottirs and sons, I present the AMB Challenge: the initial problem.

Search engines incorporate various algorithms to index and identify web content. Much of this indexing is handled through ‘robots’ and ‘crawlers’ which follow links. Google, Ask Jeeves, and Dogpile are really pretty good once you train yourself to ask the right questions (and ignore the first three sponsored answers). But do you ever feel something is missing?

The first challenge problem is to design a system which can be integrated into search engines to identify and return links to images. A browser plug-in would allow the user to upload an image, or a frame of a movie, for immediate analysis as a search key. The search engine would return content containing that specific image, whether in a static image file (.TIFF, .JPG, .PNG, .GIF, etc.) or incorporated into a movie (.MPG, .MOV, etc.). This would presumably be accomplished by indexing the web for images: as crawlers identify known image formats, an analysis would be performed and a compact representation of the image content would be stored with the link. The submitted image would similarly be reduced to a representation and used as the search key, returning the links, and perhaps thumbnails, associated with the close matches. Users could specify match thresholds in preferences.
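To make the "compact representation plus threshold match" idea concrete, here is a deliberately crude baseline, a sketch and by no means a solution to the challenge: a perceptual "average hash." Shrink the image to an 8×8 grid, threshold each cell against the mean brightness, and pack the 64 bits into one integer; near-duplicate images then differ by a small Hamming distance. The pixel-row representation and the synthetic test images below are illustrative assumptions; a real system would first decode TIFF/JPEG/PNG/GIF into such a form.

```python
# Sketch of a perceptual "average hash": a 64-bit fingerprint per image.
# Assumes images are already decoded to grayscale rows (lists of ints 0-255).

def downscale(pixels, size=8):
    """Naive box downscale to a size x size grid of mean intensities."""
    h, w = len(pixels), len(pixels[0])
    grid = []
    for gy in range(size):
        row = []
        for gx in range(size):
            y0, y1 = gy * h // size, (gy + 1) * h // size
            x0, x1 = gx * w // size, (gx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid

def average_hash(pixels, size=8):
    """Pack 'is this cell brighter than the mean?' bits into one integer."""
    cells = [v for row in downscale(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits -- the search-time similarity measure."""
    return bin(a ^ b).count("1")

# Two synthetic 64x64 images: a bright square on black, and the same
# image with mild deterministic "noise" -- their hashes should be close.
img = [[255 if 16 <= x < 48 and 16 <= y < 48 else 0 for x in range(64)]
       for y in range(64)]
noisy = [[min(255, p + ((x * 7 + y * 13) % 9)) for x, p in enumerate(row)]
         for y, row in enumerate(img)]

h1, h2 = average_hash(img), average_hash(noisy)
print(hamming(h1, h2))  # a small Hamming distance despite the noise
```

A user-tunable "match threshold" would then simply be a cap on the allowed Hamming distance between the stored fingerprint and the search key's fingerprint.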

Consider a few applications. One might want to load an image of one’s girlfriend and get back links to any on-line content bearing her visage: MySpace, FaceBook, group photos posted by organizations, etc. Or, as a fellow Angryman quipped, “It would revolutionize the porn industry. You could search for your particular preference: a blond and two frogs.”

More seriously, trademark and service mark protections depend for their validity on aggressive defense by the owner. McDonald’s Corporation hires law firms to search the Internet to identify misuse of the “golden arches” — much like the misuse parodied in Eddie Murphy’s “Coming to America,” where McDowell’s restaurant is adorned with a set of somewhat similar arches.

Consider all of that imagery collected in the streets of London. If the process could be used on line to index web content, it could certainly be used to index stored video content. The national security implications are staggering.

Companies already invested in video web content (YouTube?) would have a vested interest in developing this technology. Competing companies (in Redmond, say) would have a powerful incentive to come up with a technology that would prevent certain search companies from attaining a 100% share of the search engine (and ad revenue) market.

I mention these few applications in passing only to note that whoever develops this technology (solves the challenge problem) will probably not have to worry about rising gasoline prices. While we at 12 Angry Men cannot match the Millennium Prize amount, it would not be unreasonable (hint, hint) for Microsoft or Google to cough up, say, $10MM for the winner. The NSA and intellectual property lawyers are said to have money as well. We will throw in a free beer at the Man Lunch, provided the winner is local.

The current state of the art seems to be based on tags assigned to images by the provider or poster. That automated image search is still an immature field is evidenced by Google’s attempts to entice users into participating in tagging. Polar Rose is another attempt at a user-friendly browser image tagger. To be truly useful, an image must be analyzed on the basis of its content, not its associated metadata. As a start on the problem, the following thoughts are offered:

  • The compact representation of an image must be generated in small polynomial time
  • “Compact” means very small compared to the size of the original image
  • The representational form may be equivalent to solving eigenvalue problems of high dimension
  • File formats need to be expanded to a common form for analysis
  • The search comparison will likely be multidimensional
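On the eigenvalue bullet, a toy illustration (my own assumed example, not part of the challenge statement): treat the image as a matrix A and use its dominant eigenvalues of AᵀA (the squared singular values of A) as coordinates of a compact signature. The sketch below gets the dominant one by power iteration, which is cheap even when the matrix is large.

```python
# Toy illustration of the "eigenvalue problems of high dimension" bullet:
# power iteration recovers the dominant eigenvalue of A^T A, which could
# serve as one component of a compact image signature.

def matvec(m, v):
    """Multiply square matrix m by vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def gram(a):
    """A^T A for a rectangular matrix a (symmetric, positive semidefinite)."""
    at = transpose(a)
    return [[sum(at[i][k] * a[k][j] for k in range(len(a)))
             for j in range(len(at))] for i in range(len(at))]

def dominant_eigenvalue(m, iters=100):
    """Power iteration with infinity-norm scaling; m is assumed symmetric."""
    v = [1.0] * len(m)
    lam = 0.0
    for _ in range(iters):
        w = matvec(m, v)
        lam = max(abs(x) for x in w)   # eigenvalue estimate at convergence
        v = [x / lam for x in w]
    return lam

# A tiny rank-1 "image": A^T A = [[5, 10], [10, 20]], eigenvalues 25 and 0.
a = [[1.0, 2.0],
     [2.0, 4.0]]
print(dominant_eigenvalue(gram(a)))  # -> 25.0
```

The point of the toy is scale: power iteration costs a handful of matrix-vector products, so extracting a few dominant eigenvalues stays within the "small polynomial time" budget even for high-dimensional image matrices.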

I looked at this from the point of view that many images are self-similar and therefore subject to a fractal compression technique similar to Michael Barnsley’s Iterated Function Systems (IFS). In one iteration experiment, we were able to create a remarkably accurate coastline of Australia using five line segments and a set of iteration coefficients, a representation extremely compact compared to the point set describing the continent. Unfortunately, while the rendering could be done in linear time, the cost of deriving the coefficients — the heart of the problem under consideration here — was high. Wavelet approximation also has some potential. Added to this is the observation that it might be more efficient to classify images first, before trying to generate a compact representation. Kris Woodbeck is reported to have a process similar to the way humans process images, but a quick read suggests it is more in the line of a classifier than a reduction of the image to a searchable representation.
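To make the IFS idea concrete, here is a stand-in example (not the Australia-coastline experiment, whose coefficients I don’t have): the chaos game renders the Sierpinski triangle from just three affine maps. A few numeric coefficients regenerate an arbitrarily detailed figure, and rendering is linear in the number of points; it is the inverse problem, deriving maps that reproduce a *given* image, that is expensive.

```python
import random

# Chaos-game rendering of an IFS attractor. Each map halves the current
# point toward one vertex of a triangle; the attractor of the three maps
# together is the Sierpinski triangle. The entire figure is encoded by
# these nine coefficients -- the "extremely compact" representation.

MAPS = [
    (0.5, 0.0, 0.0),    # (scale, x-offset, y-offset): contract toward (0, 0)
    (0.5, 0.5, 0.0),    # contract toward (1, 0)
    (0.5, 0.25, 0.5),   # contract toward (0.5, 1)
]

def render(n_points=20000, seed=0):
    """Iterate a randomly chosen map; after a burn-in, points lie on the attractor."""
    rng = random.Random(seed)
    x, y = 0.3, 0.3
    pts = []
    for i in range(n_points + 100):
        s, dx, dy = rng.choice(MAPS)
        x, y = s * x + dx, s * y + dy
        if i >= 100:                    # discard burn-in iterations
            pts.append((x, y))
    return pts

pts = render()
print(len(pts))  # thousands of points regenerated from nine coefficients
```

Rendering here is O(n) in the point count, matching the linear-time rendering noted above; nothing in this sketch addresses the costly coefficient-derivation step, which is exactly the open part of the challenge.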

Any such process must run both on the server side, when the browser submits a search-key image, and in the crawler’s image-indexing pipeline. So far the human eye-brain system is the only process that comes close to doing this.

Good luck.