The speculation that Comcast is going to impose download quotas on their Internet services has got me seeing red and has blood shooting out of my nose. Granted, a 250 GByte monthly cap seems reasonable, and I really, really do understand their concerns with peer-to-peer, but there is more here that should be addressed.

Now, as it happens, I have access to certain information about the network utilization at a large State University. The campus, having recently completed an expensive DWDM fiber ring to a major city, now has access to commodity rates for Internet connectivity and in aggregate has around 3 Gbit/s available. The campus housing system, which consists of dorms and Greek houses, is attached to the Internet on a rate-limited basis. When the limit was 100 Mbit/s, the housing network was using, strangely enough, 100 Mbit/s. When the rate limit was increased to 200 Mbit/s, the utilization rose to 200 Mbit/s in the time between the final keystroke and the refresh of the statistics display. Doubling the limit again produced the same result, from which I conclude that the housing network will consume exactly as much aggregate bandwidth as is made available.

Now I find it hard to believe that any group of students browsing the web, IM-ing their fellow students, or emailing their friends and family could generate precisely this utilization statistic. Clearly this is a result of around 30,000 peer-to-peer programs running on their machines. So I, as I said, understand Comcast’s attempted strategy for maintaining some semblance of control over their bandwidth utilization.

What has me seeing red is the fact that I have no control over my utilization. When browsing the web, in the new ‘World according to Google’, my downlink is jammed up with Flash code, streaming video advertisements, and an enormous amount of crap I really don’t want to see. It is annoying enough to check the weather at Intellicast for the local Doppler radar and have to wait for all the connections to Google Analytics, DoubleClick, and the ad servers to complete before the JavaScript will even let me mouse over the drop-down menu to reach the radar link.

It used to be that a responsible company, when designing a site on the web, would load the important stuff first in the top third of the screen and run the appropriate scripts immediately so that you could click through fast to get where you needed to be. None of this waiting around for Flash to load and paint pretty, extremely annoying, and totally worthless graphics all over the index page (just to show how clever and artistic their designers are). Damn! If I wanted to see that crap I would go to the local art museum, which by the way has a great section on computer art and generated graphics. And as if Flash weren’t bad enough, there are video streams for micro-windows of auto advertising where the designers set the buffering level at 100% before playback. Egad! The same university I spoke of recently redesigned its index page from a relatively useful portal into a completely annoying top-third Flash graphic, all in the name of “branding” the university.

If I want to read something, like a news article, at most I need 300-500 bytes in the body section. Instead, I have to wait through a header that loads all the JavaScript functions, then the Flash loads, the video buffers, the fancy marquees scroll, and finally my 500 bytes arrive. Frantically mousing over where the menus should be and getting squat, watching the bottom left corner where the Google Analytics activity flickers, waiting not-so-patiently for the “Done” — that’s bad enough.

But then to hear that Comcast is going to limit me to some number of bytes down to my browser before they charge me overlimit fees or trim my rate — crimson pulses throbbing down my chin. I vote for an option that provides for in-line culling of crap: go to a web site, and any connection to Google Analytics or a Google ad server gets nulled out before it is counted against your quota. Or better yet: the Better Web and Sanity Seal of Approval. Approved sites, without the crap, get exempted from the quotas.
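To make the first option concrete, here is a minimal sketch in Python of what quota-exempt culling might look like on the metering side. The host names and the mechanism are purely illustrative; nothing here reflects how Comcast (or anyone else) actually meters traffic.

```python
# Minimal sketch (assumptions only): bytes to ad/analytics hosts the
# subscriber never asked for are tracked separately and never hit the
# quota counter.
MONTHLY_QUOTA_BYTES = 250 * 10**9  # the rumored 250 GByte cap

EXEMPT_SUFFIXES = (
    "google-analytics.com",
    "doubleclick.net",
    "googlesyndication.com",
)

def counts_against_quota(host: str) -> bool:
    """True unless the host matches the cull list."""
    host = host.lower().rstrip(".")
    return not any(host == s or host.endswith("." + s) for s in EXEMPT_SUFFIXES)

def charge(usage: dict, host: str, nbytes: int) -> None:
    """Accumulate billed vs. exempt bytes for one subscriber."""
    bucket = "counted" if counts_against_quota(host) else "exempt"
    usage[bucket] = usage.get(bucket, 0) + nbytes

usage = {}
charge(usage, "www.intellicast.com", 48_000)       # the radar page I wanted
charge(usage, "ssl.google-analytics.com", 12_000)  # crap I never asked for
print(usage, "remaining:", MONTHLY_QUOTA_BYTES - usage.get("counted", 0))
```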

Either way, slapping quotas on users, under the assumption that they are running peer-to-peer, without some consideration for what drivel is forced down the link, is not looking at the whole picture. If I am running LimeWire, then I deserve it. If I am researching patents, or trying to get the latest stock quote, or reading the weather map, and get limited, then I am looking for another provider.

Angryman Challenge Problem

On the 8th of August 1900, David Hilbert presented a set of ten problems at a conference at the Sorbonne in Paris. While anyone can throw out ten unsolved problems, Hilbert’s problems influenced much of 20th-century mathematics, spurring entirely new lines of work — the study of Lie groups and Gödel’s incompleteness theorems, to name two. The interesting aspect is that one man (Hilbert) was so in tune with mathematics that his problem set drove much of the mathematical work of the century. [There were 23 problems in all, but only ten were presented at the Sorbonne.]

A century later, in 2000, the Clay Mathematics Institute established the Millennium Prize Problems, which, unlike Hilbert’s problems, come with a $1MM prize for each solution. Interestingly, the Millennium Prize problems still contain the Riemann Hypothesis, one of Hilbert’s original questions.

As a member of the Angry Man technorati, I am continually impressed by the readership of our humble blog, as well as the stream of email from my fellow Angry Men; and while I would not characterize myself as being in the same class as David Hilbert, I have been involved with technology for much of my life. As such I feel inclined to throw out a few problems of my own. So all you slashdottirs and sons, I present the AMB Challenge: initial problem.

Search engines incorporate various algorithms to index and identify web content. Much of this indexing is handled through ‘robots’ and ‘crawlers’ which follow links. Google, Ask Jeeves, and Dogpile are really pretty good once you train yourself to ask the right questions (and ignore the first three sponsored answers). But do you ever feel something is missing?

The first challenge problem is to design a system which can be integrated into search engines to identify and return links to images. A browser plug-in would allow the user to upload for immediate analysis an image or frame of a movie to be used as a search key. The search engine would return content which contained that specific image whether in a static image file (.TIFF, .JPG, .PNG, .GIF, etc.), or incorporated into a movie (.MPG, .MOV, etc.). This would presumably be accomplished by indexing the web for images. As crawlers identify known image formats, an analysis would be performed and a compact representation of the image content would be stored with the link. The submitted image would similarly be reduced to a representation and used as the search key, returning the links, and perhaps thumbnails, associated with the close matches. Users could specify match thresholds in preferences.
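To make the shape of the thing concrete, here is a hedged sketch of the two halves of that pipeline: the crawler storing a compact fingerprint with each link, and the query side returning links within a user-set match threshold. It assumes a fingerprint() helper that boils a decoded image down to a fixed-length bit string (a toy version of such a helper appears after the bullet list further down). A real system would of course need an approximate nearest-neighbor index rather than this brute-force scan.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

class ImageIndex:
    """Toy index: the crawler stores (fingerprint, link) pairs, and the
    query side does a linear scan for close matches."""

    def __init__(self) -> None:
        self._entries: list[tuple[int, str]] = []

    def add(self, fp: int, url: str) -> None:
        # Crawler side: store the compact representation with its link.
        self._entries.append((fp, url))

    def search(self, key: int, max_distance: int = 8) -> list[str]:
        # Query side: return links whose stored fingerprint is within the
        # user-preference threshold of the submitted image's fingerprint.
        return [url for fp, url in self._entries if hamming(fp, key) <= max_distance]
```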

Consider a few applications. One might upload an image of a girlfriend and get back links to all on-line content bearing her visage: MySpace, Facebook, group photos posted by organizations, and so on. Or, as a fellow Angryman quipped, “It would revolutionize the porn industry. You could search for your particular preference: a blond and two frogs.”

More seriously, trademark and service mark protections depend for their validity upon aggressive defense by the owner. McDonald’s Corporation hires law firms to search the Internet for misuse of the “golden arches” — much like the misuse parodied in Eddie Murphy’s “Coming to America,” where McDowell’s restaurant is adorned with a set of suspiciously similar arches.

Consider all of that imagery collected in the streets of London. If the process could be used online to index web content, it could certainly be used to index stored video content. The national security implications are staggering.

Companies already invested in video web content (YouTube?) would have a vested interest in developing this technology. Competing companies (in Redmond, say) would have a powerful incentive to come up with a technology that would prevent certain search companies from attaining a 100% share of the search engine (and ad revenue) market.

I mention these few applications in passing only to note that whoever develops this technology (i.e., solves the challenge problem) will probably not have to worry about rising gasoline prices. While we at 12 Angry Men would not be able to match the Millennium Prize amount, it would not be unreasonable (hint, hint) for Microsoft or Google to cough up, say, $10MM for the winner. The NSA and intellectual-property lawyers are said to have money also. We will throw in a free beer at the Man Lunch, provided the winner is local.

The current state of the art seems to be based on tags assigned to images by the provider or poster. That automated image search is still an immature field is evidenced by Google’s attempt to entice users into participating in tagging; Polar Rose is another attempt at a user-friendly browser image tagger. To be truly useful, the image must be analyzed on the basis of its content, not its associated metadata. As a start on the problem, the following thoughts are offered, with a toy illustration after the list:

  • The compact representation of the image must be generated in small polynomial time
  • “Compact” means very small compared to the size of the original image
  • The representational form may be equivalent to solving high-dimensional eigenvalue problems
  • Formats need to be decoded to a common form for analysis
  • The search comparison will likely be multidimensional
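As one crude illustration of those constraints (emphatically a toy, not a solution), the classic average-hash trick reduces any decoded image to 64 bits in time linear in the pixel count. That satisfies the “small polynomial time” and “compact” bullets, though it is far too fragile to survive cropping or heavy editing:

```python
# Toy "compact representation": decode to a common form, then hash
# against the mean brightness. Uses the Pillow imaging library.
from PIL import Image

def fingerprint(path: str, size: int = 8) -> int:
    """Decode to a size x size grayscale image, then set one bit per
    pixel according to whether it is above the mean brightness."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits  # a 64-bit integer for the default 8x8 case
```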

I looked at this from the point of view that many images are self-similar and subject to a fractal compression technique along the lines of Michael Barnsley’s Iterated Function Systems (IFS). In one iteration experiment, we were able to create a remarkably accurate coastline of Australia using five line segments and a set of iteration coefficients — extremely compact compared to the point set describing the continent. The rendering could be done in linear time, but the cost of deriving the coefficients, which is the heart of the problem under consideration here, was high. Wavelet approximation also has some potential. Added to this is the observation that it might be more efficient to classify images first before trying to generate a compact representation. Kris Woodbeck is reported to have a process similar to the way humans process images, but a quick read suggests it is more along the lines of a classifier than a reduction of the image to a searchable representation.
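To show the asymmetry, here is the cheap half of the fractal approach: given a handful of IFS coefficients, the “chaos game” renders the attractor in linear time. The coefficients below are Barnsley’s published fern, standing in for the Australia experiment; the expensive inverse problem, deriving coefficients from an arbitrary image, is exactly what this sketch does not attempt.

```python
import random

# Each map: (a, b, c, d, e, f, probability), applied as
# (x, y) -> (a*x + b*y + e, c*x + d*y + f).
FERN = [
    ( 0.00,  0.00,  0.00, 0.16, 0.0, 0.00, 0.01),
    ( 0.85,  0.04, -0.04, 0.85, 0.0, 1.60, 0.85),
    ( 0.20, -0.26,  0.23, 0.22, 0.0, 1.60, 0.07),
    (-0.15,  0.28,  0.26, 0.24, 0.0, 0.44, 0.07),
]

def chaos_game(maps, n=100_000):
    """Render n points of the IFS attractor: O(n), the easy direction."""
    x, y = 0.0, 0.0
    weights = [m[6] for m in maps]
    points = []
    for _ in range(n):
        a, b, c, d, e, f, _p = random.choices(maps, weights=weights)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        points.append((x, y))
    return points

points = chaos_game(FERN)
print(len(points), "points rendered from just 28 coefficients")
```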

Whatever process is chosen must run both on the server side, when the browser submits the search-key image, and in the crawler’s image-indexing pipeline. So far the human eye-brain system is the only process that comes close to doing this.

Good luck.

Once upon a time, news trickled out into newspapers or magazines. Then radio brought news bulletins out on a twice or three-times daily schedule. Television merged the fast pace of radio with the graphic content of photographs but didn’t really accelerate things further. Over many years we doubled or tripled our daily dose, but that was about it.

Until cable. With the advent of CNN and Headline News, and all their successors, we now had news on an hourly basis. Naturally the Internet would only take that further, with news now literally “on demand.”

So it was only a matter of time until some clever news agency merged various technologies to give us this: a fully embedded, Google Maps-based, interactive display of currently known hash houses in Florida:

http://www.miaminewtimes.com/php/specialreports/index.php?report_id=791046

Can a full merge of all this with Google Earth be far behind? Will we soon have “breaking news” layers for Google Earth allowing us to zoom in as events unfold? Will Google eventually stream live satellite coverage to allow us to watch police chases and shootouts in real time?

Is there even any downside? (Well, apart from the unfortunate inevitability that some poor sap will have his house displayed for national scorn due to a mistyped address…)

Pretty soon, will this scenario be not clever fantasy but simply the way things are?

If so, is that good or bad?

Discuss!

John C. Dvorak, a long-time mainstay of technology magazines, has proven that he is so absolutely out of touch with modern technology and its uses that his future opinions are all now cast into doubt. His recent statements are so off base that I seriously wonder if he has suddenly suffered permanent mental damage. In a recent column for PC Magazine he declared that the as-yet-unreleased Google Phone is already doomed.

Now don’t get me wrong, I’m not saying the Google Phone is going to be a huge success either. It could really go either way. Plenty of good ideas and big promises have been canceled or have utterly flopped. The Google Phone may go that way, but given Google’s track record, I’m inclined to bet that they have something neat up their sleeves. Even some of their less well-known products are amazingly useful, albeit less popular.

What really moves Dvorak’s statements from merely short-sighted to downright moronic, however, is why he thinks the Google Phone is doomed:

So what is Google trying to do with a phone? First of all, it wants to put Google search on a phone. It wants to do this because it is obvious to the folks at Google that people need to do Web searches from their phone, so they can, uh, get directions to the restaurant? Of course, they can simply use the phone itself to call the restaurant and ask!

Right… because people only do web searches to get directions to a restaurant, and of course everyone always has the phone number on hand for every place they might want to visit? Obviously Dvorak has been asleep for the past few years and has thus missed Google SMS, which lets you run Google searches from any SMS-enabled phone simply by texting the query to GOOGL. Personally, I can vouch for the fact that my friends and family all use Google SMS quite a bit. Whether it is looking up a sports score when you can’t get to a TV or computer, finding the nearest Asian restaurant in a given zip code, or using it as text-based 411, it works beautifully and is amazingly useful. It has been so useful and so popular, in fact, that Google debuted a voice-recognition version called GOOG-411, which works brilliantly and is completely free (compared to the $1.99 most cell companies charge for their 411 services).

In fact, Google’s new 411 service highlights an important point: search engines for phones actually pre-date search engines on the web. So yes, Dvorak, people DO in fact want search capabilities on their phones, and they wanted them long before the World Wide Web was even a pipe dream. But you don’t have to speculate about the demand for mobile search; studies have shown that demand for these features is skyrocketing (with mobile access to maps and directions topping the list).

Finally, Dvorak needs to realize that mobile phones are changing radically. Between roll-up displays, compact virtual keyboards, and unbelievably small projectors, it may not be long before the phone in your pocket is every bit as powerful and usable as a computer. Given all of this, I think Google is making a pretty safe bet by designing their phone around search capabilities. I’m really not sure where Dvorak is getting his ideas, but it is clear from his column that he has grown dangerously out of touch with modern technology.

-Angry Midwesterner


Fellow blogger Pascal has an interesting story running today about how WordPress is using Google AdSense to display advertisements without annoying its users. From the article:

If you’re a regular reader (let alone poster) on WordPress.com, cookies will prevent you from seeing ads. Regular readers don’t click ads anyway, they’re there for the content. Ads would be off-putting and keep readers from becoming contributors.

But it turns out it is far more clever than just keeping ads out of view of regular users. WordPress also pays attention to how you found out about a specific blog:

Chances are you never visited Kris Hoet’s blog* – Kris is EMEA Marcom man for Msn/Windows Live. Although he has it mapped on his own domain crossthebreeze.com, the blog is hosted by wordpress.com. Yet if I refer you to his holiday report, you won’t see any ads either, even as a first time visitor, even if you delete your crossthebreeze.wordpress.com or crossthebreeze.com cookies (this cookie-killing Firefox extension will save you time).

However, if you land there by accident after a Google search, things are different. You’re quite likely not to be interested by his blog, but more by bars in Kota Kinabalu… The served ads (fitting your search terms even more than the content of the post) offer a convenient click away.

It’s quite an interesting model, and an incredibly smart way to show advertisements, make a little money, and yet still be a good net citizen. Read more at Pascal’s Blog, and from the WordPress Staff on this interesting and innovative idea.
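As a back-of-the-envelope guess at the logic Pascal describes (and nothing more: the cookie names and search-engine list below are made up for illustration, not WordPress’s actual code), the decision seems to boil down to something like this:

```python
from urllib.parse import urlparse

SEARCH_ENGINE_HOSTS = {"www.google.com", "search.yahoo.com", "search.msn.com"}

def should_show_ads(cookies: dict, referrer: str, blog_domains: set) -> bool:
    # Regular readers (identified by cookie) never see ads.
    if cookies.get("wordpress_logged_in") or cookies.get("regular_reader"):
        return False
    host = urlparse(referrer).netloc.lower()
    # Direct visits and internal referrals are treated as readers, not traffic.
    if host == "" or host in blog_domains:
        return False
    # Drive-by search traffic gets ads matched to its search terms.
    return host in SEARCH_ENGINE_HOSTS

# First-time visitor arriving from a Google search: ads are shown.
print(should_show_ads(
    {},
    "http://www.google.com/search?q=bars+kota+kinabalu",
    {"crossthebreeze.com", "crossthebreeze.wordpress.com"},
))
```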

-Angry Midwesterner