Monday, January 31, 2005

A nice list of Open Source databases

Decisions, Decisions... lists a number of them with links, including VistaDB, Cloudscape, and the usual suspects.

Sunday, January 30, 2005

Optimize Magazine > BAM Keeps A Finger On The Pulse

Optimize Magazine > Technology Innovation > BAM Keeps A Finger On The Pulse > January 2005 is a nice article about what BAM (business activity monitoring) is and how its principles will work their way into everything in a large company.

It also makes the point that although it is specifically for real-time operations, context is the key to the true meaning of real-time events. Expect lots of Data Warehouse input to provide that context.

Analysis Services 2000 vs. 2005

From Mosha Pasumansky: Analysis Services 2000 vs. 2005: "I have run across a presentation called 'Analysis Services 2000 vs. 2005' prepared by Jaimie Basilico and Mark Frawley (Jamie works in Microsoft as Senior Technology Specialist in the East Cost, and he is one of the best field people in Analysis Services that we have). This presentation is targeted towards people who are familiar with Analysis Services 2000 and want to come on speed with Analysis Services 2005. I have found the presentation very useful, but not all subjects are covered in the same depth. Below are my comments"

The Bloggers Guide to BizTalk: Releases: Home

Kent Tegels points out The Bloggers Guide to BizTalk. It comes in the form of a downloadable Windows Help file, and all I can say is wow! Very nicely done, and very organized. A very good introduction if you aren't familiar with this product and want a quick crash course, with the ability to get into the areas of detail you want.

SQL Server 2005 - Interface Overview

SQL Server 2005 - Interface Overview from Database Journal. Don't have access to the SQL Server 2005 beta, but want to see what all of the new screens look like? Steven Warren posts a bunch, everything from the new start menu to all the admin screens that come with it.

Firebird and other Open-Source Databases getting attention

Enterprises Warming Up to Firebird Open-Source Database is an article on EWeek that summarizes a recent survey by Evans Data Corp. Among the findings:

  • Of those surveyed, 23 percent of developers picked Firebird for use in "edge" databases—in other words, those that are embedded in systems or in devices, such as a point-of-sale system in a retail outlet or a network device. Runners-up included Microsoft Corp.'s Access, at 21 percent, and Microsoft's SQL Server, at 13 percent.
  • McKendrick said that the survey overall showed open-source databases as having made deep inroads into the enterprise since Evans first started surveying their use, about three years ago. At that time, about 12 percent of respondents were tinkering with open-source databases—a percentage that's up to 60 percent now.
  • Desire to leverage XML access to databases is very high.

Firebird is based on the source code from a Borland product, InterBase, and has some mature features (including triggers, stored procedures, ACID compliance, etc.). Here's a history. I don't know much else about Firebird...

Friday, January 28, 2005

Getting server diagnostics via a web service

My last post on SOA topics today...

In any sort of grid, monitoring the health and load of the nodes is a necessity for load balancing and assuring redundancy. Doing so can be a challenge though, as each OS has different ways of reporting info, whether it's Linux, Windows, FreeBSD, Solaris, etc. This is the major reason grids today not only have homogeneous OSs on all nodes, but also require special modifications to those OSs at the kernel level: the capabilities for getting such information remotely are not at all standard, or supported. The data has to be gathered and stored for constant analysis by the grid, since it drives all the important grid activities, including adding nodes and determining when one has dropped.

In my goal to assemble a database grid using open, default installations of common tools as much as possible, I did a little research to see what I could find, and I quickly ran into some problems. I know I want all communication to be via a web service, which is easy enough, but getting the diagnostics in the first place was not easy. There are no built-in utilities in the LAMP or WAMP stack to grab point-in-time statistics, and I'm sure this is because the methods are very different, even between different versions of Windows, or different distributions of Linux or Unix.

Perl has some sketchy libraries, but they are by no means complete, and require a lot of tweaking, which is not what I wanted. Then I stumbled across http://phpsysinfo.sourceforge.net, which looks like my ideal solution. It works with all BSDs (even OS X), all POSIX-compliant OSs (Linux included) and Windows. The output is pretty (this is live off a SourceForge server) and it has XML output built in. Best of all, it requires no configuration - drop the script in and go. No database hooks.

Very nice, and it could be used to monitor a whole group of LAMP servers, all via web services, and it's stable. I'm using this in my project.
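
To give a feel for what consuming that XML output could look like, here is a minimal Python sketch. The XML layout in SAMPLE_XML is made up for illustration and is not phpSysInfo's actual schema, and the function names and thresholds are my own assumptions. The point is only that once each node reports XML, turning it into a health check takes a few lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical diagnostics document; the real phpSysInfo XML layout may differ.
SAMPLE_XML = """<node>
  <hostname>db01</hostname>
  <load>0.42</load>
  <memory total="2048" free="512"/>
</node>"""

def parse_diagnostics(xml_text):
    """Turn one node's XML report into a plain dict of health figures."""
    root = ET.fromstring(xml_text)
    mem = root.find("memory")
    total = int(mem.get("total"))
    free = int(mem.get("free"))
    return {
        "hostname": root.findtext("hostname"),
        "load": float(root.findtext("load")),
        "memory_used_pct": round(100.0 * (total - free) / total, 1),
    }

def is_overloaded(stats, max_load=1.0, max_memory_pct=90.0):
    """Flag a node whose load or memory use crosses the (assumed) thresholds."""
    return stats["load"] > max_load or stats["memory_used_pct"] > max_memory_pct

stats = parse_diagnostics(SAMPLE_XML)
print(stats["hostname"], stats["load"], stats["memory_used_pct"], is_overloaded(stats))
```

In a real grid the XML would be fetched over HTTP from each node on a schedule, and the element names would have to match whatever the script actually emits.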

It Official: no XQuery for Whidbey, and that's still fine.

It Official: no XQuery for Whidbey, and that's still fine. from Kent Tegels breaks the news. Whidbey is the codename for the beta of the next version of Visual Studio .NET. XQuery was way too complicated, making simple SQL statements a mess. The idea of using XML for communication is, of course, not a bad idea, but the best way, imo, is to just adopt the ODBC model. All you need is four elements - hostname, login, password, and SQL statement. That's it. Return back a result set and an error code. That's it.

We already have SQL, vendors already agree on it as a standard, and it works for 99% of everything out there, right? No need to reinvent the wheel.

Slashdot | LAMP Grid Application Server, No More J2EE

Slashdot | LAMP Grid Application Server, No More J2EE points to ActiveGrid, a LAMP grid implementation that sounds very cool indeed.

The comments in the Slashdot article (I filter to comments rated "5") are also very insightful, as they usually are, as lots of people share their opinions about J2EE, LAMP, etc.

The inventor of ActiveGrid has a blog of his own with some very insightful articles. He used to work for Sun pushing J2EE for a while, so he has some good points. I learned a lot reading his entries, and I really like the way he put together the future vision of the "application" server, as a distributor and consumer of SOAP and XML, which, soon enough, is destined to become the lingua franca of application APIs and data access. Databases are not excluded from that vision. It makes an enormous amount of sense to me.

Database Benchmarking - How it's done

Wow, what an informative article. Database Benchmarking on Database Journal is written by Steve Callas, an Oracle DBA, and covers all of the major benchmarking tests performed by the Transaction Processing Performance Council, or TPC, and explains why you have to take all of the results with a grain of salt. Great background for the alpha geek that explains the nuts and bolts behind the popular benchmarks.

The Florida Education Data Warehouse GIF

The Florida Education Data Warehouse GIF from the B-EYE Network. It's a nice write up of a failed implementation of a data warehouse, which was saved by an implementation of Bill Inmon's Government Information Factory, with a nice color coded picture.

The GIF is a data warehouse with some special considerations for government uses (like more of a focus on the ability to merge new data, as in from other agencies). More info can be found in the results of this Google search.

Thursday, January 27, 2005

Win an iPod in the Cloudscape Challenge

Just got this in my Sourceforge update...

Only a few days left to test your Java skills.. We are giving out ten
40GB iPods and 50 (count 'em) SourceForge.net T-shirts. Entries have
to be in by the end of January 2005. Cloudscape, by IBM, is a
powerful Java-based SQL database that sports a small footprint (a few
megabytes). Since it's Java based, it's cross-platform. IBM recently
open sourced the database under the name Derby. The contest is easy --
simply download the database and our special data files and answer the
question we email you (it's a simple SQL exercise). If you are able to
get the correct answer from the datafile you'll be placed in a drawing.
What are the odds? Likely pretty good. It depends on how many people
enter the contest and have the right answer. Only a few days left.
Enter now.

Note: You have to be a U.S. or Canadian (except Quebec) resident to
enter (sorry, blame our legal folks). To enter go here:
http://sourceforge.net/cloudscape_contest.php

Cloudscape is an open-source database written entirely in Java and designed for mobile use. More info on it here.

InfoWorld: New worm targets MYSQL installations

InfoWorld: New worm targets MYSQL installations. It was bound to happen sooner or later:

The new version of Forbot infects machines by taking advantage of administrator accounts with weak or nonexistent passwords. The worm cracks the accounts by trying values from a predefined list of around 1,000 possible passwords, Ullrich said.

........

To be infected, MySQL has to be configured to allow the root account to log in remotely to the system. By default, the root account is only allowed to log on at the machine running MySQL, rather than remotely. The root account also has to use a password that is on Forbot's list of passwords, Ullrich said.

So there you go. It only scans on port 3306, and only affects the Windows version of MySQL too.

Partitioned Tables and Indexes in SQL Server 2005

Christa Carpentiere points to Partitioned Tables and Indexes in SQL Server 2005, a needed and welcome feature that could be part of SQL Server 2005. SQL Server has never really had such a feature, an important one, and a concrete reason why Oracle is often a better option for very large databases. Of course, you could fake it with filegroups and views, but that's just what it is - faking it.

Why do I say "could"? Because of the disclaimer at the top of the article:

About this paper The features and plans described in this document are the current direction for the next version of the SQL Server. They are not specifications for this product and are subject to change. There are no guarantees, implied or otherwise, that these features will be included in the final product release.

Well alrighty then! I'll get right on that sample code.

Wednesday, January 26, 2005

Grid Computing Takes the Linux Route

That means the open source route: Grid Computing Takes the Linux Route:

Monday's launch of the Globus Consortium by HP, Intel, IBM and Sun Microsystems represented the second body devoted to the commercialization of grid to come into being in the past year, after the Enterprise Grid Alliance launched in April.

Why do we need yet another grid outfit? Besides the EGA, we already have the Globus Alliance, as well as a smattering of bodies that work on grid standards, including the Global Grid Forum, OASIS (Organization for the Advancement of Structured Information Standards) and the World Wide Web Consortium.

The Globus Consortium, however, is specifically devoted to advancing open-source implementation of grid standards as the world of grid opens up to commercial use. The group is focused on advancing the Globus Toolkit, an open standards building block for enterprise-level grid implementations that came out of the Globus Alliance, an open-source-focused organization at Argonne National Labs.

Good idea. Open standards in this area will open up a few markets for all the tangential software that will be associated with any grid, such as management and monitoring tools. But even if it is best for customers, it doesn't mean this is the strategy that will win. It's way too early to tell.

Tuesday, January 25, 2005

Connect to Lotus Domino using SQL Server Linked Server

What a great article, as I was just wondering how to go about doing this. Connect to Lotus Domino using SQL Server Linked Server is worth a bookmark, as Lotus Notes is very common in enterprises, but gaining access to its data can be quite a challenge, and usually involves third-party tools of some kind.

Technical Comparison MySQL 4.1 vs. Microsoft SQL Server 2000 - from Only4Gurus.com

Technical Comparison MySQL 4.1 vs. Microsoft SQL Server 2000 - from Only4Gurus.com is a pdf report prepared by A23 consulting. It is pretty unbalanced (claiming for example that MSDE is a perfectly appropriate free version of SQL Server, apparently interchangeable with SQL Server for all intents and purposes, thus canceling out any "free" advantage MySQL may have) but an interesting read nonetheless. It is very, uh, gratuitous.

Oracle Business Intelligence Spreadsheet Add-In Available for Download

Lucas Jellema notes that Oracle Business Intelligence Spreadsheet Add-In Available for Download:

With this add-in, it becomes much easier to make use of the OLAP functionality in the Oracle 9iR2 and 10g database. Wizard-driven from Excel, end-users with only a little training should be able to perform analysis that otherwise would require the use of custom built applications or the configuration of Analysis tools such as Discoverer.

Since Excel is by far the most popular tool for BI consumers, this is an important addition.

Monday, January 24, 2005

XML Web Services for Database Access

The good thing about trying a new project is, much like writing an article or teaching a class, it forces you to research a lot of things you thought you knew well.

One thing I want in my grid solution is a connection mechanism that can be redundant and uses some common standards. Native drivers can be buggy on some platforms, and ODBC is out - I don't want to have to force the installation of a new driver. The solution I want to use is a plain old SOAP web service. Turns out, database vendors realized the power of this approach too, and DB2, SQL Server, and Oracle (from 9i on) all support a SOAP interface. Can XML Web Services Offer a Standard Across Databases? is the best article I've found on the methods, all of which are add-ons. DB2 requires the DB2 XML Extender, SQL Server requires the web services toolkit, and Oracle requires the Oracle XDK. Oracle 10g and Yukon have integrated support to an extent, but it is still rarely used.

Unfortunately, while the transport protocol is the same for all of them, the methods are all different. And XQuery is just a huge pain. Its writers apparently have no respect for simplicity and it is much, much more trouble than it's worth, especially when every database supports ANSI SQL for the most part.

But the SOAP advantage is too big to ignore (namely, every platform and every language has a built-in ability to access it), so SOAP it will be for access. The XML elements will be much simpler than any of the big vendor solutions though. In fact, I'll model it off most native drivers - host, username, password, and query. The result will be a dataset, with error codes if necessary. Nothing fancier is needed, I don't think. This approach also allows me to substitute in other databases to provide an even more motley redundant cluster.
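
As a rough illustration of how small that message format could be, here is a Python sketch of the four-element request and a dataset-plus-error-code response. This is plain XML and not a full SOAP envelope, and every element name (request, response, row, col, error) is a placeholder I picked for the example, not part of any vendor's interface.

```python
import xml.etree.ElementTree as ET

REQUEST_FIELDS = ("host", "username", "password", "query")

def build_request(host, username, password, query):
    """Build the four-element request as an XML string (special characters are escaped)."""
    root = ET.Element("request")
    for tag, value in zip(REQUEST_FIELDS, (host, username, password, query)):
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")

def parse_response(xml_text):
    """Return (error_code, rows); each row is a dict of column name -> text value."""
    root = ET.fromstring(xml_text)
    error_code = int(root.findtext("error", default="0"))
    rows = [
        {col.get("name"): col.text for col in row.findall("col")}
        for row in root.findall("row")
    ]
    return error_code, rows

# Hypothetical response a node might send back for a two-column query.
SAMPLE_RESPONSE = (
    "<response><error>0</error>"
    '<row><col name="id">1</col><col name="name">alice</col></row>'
    '<row><col name="id">2</col><col name="name">bob</col></row>'
    "</response>"
)

print(build_request("db01", "app", "secret", "SELECT id, name FROM users WHERE id < 5"))
print(parse_response(SAMPLE_RESPONSE))
```

Wrapping this body in a SOAP envelope would be a separate step. The sketch only shows that the payload itself needs nothing beyond the four inputs, a result set, and an error code.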

Update: Tom pointed out in the comments that "SQL Anywhere from the iAnywhere subsidiary of Sybase provides web services (both as provider and as consumer) as part of the base product, and that the services can be defined entirely within the database, and that a free developer edition download is available from ianywhere.com." Thanks Tom, I did not intend to treat Sybase like an unwanted stepchild.

Sunday, January 23, 2005

The Need for Better SQL Server Backups

The Need for Better SQL Server Backups covers what is normally a yawner of a topic, and it would seem to be far too entry-level for DMReview, but this article has some good points a lot of similar articles don't cover, including using a third-party tool to speed up and compress backups (saving a lot of space - my favorite is SQL LiteSpeed), and a good argument for other third-party tools for backup management, as Enterprise Manager doesn't cut it for a large number of servers.

Saturday, January 22, 2005

SQL 2005 Webcasts - Q&A

Christa Carpentiere has the link to SQL 2005 Webcasts - Q&A. Lots of links to webcasts there, plus Q&A transcripts, covering everything from full-text search to Reporting Services to Notification Services. Lots of useless info in the Q&A but a few interesting things, such as a hint that there might be a query analyzer for MDX in the next build of SQL Server 2005.

Google hiring Oracle DBA for Business Intelligence Apps development

Just popped up on my Feedster feed. Never let it be said Google is solely a MySQL shop :) Google: Oracle DBA (Business Intelligence Apps)

Google is looking for an outstanding Oracle DBA who will play a major role in the development of business intelligence applications. This is a unique opportunity, in that we are in the early stages of forming a team. The person hired into this role will wear many hats: database architect, capacity planner, backup and recovery architecture designer, performance tuner, and of course database administrator responsible for production and operational support.

Friday, January 21, 2005

Business Intelligence Portal Sample Application for Microsoft Office

Marco Russo points to Business Intelligence Portal Sample Application for Microsoft Office, which is a new release. This actually looks very cool (screenshots are in the docs).

I'll be working on a project for a client soon where I may be able to incorporate this. I'm looking forward to seeing what it's capable of.

Thursday, January 20, 2005

An open source database grid

I attended an Oracle Tech Day yesterday in my city, and one of the sessions I went to was the Oracle 10g Grid presentation. I already knew much of the information, and I thought about a few things I've posted on recently, mainly about the differences between white-box clustering (used by Google, Yahoo, Microsoft, and others) and what I'll call traditional, shared-space, SAN-dependent, enormously expensive solutions. Both require a level of technical knowledge just way too high for most. Vendor solutions are very expensive, but the white-box, commodity approach has to be customized to the app in question, as it usually involves multiple partitions over many boxes.

That got me thinking, what if a grid solution could be made that was simple? Using easy-to-administer, common software? The benefits of a grid, according to Oracle, are:

Performance
Scalability
Reliability
Security (?)

It delivers "High Quality at a Low Cost." Is it possible to do this with a straightforward, easy to setup system? That has the ability to add-in/drop-out nodes on the fly, redundancy of data, and a transparency of the architecture to the user?

Ideally, a simple cluster would be cool if it used things in this list:

LAMP architecture (or WAMP)
Perl for automation/integration
No other software required on a default OS install
No ties to any hardware dependencies
No OS dependencies
No special OS tweaks/custom features required (default install)
Use of all default settings wherever possible
Standard interface for access

There are plenty of technical issues to consider, such as, what scheduler to use? (Cron or at, with a single script that manages the node.) What interface? (Why not SOAP?) How to handle IP changes when a node goes down? (Hmmm, I'd go with some round-robin DNS for now.) What about master nodes? (Every node should be a master.) And many more.
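
Round-robin DNS does its rotation at the name server, but the same idea is easy to sketch on the client side. The Python below is an illustration only; the NodeRing class and its method names are my own invention. It hands out nodes in turn and skips any that have been marked as down.

```python
class NodeRing:
    """Hand out node names in rotation, skipping nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()
        self._next = 0  # index of the next node to try

    def mark_down(self, node):
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def pick(self):
        # Try each node at most once per call so a fully dead ring cannot loop forever.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node not in self.down:
                return node
        raise RuntimeError("no healthy nodes available")

ring = NodeRing(["node-a", "node-b", "node-c"])
print([ring.pick() for _ in range(4)])
ring.mark_down("node-b")
print([ring.pick() for _ in range(3)])
```

Since every node is meant to be a master, any node the ring returns can take the request. The health information that drives mark_down would come from the monitoring side.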

And obviously, I think I have some answers, and I'd like to give this a try. I know enough about the pieces, and about commercial solutions and documented white-box architectures, to assemble an easy POC of some kind, and it might be fun. I'm going to do things a little differently than other projects out there, based on some lessons I've learned. For example, I'll use a simple hacked business rules engine, and drop tasks in a task stack to be run. Each node will also have a nice little web-based dashboard with lots of feedback and the ability to administer it rather easily.
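
To make the rules-engine-plus-task-stack idea a little more concrete, here is a minimal Python sketch. The rules, thresholds, and task names are all hypothetical, chosen only for illustration. Each rule is a condition paired with a task name: rules that match the node's current status push their task onto a stack, and a runner pops tasks off and executes them.

```python
# Each rule is a (condition, task_name) pair; rules and thresholds are illustrative only.
RULES = [
    (lambda status: status["load"] > 2.0, "shed_load"),
    (lambda status: status["disk_free_pct"] < 10, "alert_admin"),
]

def evaluate(rules, status):
    """Return the task names whose conditions match the current node status."""
    return [task for condition, task in rules if condition(status)]

def run_stack(stack, handlers):
    """Pop tasks off the stack (last in, first out) and run the handler for each."""
    results = []
    while stack:
        task = stack.pop()
        results.append(handlers[task]())
    return results

# Hypothetical handlers; real ones would call out to scripts on the node.
HANDLERS = {
    "shed_load": lambda: "stopped accepting new connections",
    "alert_admin": lambda: "sent alert",
}

status = {"load": 3.1, "disk_free_pct": 5}
stack = evaluate(RULES, status)
print(stack)
print(run_stack(stack, HANDLERS))
```

A cron job could run this kind of evaluate-then-run loop once a minute on every node, which keeps the scheduler itself trivial.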

The main goal will be to meet the conditions in the list, with a focus on Reliability and Scalability. It will be to provide a redundant enough architecture to provide the same uptime as an enterprise SAN, but with all of the cheapest hardware possible. Performance of selects may be a side benefit (because they can be distributed across multiple nodes) but won't be a focus.

I can already see some limitations. For example, this will be a solution for hosting multiple databases, but not very large ones (say, ones > 10-50 GB), mainly because my initial solution won't distribute tables across nodes, but whole databases. And inserts/updates may take a hit, as multiple instances will have to fire off.

But I think it could be a good project, it's been a couple of years since my last open source project.

PostgreSQL 8.0 Released

This has been noted at a few mainstream outlets already... PostgreSQL 8.0 is the latest version of this open-source database, largely considered a competitor to MySQL in the open source world. Its claim to fame in the past was that it had a lot of advanced features, including triggers and stored procedures, unlike MySQL.

Its main drawback is that it was developed and uniformly deployed on *nix platforms, and did not work well at all on Windows (probably because it required Unix emulation to install and run it). This is the first release that can run natively on Windows, but note this caveat, which appears in the What's New notes but NOT in the press release:

Although tested throughout our release cycle, the Windows port does not have the benefit of years of use in production environments that PostgreSQL has on Unix platforms and therefore should be treated with the same level of caution as you would a new product.

This release does have a standard Windows installer, which is a plus. Other improvements include savepoints in transactions, full tablespace support for the physical model, and Perl as a server-side language (cool!).

Wednesday, January 19, 2005

What I look like

For anyone reading this feed, via a browser or (more likely) on RSS, you may wonder what I look like. You'd know I like databases, and business intelligence, own my own business that provides those services, and have done some work for some first class clients, plus I'm an open-source advocate, IM bot creator, not to mention I have a technical blog, so you may have developed a mental picture of me. So, compare your mental image with a pic:


This was taken this past Saturday night with my band, which plays some punkish/grungish/metalish music. But those are my work clothes, I wear suits, and look much like this, all week long (but without the guitar, and a much nicer belt). Needless to say, I and my business partner, who with his bald head and goatee can look menacing, can turn a few heads when we walk into a company for the first time.

Online Oracle Warehouse Builder 10g training

Here's an online guide to OWB 10g. It's hands-on, allows you to install all the software on your machine, and takes about 4 hours. Can't turn down some free training.

Tuesday, January 18, 2005

Migrating Access 2000 Databases to SQL Server 2000

In any large company, technical people often build their own solutions to get problems solved quickly. A very, very common case is an engineer or CPA in production or finance who learns just enough Access to whip up a nice little automated solution for something (because they couldn't get anyone in IT to listen to them). But very quickly, the little Access program becomes indispensable to daily business. More things get added on, and before you know it, it stores over a gig of data and is so slow it takes 10 minutes to run a query.

A quick way for a DBA to come to the rescue and improve performance, protect the data, etc., is to move the data into SQL Server. This is common enough that since Access 2000, a utility for it, the "Upsizing Wizard," has been built into the program, and it handles everything - exporting the data over, building indexes, and reconfiguring links to point Access to the new tables. Here are a few links on using this high-impact tool:


SQL Server 2000 vs. DB2 videos

Just noticed this, although they are over a year old...Microsoft SQL Server: SQL Server 2000 for the IBM DB2 Customer Kit.

Makes some good points about DB2, mainly that the DBA skills required for an AS/400 installation are quite different than when it is run on Intel, and different again for Unix/Linux (all three environments are on different codebases.) A nice watch if you have the time and you are all caught up on Lost.

Monday, January 17, 2005

MySQL CEO: Open source & MySQL will rise, legal foes will fall

Slashdot points to an interview with the MySQL CEO: Open source & MySQL will rise, legal foes will fall:
"What challenges do you see facing businesses that are going to start using more open source software in 2005?

Mickos: We deal a lot with enterprise customers, and we ask them what problems they foresee and what questions remain unanswered. Their No. 1 concern is training the staff. They are asking themselves whether they need to retrain people or whether they have the skills in-house already.
The good news is that most corporations discover, when they ask around, that they have open source skills in-house. That is an important milestone for the open source movement. Many corporate IT people have used open source products at home or, sometimes secretly, in business projects.
Of course, formal training may still be needed. That is the big hurdle that large organizations need to jump as they adopt more open source."

Definitely a good point, which I think some execs miss - chances are there's a good bit of open source experience already on staff.

New Oracle Business Intelligence 10g Training Resources

Mark Rittman points out New Oracle Business Intelligence 10g Training Resources:

The lessons use the Oracle Business Intelligence samples that you can additionally download from OTN, and take the form of simple steps and screenshots to walk you through creating workbooks, analyzing data, building reports and so on. I've worked through them myself as a starting point for putting some demos together, and they're a pretty comprehensive look at the new features.

Take a look through Mark's blog; he has had a lot of information about Oracle BI 10g lately.

Sunday, January 16, 2005

High Availability and Scalability Enhancements in SQL Server 2005

High Availability and Scalability Enhancements covers some of the features SQL Server 2005 will support, including real, no-faking table partitioning, and support for adding and detecting physical memory on the fly without shutting down.

Thursday, January 13, 2005

Business 2.0 :: Mark Cuban's End Game

Business 2.0 :: Magazine Article :: In Front :: Mark Cuban's End Game:
"Even more heretical is Cuban's opinion of DVDs, which is that they suck -- or, at least, that they're inferior to hard drives as a medium for storing digital content. 'Why would we invest in DVD,' he asked, 'knowing that hard drives are going to grow in capacity, shrink in size and price, and can also be erased and rewritten?' He imagines selling HD movies stored on key-chain drives -- or putting multiple films on larger drives, 'like software used to be packaged on PCs.' Moreover, he added, 'with ever-expanding storage, we can increase picture quality for years to come by taking advantage of new cameras and better compression schemes. With DVDs, we can't.' "

I find that idea very cool, and it makes a lot of sense (especially since I store a lot of TV shows and movies on my PC). One technology completely outpacing another, and infinitely more flexible to boot. Mark Cuban talks about this stuff all the time on his blog.

The New York Times > Young Cell Users Rack Up Debt, a Message at a Time

The New York Times > Technology > Young Cell Users Rack Up Debt, a Message at a Time talks about one of my favorite subjects, instant messaging.

Text-messaging has flourished for years in Europe and Asia, where it is immensely popular among young people. In the United States, activity was limited until 2002, when a breakthrough in the wireless market allowed short text messages to be sent among customers of the major cellular carriers. Previously, customers could send messages only to those who used the same carrier.

The service, known as S.M.S. (for Short Message Service), has since taken off. According to a recent report from Forrester Research, a company in Cambridge, Mass., that specializes in technology, Americans sent 2.5 billion text messages a month in mid-2004, triple the number sent in mid-2002.

People unfamiliar with SMS can't see what the big deal is, when you can just pick up a phone. There's something special about it though, in that you can discreetly send a short message, and can read them quickly without interrupting other things you're doing. For me, I know it can make my whole day receiving one in the evening from Australia on the other side of the world.

If you ever wanted to spot a huge trend on the way, this is it. Integration with other services is everywhere for those that care to look (including email gateways and IM services). I doubt growth in the US will stop until it is around the level it is in Europe and Asia.

In the work world, a first step I recommend is setting up an SMS alert for some event (using the cell provider's email gateway). Here's an SMS alert I set up once for a client:

One thing I really dislike about my cell carrier's network (Nextel) is you can only receive SMS, not send. They should figure out how to fix that.
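
For anyone who wants to try the email-gateway approach to SMS alerts, here is a minimal Python sketch of the general idea. It is not the alert I built for that client. The gateway domain, addresses, and function names are placeholders: each carrier publishes its own gateway address format, so check with the provider.

```python
import smtplib
from email.message import EmailMessage

def truncate_sms(text, limit=160):
    """Trim the alert so it fits in a single SMS (160 characters is the usual cap)."""
    return text[:limit]

def build_sms_alert(number, gateway_domain, sender, text):
    """Build an email addressed to <number>@<gateway_domain>; the carrier turns it into an SMS."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{number}@{gateway_domain}"
    msg["Subject"] = "ALERT"
    msg.set_content(truncate_sms(text))
    return msg

def send_alert(msg, smtp_host="localhost"):
    """Hand the message to an SMTP server (assumes one is reachable at smtp_host)."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)

# Placeholder values; nothing is sent until send_alert() is called.
alert = build_sms_alert("5551234567", "sms.example.com",
                        "alerts@example.com", "Nightly backup failed on db01")
print(alert["To"])
```

Hooking send_alert into whatever detects the event (a failed job, a full disk) is all that is left. Keeping the text short matters, since longer messages may be cut off or split.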

Wednesday, January 12, 2005

SQL Server 2000 SP4 Beta

Simon Saban points to a signup page for the SQL Server 2000 SP4 Beta. SP3a has been the standard for about 2 years now and is generally considered pretty stable, but there are over 200 bug fixes in SP4. The only addition I spotted related to new diagnostic settings for identifying some problems.

As always, don't put this on an installation you actually expect people to use!

Tuesday, January 11, 2005

Advances In Video/Multimedia Search

Sadagopan's weblog on Emerging Technologies,Thoughts, Ideas,Trends and Cyberworld points to this Wired article about indexing pictures and video.

A group of European researchers are developing technology that can recognize everyday objects in digital images. The image-processing software looks for "key patches" in an image to determine the relative positions of different shapes, such as tires and a car body, or a beach and ocean waves, to categorize the image's contents. The software has learned hundreds of objects since development began in 2002, and can be used to categorize images and automatically create image tags. The software can look for images similar to those it has already scanned and "knows." The software is currently being tested on a variety of images, and the researchers continue to add new object categories. Companies such as clothing stores or sporting goods companies would jump at the chance to have a Google image-search result in pictures displayed with their products.

IBM's Pervasive Media Management group is developing visualization software that can identify objects contained within one of the web's fastest-growing content categories - video streams. The software identifies groups of objects within a frame to form concepts that can be easily searched, such as an airplane with a cloud and sky backdrop that would be categorized as travel. Categorizing the content of video through human labor can take 10 times as long as the duration of the content, as per IBM. The software can be trained to recognize images by providing it with a group of similar images. IBM is working with broadcasters CNN and ABC to identify concepts that can be used to classify news footage.

A faster payoff may be marrying voice recognition with audio and video archives to produce relevant transcripts. What is it to us? Another data source worth indexing and aggregating for use in data warehouses and analytics applications, and integrating with other data in the enterprise. The first obvious use might be for intranets (such as being able to search through every recorded meeting on IT projects). This will be huge when the technology happens... be the first on the block to give this technology a shot!

Interview with IBM's VP of Worldwide BI Solutions

BI Solutions, Products on Tap:

What, specifically, are IBM's deliverables for business intelligence?

We typically lead with our P-series hardware. ... And we will lead with our Fastkey storage. We have a software platform that we've been evolving called Data Warehouse Edition. And underneath the covers of Data Warehouse Edition is DB2; DB2 CubeViews, which is our metadata bridge; there's the Information Integrator, which allows you to get to heterogeneous sources of data; there's the Intelligent Miner capability, which allows you to do scoring and analytics; and there's Warehouse Manager, which is rudimentary ETL [extraction, transformation and loading]. If you don't have ETL today, we'll give you some ETL capabilities. And there's actually a capability called OfficeConnect, which allows you to use spreadsheets as the presentation layer of the Data Warehouse Edition.
.....
How would you say IBM's approach to BI differs from Microsoft [Corp.]'s and Oracle [Corp.]'s approaches to BI?

We don't believe in the one-size-fits-all strategy, which I think differentiates us from our competitors. I also think what differentiates us from our competitors is the fact that we will deliver all the capabilities as an integrated package. We deliver the server, we deliver the storage, we deliver the services. Our competitors can't do that. They're software companies. They must partner.

Daily, Weekly, Monthly DBA checklists

Nimzo Benoni runs a very technical Oracle weblog. A daily, weekly, and monthly checklist of tasks might be pretty basic for some, but it's surprising how many DBAs haven't thought it out. A good reference.

Monday, January 10, 2005

Metrics Development: Taking It From the Top

The best and most insightful analytics project in the world can fail without properly understanding the business perspective, and without support from senior decision makers. Intelligent Enterprise Magazine: Metrics Development: Taking It From the Top makes this point well and has some great advice:

IT managers who head up dashboard projects must get their business counterparts heavily involved in KPI development. The more senior the collaborative team, the better. If CEOs or COOs aren't actively involved in KPI development, at a minimum they should sign off on the finished KPI scorecard. The ideal process is to create a senior cross-functional team that touches all major departments; the team should meet regularly until a consensus scorecard emerges.


Several factors increase the likelihood of success. First, going into these meetings, all participants need a clear (and consistent) understanding of the organization's high-level strategies and goals for the short and long term. If these strategies and goals don't exist, end the meeting immediately and inform the executives nearest the top that they have to provide them. There is no substitute. That's why many people strongly believe that executive sponsors are the true key to performance management success.


Second, team success depends on the creation of an open, nonpolitical atmosphere where members can discuss candidly what should and shouldn't be measured. Your own experience has probably shown the difficulty of setting such an atmosphere. Imagine this scenario: The people in the room may ultimately be judged (and potentially compensated) on how they perform against the measures the team is selecting — which is why a third key success factor is frequently obtaining outside help. Running these sessions can be especially difficult for a person inside the company, with the possible exception of the CEO or COO — someone who rarely has the time to head up the team sessions.

People learn this quickly one way or another, and most learn it the hard way :) I've seen plenty of internal projects skip this step based on assumptions about people's needs (including some of my own), and in doing so increase the risk of failure dramatically. Making sure the team approach is used at the very beginning of a project increases the time to get it done, but it also gives a much higher chance of success than would have been possible otherwise.

BPEL (Business Process Execution Language)

INETA Pakistan points to BPEL (Business Process Execution Language), or "Build your Business Apps with BPEL" (requires registration). Written by the director of server technology at Oracle, it's a good overview of how to get started using BPEL when developing applications. It goes over some of the important classes of tools to get started, like XML tools, web service tools, etc. The background for the article is Oracle BPEL, but the principles are, in true cross-platform fashion, applicable to any environment, whether you use Oracle or Microsoft or Apache's open-source tools.

BPEL has excellent potential for becoming a vital piece of the enterprise architecture in the future, so it's useful to understand how all of it works, and this is a good introduction.

Friday, January 07, 2005

Microsoft tries white-box storage - TerraServer Bricks, A High Availability Cluster Alternative

TerraServer Bricks, A High Availability Cluster Alternative is a Microsoft Research paper, hosted on Only4Gurus.com, that describes how Microsoft is experimenting with Google-type architectures to eliminate high-priced SANs and replace them with commodity components used by the TerraServer project. The TerraServer project is known to many SQL Server customers, as it is often used as an example of the storage capabilities of SQL Server. The old configuration, a single SQL Server node in an active-passive cluster connected to 18 TB of data, was enormously expensive, complicated to administer, took 30+ seconds to fail over, and required a backup tape library (the SAN was too expensive to provide a duplicate copy).

From the paper:

In many enterprise applications, a SAN’s high cost and complexity can be tolerated because of the ROI the application provides to the organization. However, most internet applications have razor thin profit margins. It is difficult if not impossible to host a profitable internet business on SAN hardware. Yahoo and Google give good examples of this. They buy very low-cost hardware configured redundantly to achieve high availability. They do not depend on system software or hardware components to handle failure cases. Instead, they program “around failures” at the application or in the “middle-ware” that their staff implements. As a result, they have very high application availability implemented and deployed at a very low cost.

In contrast MSN and many Microsoft customers have traditionally deployed SQL Server, and Microsoft clustering applications that expect the underlying hardware and system software to handle failure conditions transparently to the application. This is changing, MSN search has a brick design, and MSN Hotmail is making the transition from expensive backend SAN servers to commodity servers similar in design to TerraServer Bricks.

The groups of servers are called "RAPS," or Reliable Array of Partitioned Servers. They use regular commodity hardware (including SATA hard drives) and SOAP web services built in C# to communicate among the nodes. The paper goes into great detail with code, backup plans with scripts, testing methods, and detailed statistics on which parts failed when, and how big an effect they had on the rest of the application. The conclusion?

Three-year TCO has gone from $3.3M to $0.5M – a 6-fold cost reduction.

That takes into account a larger charge for hosting. The bulk of the savings is in hardware and software costs, even though "The software license prices [for the Bricks] are list prices assuming no discounts, no enterprise agreements, or other special pricing. In practice most customers would pay lower software prices."

The TerraServer Brick architecture, server equipment purchased from Silicon Mechanics, and the SATA disk technology has exceeded our expectations in every aspect.

We already knew that SATA disks and white-box PCs could meet the performance requirements because of testing done in October 2003 [Barclay03]. We were frightened into thinking the failure rate of the SATA disk drives would be 100%. The actual annual failure rate has been 6.4% which is reasonably close to the 5.5% SCSI disk failure rate. The SATA drives combined with the reliability and performance of the 3Ware RAID controllers are formidable competitors to SAN technology at a fraction of the cost.

We expected the “white-box” servers to be less reliable and the service to be worse than what we received from Compaq (now HP) on the SAN Cluster. We had a handful of reliability issues and excellent service from Silicon Mechanics that so far is on par with the experience we had with Compaq for over five years. We also experienced zero “blue screens” or other unexplained system crash. Actually, we didn’t experience any issues with the system software or hardware that resulted in a system crash.
...
In summary, we conclude, like Yahoo, Google, MSN Hotmail, and MSN Search, that commodity storage and servers are the price/performance choice for high-volume web applications. While we loved the TerraServer SAN Cluster and its ability to detect and handle failures transparently, the price, performance, and reliability benefits of the TerraServer Bricks configuration outweigh the costs of implementing failover and redundancy logic in the application. We expected to find limitations and missing features in Windows, .NET, and/or SQL Server that high-availability web sites would need to deploy Windows and SQL Server on commodity servers. We were wrong. Windows 2003, .NET 1.1 and SQL Server 2000 have all the engineering robustness and features required for users to deploy highly-available and high-volume web applications with little additional investment in application development.
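For anyone wondering what "failover and redundancy logic in the application" looks like in practice, here is a rough Python sketch of the general idea. It is not code from the paper (the real Bricks use C# SOAP web services and SQL Server), and every name in it (`Replica`, `PartitionedArray`, `NodeDown`) is made up for illustration. The assumptions: data is split across partitions by a hash of the key, each partition is copied onto several cheap nodes, and the client simply tries the next copy when one is down.

```python
import zlib


class NodeDown(Exception):
    """Raised when a replica (or every replica of a partition) cannot answer."""


class Replica:
    """Hypothetical stand-in for one commodity server holding a copy of a partition."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def get(self, key):
        if not self.healthy:
            raise NodeDown(self.name)
        return self.data[key]


class PartitionedArray:
    """Routes each key to one partition and fails over between that partition's replicas."""

    def __init__(self, partitions):
        # partitions is a list of lists; each inner list holds redundant copies.
        self.partitions = partitions

    def partition_for(self, key):
        # crc32 is stable across runs, so a key always lands on the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        # Simplification for the sketch: write to every replica, ignoring health.
        for replica in self.partitions[self.partition_for(key)]:
            replica.data[key] = value

    def get(self, key):
        # The application, not the hardware, handles the failure:
        # try each copy in turn and give up only when all of them are down.
        for replica in self.partitions[self.partition_for(key)]:
            try:
                return replica.get(key)
            except NodeDown:
                continue
        raise NodeDown(f"all replicas down for key {key!r}")


if __name__ == "__main__":
    array = PartitionedArray([
        [Replica("brick-a1"), Replica("brick-a2")],
        [Replica("brick-b1"), Replica("brick-b2")],
    ])
    array.put("tile-42", b"image bytes")
    # Simulate losing the first copy; the read still succeeds from the second.
    array.partitions[array.partition_for("tile-42")][0].healthy = False
    print(array.get("tile-42"))
```

A real deployment also has to deal with keeping the copies in sync and bringing a repaired node back into rotation, which this sketch leaves out entirely.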

Many papers about the Google architecture have floated around for a couple of years, but this is the first where a group in Microsoft has got the religion. I didn't know they were already taking this approach with Hotmail and MSN Search. Will Microsoft productize this? The value is too large to ignore for large data stores.

Another question would be, exactly what does SQL Server add to the solution? In this scenario, would MySQL serve just as well?