Monday, May 31, 2004

HP certifies MySQL

HP expands open-source support | CNET News.com: "Hewlett-Packard is set to announce plans Tuesday to expand its support for open-source software.
The computing giant will certify and support MySQL, the leading open-source database program, and JBoss, a popular Java-based application server, on HP's industry standard servers. "

Ignore at your own peril. MySQL is something all DBAs should be trying to get some hands-on experience with. Since it's free, there's no excuse not to. If you are in the right position, the time will come, and soon, when you will be asked what MySQL can do for your company, and whether there is really a need to continue paying high license fees for each and every database installation when there is an alternative that is free. When that time comes, make sure you don't base your position on press releases from the big three.

Friday, May 28, 2004

Indexing Audio Conversations

Get Real has a writeup on an HP Research project called Speechbot. From the article:

John Dowdell points to an interesting research project being conducted at HP Labs, the SpeechBot. As the site describes, "SpeechBot is a search engine for audio & video content that is hosted and played from other websites".

Digging a little deeper into the technical documentation for SpeechBot, I came across this summary:

SpeechBot (http://www.compaq.com/speechbot) is the first Internet search site for indexing streaming spoken audio on the web. Unlike previous attempts to index spoken audio on the Web, which have relied on either adjacent text, metadata, or hand supplied transcripts and close captions, SpeechBot uses automatic speech recognition technology to transcribe and index documents that do not have transcripts or other content information. The use of speech recognition permits the efficient and cost-effective indexing of thousands of hours of audio content, which were previously inaccessible. Because of this indexing, SpeechBot allows users to quickly search for relevant content in long audio documents and yields a high precision on first page-retrieved items.

Read more.

It's not often I get to discuss my two favorite subjects in one post, but this could have some interesting potential. Already, through the use of products like OneNote, it is possible to take notes during a meeting while recording the audio, and have the audio time-indexed to your notes. When you search your notes for a keyword, you can start the audio to hear exactly what was being said at the very time you took a note. Perfect for project meetings, and very useful on a personal level for conference sessions. Also, many financial companies record phone conversations already.

What if it was policy to record such audio? And one step further, policy to automagically index it? And then archive it as an unstructured data source into a data warehouse or an enterprise document repository? The capabilities are not so far-fetched. Querying an ODS and finding an index of audio conversations (or transcriptions) alongside historical support requests would be quite valuable in the right context. Not to mention other uses such as project technical meeting archives indexed for technical documentation purposes in an IT data warehouse.

An Architect of any persuasion would probably find such a searchable archive very valuable. Combine it with some BAM features and perhaps he could be alerted whenever someone uses the word "enable."
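As a rough illustration of the idea, here is a minimal sketch of a keyword index over time-stamped transcript segments. The recording names, offsets, and text are made up for the example, and the speech recognition step that would produce the transcripts is assumed to have already happened; this is not how SpeechBot itself is implemented.

```python
from collections import defaultdict
import re


def build_index(segments):
    """Map each lowercase word to the (recording, offset) pairs where it was spoken."""
    index = defaultdict(list)
    for recording, offset_seconds, text in segments:
        # Use a set so a word repeated within one segment is indexed once.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            index[word].append((recording, offset_seconds))
    return index


# Hypothetical transcript segments: (recording name, offset in seconds, text).
segments = [
    ("standup-0528.wav", 12.0, "We need to enable the new feed"),
    ("standup-0528.wav", 95.5, "Budget is approved for phase one"),
    ("support-call-17.wav", 40.0, "Please enable logging on the server"),
]

index = build_index(segments)
print(index["enable"])
```

Each hit is a recording plus an offset, which is all a player needs to jump to the moment the word was spoken. The same lookup could drive an alert on a watched keyword.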

Oracle ETL the right way

Mark Rittman has a review of "Oracle DBAs Guide To Data Warehousing And Star Schemas", and some excellent commentary on Oracle ETL, with detail and examples on external tables, pipelined table functions, and the MERGE command that demonstrate how to perform a complete ETL function without staging tables or using SQL*Loader.
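For anyone who has not used MERGE, here is a minimal sketch of what it does, written in plain Python rather than Oracle SQL. The table contents and the customer_id key are hypothetical; the point is only the "update when matched, insert when not matched" behavior that lets one statement replace a separate staging step.

```python
def merge_rows(target, source, key="customer_id"):
    """Update rows whose key already exists in target; insert the rest."""
    by_key = {row[key]: row for row in target}
    inserted = updated = 0
    for row in source:
        if row[key] in by_key:
            by_key[row[key]].update(row)  # WHEN MATCHED THEN UPDATE
            updated += 1
        else:
            new_row = dict(row)
            target.append(new_row)        # WHEN NOT MATCHED THEN INSERT
            by_key[row[key]] = new_row
            inserted += 1
    return inserted, updated


# Hypothetical dimension table and incoming extract.
target = [
    {"customer_id": 1, "city": "Austin"},
    {"customer_id": 2, "city": "Boston"},
]
source = [
    {"customer_id": 2, "city": "Chicago"},
    {"customer_id": 3, "city": "Denver"},
]

print(merge_rows(target, source))
print(target)
```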

The theme is to use native Oracle tools and show there's little need for high-end ETL tools in an Oracle environment. This is a given of course, but to me, there is value in having an ETL process independent of the back end. Every major vendor, and also MySQL, is introducing major improvements to its RDBMS shortly, and there's a risk in throwing one's hat in with any single vendor. And when there's a chance to save six figures by migrating the data store, it is a good option to revisit from time to time. More on that theme in a later article.

On a much more elementary note, I stumbled across a good introduction to Oracle Data Warehousing concepts, A Practical Guide to Data Warehousing in Oracle, which is actually Part 5 in a series.

Mark Rittman also announced the start of an Oracle 9i/10g OLAP Mailing list.

Thursday, May 27, 2004

True Tales of Performance Management

Intelligent Enterprise Magazine: True Tales of Performance Management notes that dashboards are an important tool in the presentation of BI metrics to decision makers, and uses some real world anecdotes.

Dashboards seem to be a theme lately, with a good series of columns at DMReview also.

The coming OS Wars

Yahoo! News - Kill Bill predicts that the battle has just begun between IBM, pushing Linux, and Windows.

How is it that for eight months a team of up to a dozen IBM consultants has been toiling in the data centers and computer rooms of the Munich city government--free of charge? Having goaded Munich into embracing open-source software, IBM is helping it plan a migration of 14,000 computers off Microsoft Windows and onto the operating system known as Linux (news - web sites). Never mind that IBM doesn't sell Linux, which is distributed free. And never mind that Munich officials say they're not committed to buying IBM hardware or consulting services, despite all IBM's free help.

One statistic mentioned is that in 1986, when MS went public, it had a market cap of less than a billion, compared to $93 billion for IBM. Today, MS stands at $280 billion, compared to $146 billion for IBM.

b-eye-Network.com

b-eye-Network.com - The Vision for BI and Beyond - I have no idea exactly what this is, but I just received an invitation to it (apparently through a Bill Inmon e-newsletter). There's a short list of recently published articles here.

I am not quite sure the site is ready yet for public consumption (based on pages like this) but there are a few other interesting articles scattered around the site including:

Among many, many others. Beware that everything appears to be written by Bill Inmon himself, so it's full of statements like "Unfortunately, people buy sizzle without recognizing they need a griddle and a fire." (From the Analytics article)

Monday, May 24, 2004

Distributed Data Warehousing With Microsoft SQL Server 2000 and Windows 2000 (2003) Datacenter Server

Shop Talk Discussion: Distributed Data Warehousing With Microsoft SQL Server 2000 and Windows 2000 (2003) Datacenter Server - Level 200, from Microsoft, is an archived webcast that looks pretty good:

"How scalable is Windows 2000 Datacenter Server and Microsoft SQL Server 2000? This webcast describes Microsoft's own SQL Server 2000-based customer relation system for its 150 million customers. This system can load over 2 million customer records each day into the master database running on a 32-way ES7000 from Unisys, and distribute this data to systems running on 8-way computers. Encompassing 550 hard drives, and nearly 3 terabytes of customer-related data in SAN storage, this system demonstrates the scalability of Windows 2000 Datacenter Server, Microsoft SQL Server 2000, and the Unisys platform in both a scale-up and scale-out architecture."

Friday, May 21, 2004

Outsourcing the DBA not a good idea

From Database Trends and Applications:

Database Trends and Applications: "Outsourced DBAs halfway across the world cannot easily solve problems, lend guidance, or participate on a team development project. Programmers typically rely on DBAs to be resident database experts and look to them to help resolve issues quickly. This is problematic at best when the DBA does not work in the same location and speaks a different native language. Also, the outsourcer may not treat governmental regulations with the importance as workers in the country where the regulations originate. Outsourcing can cause your company to be out of compliance with government regulations, resulting in fines or incarceration. And what happens if a disgruntled outsourced DBA destroys your databases before quitting? "

Thursday, May 20, 2004

Two free (and unique) data visualization tools - Heatmaps and Graphviz

Any end user will tell you that having a finely tuned and designed warehouse of data is all good and fine, but making sense of it all is a challenge in itself, and people are always interested in new and novel ways of seeing large volumes of data in context. These are two rarely used tools that are good to add to your front-end toolbox, especially for dashboard and data mining hackers:

Heatmap Builder is a free tool from Stanford Labs to build a Heatmap (see pic) from a data set. Heatmaps can look a bit confusing at first, but where you get a lot of value is in comparing maps, either from different data sets or different time periods, as it allows you to instantly spot differences. It spits out a jpg, gif, or png at any size you choose.


GraphViz is a very cool mapping and relationship-spotting tool from AT&T Labs. Some people have used this on email histories in a company to determine how groups work together and to identify the "connectors" between them. It is probably applicable to much, much more.

Examples:


From "Graphing Perl".

Here's a Graphviz tutorial to get a quick start.
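To give a feel for the email-history idea, here is a small sketch that turns sender/recipient pairs into Graphviz DOT text. The names and the message list are hypothetical, and weighting the edge thickness by message count is just one reasonable choice.

```python
from collections import Counter


def emails_to_dot(messages):
    """Build a Graphviz DOT digraph with one weighted edge per sender/recipient pair."""
    counts = Counter(messages)
    lines = ["digraph email {"]
    for (sender, recipient), n in sorted(counts.items()):
        # Thicker lines for pairs that exchange more mail.
        lines.append(f'    "{sender}" -> "{recipient}" [label="{n}", penwidth={1 + n}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical (sender, recipient) pairs pulled from a mail log.
messages = [("alice", "bob"), ("alice", "bob"), ("bob", "carol"), ("carol", "alice")]
print(emails_to_dot(messages))
```

Save the output as email.dot and render it with `dot -Tpng email.dot -o email.png`. The people with many edges running between otherwise separate clusters are the "connectors".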

Wednesday, May 19, 2004

IM for DM

Greg Morey from SendTec gets it. Lots of good facts about instant messaging....

"There is some supporting research to suggest that companies' successful use of IM will only get stronger than it is. According to The Radicati Group’s research, IM use will grow exponentially by 2007, with 349 million IM corporate accounts sending many of the one trillion plus IMs projected to be sent every day."

"Like it or not, IM is the new phone in the home. As a parent, I am completely frightened, but as a marketer, I am inspired to find innovative, effective uses of this robust communication tool. The marketing challenge boils down to creating an IM campaign that is cool enough to engage interest, prompt discussion that will scale, and drive an action yet also maintain responsibility and accountability."

"According to The Radicati Group, 62 percent of the 590 million worldwide active IM accounts belong to users under the age of 30. Heck, 34 percent are under the age of 20. If IM is the next home phone and email has presumably skipped this generation, then how are you prepared to responsibly generate a response from this demographic?"

Does your company have a plan here? If they don't, they will in a few years; the difference is that by then they will be playing catch-up. Getting started is easier than you think... experimenting is a cinch and can be done from your desktop with the SDBA Revolution IM Application Server. (Yes, I'm the author.)

Big Banks get one-year reprieve on New Basel 2 rules

Business Report - New Basel 2 rules 'take account of small businesses':
"... applying the rules worldwide was delayed. The biggest international banks will wait an extra year, until December 2007, to implement the Basel 2 accord, which regulates how much banks must keep in reserves to cushion against unexpected losses.

The delay helped the US, where regulators had warned that the rules were far too complicated to apply by the December 2006 deadline. But Europe plans to go ahead, starting a two-tier system that analysts said could undo the ambitious global reach. "

Basel 2 compliance is huge in the banking world, and many of the largest banks are scrambling to comply. Compliance all revolves around a data warehouse with all of a bank's lending exposure. Banks around the world are being pressured to comply. IBM has a good white paper on the 3 pillars of Basel II, and the metrics involved (and how their solutions can track them, of course). In the right hands this DW can provide much, much more value.

ASP.Net Enterprise Manager

This is cool... an open-source, web-based ASP.Net Enterprise Manager clone for SQL Server. It has been out there a while; I just never ran across it until now.

My personal favorite GUI for a database is still phpMyAdmin.

Tuesday, May 18, 2004

Data Migration Strategies, Part 1 | DM Review | Industry Led, Industry Read

Data Migration Strategies, Part 1 just came out in the DMReview Email newsletter. It has a detailed, highly coordinated and quantifiable approach as a guideline for data migration projects.

XBRL Commentary on InfoWorld

On Infoworld Jon Udell has a column from a few weeks back on XBRL, and how complex it is:

Uh-oh. I thought BPEL4WS(Business Process Execution Language for Web Services) was a brain exploder, but it's a walk in the park compared to this stuff. The XBRL spec describes how the parts of an XBRL instance interrelate, using state-of-the-art XML technologies such as XLink and XPointer. And it talks at length about the syntax and semantics of “taxonomies” that abstractly define chunks of financial reports. No sign of any actual financial data, though. And the link to a sample page at xbrl.org, returned a “404 Not Found.” I’m not surprised. The poor bloke whose job it was to produce that sample must have suffered a polymorphic recursive brain meltdown.

I've been thinking a bit about XBRL and its applications. Essentially, XBRL is like an RSS feed that companies could make available, but specifically designed for financial reporting. If you have ever read an annual report (here's a link to one from Regions Bank) then you know these are chock full of metrics and numbers, and usually go into great detail about particular lines of business. For example, that annual report has breakdowns of loans by type, detail on loan losses by type by year, and details on recovered funds. The rest of the report is just as detailed, and the quarterly reports break down into quarters.

Capturing all that data makes analysts' jobs much, much easier as they can quickly get a glimpse of growth and loss in any number of areas. But it could yield some great intelligence for companies too, especially when analyzing competitors, market share changes, and performance against economic conditions. Trouble is, it is a pain to get it into a database to perform such analysis. XBRL is supposed to be the answer to that.

NASDAQ supports the standard and feeds are currently available for free for a number of companies (AIG, AMGN, BAC, C, CSCO, DELL, GE, HD, HPQ, IBM, INTC, JNJ, JPM, KO, MO, MRK, MSFT, PEP, PFE, PG, SBC, TWX, VZ, WFC, WMT, and XOM). Microsoft also provides a tool for XBRL Analysis.

Monday, May 17, 2004

Free copy of VB.NET

Visual Basic: Visual Basic At The Movies has started a new promotion:

Let us know what you think! View and rate five movies, then sign up to receive your redemption code email entitling you to a complimentary Not-For-Resale copy of Visual Basic .NET 2003 Standard Edition

The movies have received some good reviews in the .NET community and are essentially basic tutorials with a movie-theme twist (one is an Indiana Jones-type thing, for example). The VB.NET you receive should allow you to author Reporting Services reports on SQL Server.

Real-Time Data Warehousing: GIS

Mark Rittman links to DM Review - Real-Time Data Warehousing: GIS by Simon Terr, who learns a great lesson:

Another interesting discovery was that upon closer inspection data warehouses have geographic information. In fact, virtually every dimension and fact table has some geographic component to it from customer addresses in the customer dimension to transactions and their locations in the fact table.

I have always thought that a geographic dimension based on zip code is one of the great untapped stores of knowledge in any data warehouse. A dimension based on geographic and demographic data (much of it available for free) is extremely powerful in the hands of a marketer, and is easy to add. Think that mapping systems are only for research facilities, the military, and huge corporations with millions to burn? Think again. Microsoft's MapPoint (a little over $200 delivered) has a free OLAP plug-in that makes maps like this a cinch:



This was built with MapPoint 2002 off an otherwise very normal sample star schema. Latitude and longitude data associated with each zip code (in the census files linked above) allow you to do cool proximity and closest-location searches. Add in a market survey/study for your industry (probably the most expensive thing mentioned here) and all of a sudden any data warehouse is exponentially more powerful.

I've mentioned this before, but until way more people catch on, I'll mention it again, I'm sure.
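For the curious, a closest-location search needs nothing more than the great-circle (haversine) distance between two latitude/longitude pairs. The sketch below is one way to do it; the store names and coordinates are made-up placeholders standing in for zip code centroids from the census data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def closest_location(point, locations):
    """Return the (name, lat, lon) entry nearest to point, a (lat, lon) pair."""
    return min(
        locations,
        key=lambda loc: haversine_miles(point[0], point[1], loc[1], loc[2]),
    )


# Hypothetical store locations and customer position (placeholder coordinates).
stores = [("Store A", 33.52, -86.80), ("Store B", 32.38, -86.30)]
customer = (33.40, -86.90)

name, lat, lon = closest_location(customer, stores)
print(name, round(haversine_miles(customer[0], customer[1], lat, lon), 1), "miles")
```

In a warehouse the same formula can be precomputed into the geographic dimension, for example distance from each zip code to its nearest branch, so that reports never have to do the trigonometry at query time.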

Friday, May 14, 2004

Some MySQL Performance tuning links

mySQL Performance Tuning is a good overview of which settings to tune for better performance. A complete list of tuning parameters is here.

MyTop is a cool tool authored by Yahoo! Finance's Jeremy Zawodny that shows current activity on a MySQL server in a familiar "top" format, perfect for an ssh session.

Zawodny has explained Yahoo's use of MySQL and Perl or PHP many times, including this time.

This presentation outlines the growing pains of LiveJournal.com, and how the architecture had to continually be adjusted to account for growth. Very interesting and easy to follow, chock full of good lessons applicable to any rapidly growing website.

Thursday, May 13, 2004

Oracle To Update App Server With Business Activity Monitoring

Looks like Oracle wants to enter the BAM space:

CRN : Daily Archives : Oracle To Update App Server With Business Activity Monitoring : 2:53 PM EST Thurs., May 13, 2004: "The BAM data feeds into a central management console, where it can be correlated, filtered and subjected to analytics, according to Oracle. The results can be viewed on a portal dashboard. "

According to the article, the BAM capabilities include support for BPEL, which is coming out the winner in the standards wars for the "Process Execution" slice of the alphabet soup of the SOA. Oracle has not settled on a standard itself, however, announcing it will support both BPEL (initially proposed by MS and IBM) and WS-Choreography. Thank goodness for both of them, for as the article says, "Oracle's move brings clarity to the respective focus of each group."

NXTract - Oracle to any data extract Utility.

NXTract is a useful application for your recovery/forensic toolbox:

NXTract is a conversion and data rescue utility. It reads Oracle EXP dump files and converts the internal data to a standard tab delimited format. This allows you to read Oracle tables with any program, database, or utility that accepts tab delimited data such as Sybase, SQL Server, DB2, Excel, Access, Ingres, MySQL, Informix, Lotus 123, dBASE, Visual Basic, Foxpro Powerbuilder, and other versions of Oracle, etc.

Microsoft 5 year Server Roadmap

Ars Technica has a summary of the Microsoft server roadmap for the next 5 years.

2006 Longhorn desktop release
2007 Longhorn Server
2008 Longhorn Server Service Pack
2009 Update to Longhorn Server
2010 Service Pack for Update to Longhorn Server
2011 The Next Major Server Release

So You Want to Become an Oracle DBA? Part 2 - Learning Oracle on your own

So You Want to Become an Oracle DBA? Part 2 - Learning Oracle on your own is a follow-up to Part 1. It addresses some touchy subjects, including how there are hardly any "gateway" jobs to becoming an Oracle DBA.

Wednesday, May 12, 2004

BI features in Yukon

Just ran across this PowerPoint presentation by Vidar Burud of BI features in Yukon. Some good slides on the "Unified Dimension Model", which includes a lot of automation to provide some real-time capabilities to MOLAP cubes (according to the slides at least). Lots of nice screenshots of the BI workbench also.

It looks from this like a whole lot of thought has gone into this part of Yukon to simplify and automate creation of dimensions and cubes and I can't wait to see it in action myself.

Vidar also authored a nice Reporting Services Overview.

Oracle Background Processes

Oracle Background Processes: "Given below is a list of twenty background processes along with their technical or internally referred names." Nice overview to bookmark.

TIBCO, Gartner and Teradata Host Executive Webinar

TIBCO, Gartner and Teradata Host Executive Webinar on developing an effective BAM strategy on May 20.

Hacking Excel to make a speedometer

O'Reilly's WindowsDevCenter.com has a sample from Hacking Excel that describes how to make a working speedometer using a combination of donut and pie charts. The needle can even move. Dashboard hackers, knock yourselves out... Combined with my previous post with .NET pie chart and bar chart sample code, this is a fairly complete little package.

Some RFID commentary

Sadagopan's weblog on Emerging Thoughts, Technologies, Ideas, Trends has a link to a Harvard Business School article on how RFID has much more potential to improve processes than is being looked at right now.

The key success factor is to focus early on analytical gains, and only later on production gains. Unfortunately, all too many companies are losing critical opportunities because they are going about it backwards.

Internetnews predicts the worldwide market for "machine-to-machine" communications, driven by RFID, will grow to $31 billion by 2008.

Tuesday, May 11, 2004

A Guide to High Availability MySQL Clustering

A Guide to High Availability Clustering is a white paper from MySQL AB (free registration, it gets emailed to you). One interesting item in it claims that on average, companies using MySQL experience per-project savings of over $250,000.

To anyone interested in data warehousing, I would highly advise reading this and keeping an eye on MySQL and where it may fit into an environment (calling it a "desktop database" is a bit of a mark of ignorance imo, or a blatant attempt to snow over the less technically knowledgeable decision-makers). MySQL is the only database that can truly operate in a white-box, commodity-hardware, yet clustered environment, and does so regularly in production. Just ask Google or Yahoo. Savings on licenses alone on a reporting server would be well into six figures vs. Oracle, Sybase, DB2, or SQL Server. This is a direction and trend for MySQL that isn't stopping, and ignoring it isn't much different than putting one's head in the sand.

Merrill Lynch's IT Data Warehouse

Yet another good article in the latest issue of DB2 mag today, Unleashing the Power of Data, discusses Merrill Lynch's creation of an "IT data warehouse" built specifically to track IT assets. They use it for server consolidation planning, anti-virus planning, discovering underused software, vendor negotiations, disaster recovery planning, and much more. Sounds like a great idea. This part sounds pretty cool:

Data visualization (heatmaps). Data visualization techniques provide higher-level analysis. For example, heatmaps, popularized by SmartMoney.com's Map of the Market (www.smartmoney.com/maps), highlight Merrill Lynch's virus readiness by site, organization, and number of servers for upper management. The heatmap interface displays multiple dimensions in a two-dimensional format by using shades of colors - red (bad) to green (good) - and shapes (whose size indicates the number of servers in a cell) to highlight critical information. Although this interface is traditionally used to display financial information (changes in stock prices), we adapted it to technology data and found it an excellent vehicle for focusing attention on the critical information clients need to make decisions.

I wonder what these heatmaps look like in this context.
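The color half of that is easy enough to guess at. Here is a minimal sketch of mapping a readiness score onto a red-to-green shade; the 0.0 to 1.0 score scale and the linear blend are my own assumptions, not anything described in the article.

```python
def readiness_color(score):
    """Map a 0.0 (bad) to 1.0 (good) score onto a red-to-green hex color."""
    score = max(0.0, min(1.0, score))  # clamp out-of-range input
    red = round(255 * (1 - score))
    green = round(255 * score)
    return f"#{red:02x}{green:02x}00"


# Hypothetical sites with an assumed virus-readiness score and server count.
sites = [("Site 1", 0.0, 40), ("Site 2", 1.0, 12)]
for name, score, servers in sites:
    print(name, readiness_color(score), "cell size:", servers)
```

The server count would drive the size of each cell, so a big red block is a large site with poor readiness.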

'Stinger' DB2 includes .NET CLR

IBM is hyping up its next release of DB2 UDB, codenamed 'Stinger', and posted an open beta of it just last week. Among other features, it includes full support for .NET CLR.

Support for CLR routines is presently conditioned by a number of factors. CLR SPs and UDFs can only be deployed on servers with operating systems on which the CLR can run. Today, support is limited to 32-bit Microsoft Windows operating systems, although support for .Net on 64-bit Windows is expected. Another requirement: .Net version 1.1 must be deployed on the server.


The CLR will be available in all versions, including for Linux, Unix, and Windows. To quote the article: "DB2's CLR support actually gets a jump on Microsoft's flagship database product, SQL Server, which won't introduce CLR support until the upcoming release of its latest version, now expected to be sometime in the first half of 2005."

Monday, May 10, 2004

Microsoft Executive Circle Webcast: Enterprise Dimension Management: Effectively Manage Your Multi-Dimensional Applications

This may be an interesting webcast.

Date: Thursday, May 20, 2004
Time: 10:30 AM-12:00 PM Pacific

Description: After years of building analytical applications, many companies are discovering that they face a dimension management crisis. Maintaining and synchronizing business dimensions across financial, sales and marketing, and operational applications consumes increasingly more resources as each discrete application is deployed. Stratature's technology-neutral dimension management solution, +EDM, ensures that all systems across your enterprise share identical versions of business dimensions and hierarchies (stores, products, customers, accounts, etc.). It also effectively enables integration across a wide spectrum of analytic applications and technologies, including Microsoft® Analysis Services. Attend this webcast for a firsthand look at the way +EDM delivers reliable dimensional accuracy and eliminates redundant maintenance and synchronization effort.

I met Ian Ahern, the CEO of Stratature, once, and he is a very knowledgeable guy dedicated to running a long-term company. I'm sure they will have some good advice during this webcast.

First RFID database debuts

First RFID database debuts is a press release from RFideaworls, which, it sounds like to me, has created an RFID data store as a middleware layer built on Oracle. There's hardly any technical info on their website. Their RFIdb product:

RFIdb provides rapid RFID-enablement and integrated information for Oracle, regardless of RFID standards or requirements.

Not much more meaty information.

Wednesday, May 05, 2004

Article On Top Ten New 10g Data Warehousing Features

Mark Rittman has written about the Top Ten New Oracle 10g Data Warehousing Features.

Monday, May 03, 2004

SQL Server 2000 High Availability Series: Remote Mirroring and Stretch Clustering

SQL Server 2000 High Availability Series: Remote Mirroring and Stretch Clustering is a hot topic that popped up more than once in conversations I had last week. It involves having a real-time SQL cluster stretched out over long geographic distances. It's fairly complicated, and requires third-party support and identical SANs capable of synchronous mirroring. For large companies, this is an option they will want to explore for their Disaster Recovery plans as SQL Server keeps moving up the chain of critical systems.

Microsoft Bolsters Reporting Services with Buy

Microsoft Bolsters Reporting Services with Buy: "Microsoft (Quote, Chart) Monday moved to build out its business intelligence suite by acquiring privately-held ActiveViews for an undisclosed sum.
The Redmond, Wash. company has been seeking for ways to improve its SQL Server 2000 Reporting Services software platform, a key piece of the company's SQL Server used by corporate employees to gauge the operations of their businesses."

ActiveViews has 5 employees, and their product looks like it would integrate well with Analysis Services. On the tour I do not see any graphics capabilities, however. Shouldn't there be? After using Databeacon I can't imagine using a front end that doesn't allow you to switch to a pie chart or line chart view with a single click.

Generate SQL Insert statements for your SQLServer 2000 Database

This is pretty convenient... Sanjay's Coding Tips :: Generate SQL Insert statements for your SQLServer 2000 Database links to a free utility to completely script out a SQL Server database, including all the INSERT statements necessary to populate it. I cut my teeth on MySQL years ago, and the mysqldump utility, which we used for backups, did exactly this. It was very useful to take that dump file and move it anywhere and restore the database, including all tables, indexes, and data. With a little Perl to filter out some of the object compatibility issues, the same file could be read into Oracle or SQL Server.

I always wondered why Oracle and SQL Server never had this option built in.
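The core of such a utility is small. Here is a minimal sketch of the idea, using Python's built-in sqlite3 module as a stand-in for SQL Server; the customer table and its rows are hypothetical, and the literal formatting only handles NULLs, numbers, and strings. It is not the linked utility, just an illustration of what it produces.

```python
import sqlite3


def sql_literal(value):
    """Format a Python value as a SQL literal (NULL, number, or quoted string)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"  # double up single quotes


def dump_inserts(conn, table):
    """Yield one INSERT statement per row of the given table."""
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    columns = ", ".join(col[0] for col in cursor.description)
    for row in cursor:
        values = ", ".join(sql_literal(v) for v in row)
        yield f"INSERT INTO {table} ({columns}) VALUES ({values});"


# Hypothetical table to script out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "O'Brien"), (2, None)])

for statement in dump_inserts(conn, "customer"):
    print(statement)
```

Because the output is plain INSERT statements, the same file can be replayed against another engine once any vendor-specific types in the CREATE TABLE scripts are adjusted.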

Sunday, May 02, 2004

BizTalk Server may be Microsoft's answer to CPM

InfoWorld: BizTalk Server brings everybody into the process is an extremely positive review of BizTalk Server 2004. BizTalk has always been an integration platform, sort of a hub to connect to disparate sources and introduce transactions and queuing capabilities between those sources. According to this review, much has been built onto that core, and much of it feels very right, introducing user-friendly workflow GUIs and easy integration with SharePoint.

Like many of the newest MS products, development requires VS.NET. Tight integration with InfoPath is also mentioned, which requires Office 2003. Even so, BizTalk can fill a big hole relatively inexpensively for a lot of companies (price maxes out at $25k/CPU, considerably cheaper than other products). This is worth a seminar if there's one near you.

Saturday, May 01, 2004

BPM Case Study for Cooper Tire

BPMInstitute.org has a number of presentations in Real Media and PDF formats from a Brainstorm BPM conference in November. It gives a pretty good snapshot of how companies are using BPM today. (Free registration may be required)

For an overview of BPM, check out this from Doculabs, or "Moving Beyond the Buzz - Understanding and Effectively Utilizing Business Process Management" (more free reg).