Unstructured Realities: A Conversation
It was shortly after I missed his sessions at a recent conference that I met and got to know Alan Pelz-Sharpe of CMS Watch, a nine-person analyst firm neutrally covering the unstructured data space. Alan had been presenting at TDWI's show in Las Vegas along with his partner and principal at the same firm, Tony Byrne. They were on a road show that would include AIIM and DAMA as well. Since unstructured content is a reality or on the horizon for many BI professionals used to the structured data world, I thought it would be a good time to ping Alan on where things are meeting up, and what in the world he was doing at a conference mostly dedicated to data warehousing.
Jim Ericson: So Alan, what in the world were you doing at a conference mostly dedicated to data warehousing and how did it go?
Alan Pelz-Sharpe: It was very interesting. We didn't get a huge turnout at our sessions, but I don't think anyone expected that. What we did get was an incredibly engaged audience, and Tony and I came back saying, "It's not about the crossover of the structured and unstructured world, it's about the reality of the people who are working in these sectors and how those worlds touch." What I'm trying to say in a roundabout way is, I think what we've been focused on in the past was the technology overlap, and maybe there's less of that than we thought. But in the day-to-day workings of a database administrator or IT manager, they don't make those distinctions between structured and unstructured. They get asked to look at this project and that project and somehow pull them together. At a practical business level I think there's an awful lot to talk about.
JE: We reporters and analysts tend to get ahead of the reality curve. BI Review held a conference last year with a session on unstructured data and I had a tough time even lining up speakers. But there was one fellow from a law firm who really captivated an audience that had just assumed some of this stuff couldn't be done.
APS: Yes, I think in the past analysts have gotten a bit too excited and run off on the path saying, "It's the death of BI, search is going to take over," or "It's the death of search, BI is going to take over." That's not reality, but there is reality in what businesses are actually trying to do out there.
JE: So what about unstructured and structured being different animals?
APS: It isn't that there isn't any overlap. Some search products actually do touch on the world of BI and I'm sure that will happen more and more. The end user experience will probably become more merged and transparent. To them it will just be a search or query if you want to put it that way. But at the back end they are very different animals. Where things are changing a little bit is that search engines, I think in fairness, have gone through a bit of a revolution. Early search engines really were just [about] key words, and if it hits on a word, there's your result. They've morphed into really complex analytic engines that can be tuned constantly. That's different; the early search engines either got your result or didn't. That [new] element has come really from the business intelligence world and there are crossover areas like Web analytics. That's a new-ish topic at CMS Watch, we've only been covering that for about a year now, but I found it fascinating that the audience was so engaged on the topic in Vegas. I think what was so fascinating was Tony constantly stopping and trying to explain himself. Just when people in the audience thought they got it, they suddenly didn't. There's a lot of commonality but again...
JE: I'm really interested in Web analytics given my own media challenge and also the general corporate push online. Web analytics and BI analytics really are different animals in terms of scale, scope, parameters, etc.
APS: Yeah, exactly. Different parameters, and our conference session veered a bit into a very constructive corner where we were really explaining how a Web page is actually structured and the implications of delivering JavaScript and frankly how poor the data that's coming back often is. Just some basics like page views versus the actual number of people visiting your site. I didn't want to be rude to the audience but clearly none of them had touched on this before.
JE: I wrote a pretty simple case study on Fox Sports using Web analytics and got more mail on that than on any story recently. A fellow at Yahoo wrote me to say, "Cool story, yeah, that's what we're trying to do here." That was surprising coming from a company like that.
APS: But that's the truth of it, even for a business like Yahoo. You can replicate that experience anywhere. If you're in telco you know data warehousing inside out. You've been tracking usage and call data for years and you're into the fine details of mathematics really. When you get to the Web it's the Wild West.
JE: Just going back to the general unstructured data, the non-Web stuff, it's a frontier for us at DM Review as well. It's something we're trying to address constructively. The data warehousing industry has been stop and go in this regard. I talked to Bill Inmon a couple of years ago, who was trying to stuff unstructured content into the data warehouse and accommodate it in the old model, and I just don't think it's the same animal, as you're saying.
APS: It's not the same, and that was the focus of our two workshops on Web analytics and on simple content technologies. I was explaining simple document management last week, really if you think of your old-fashioned filing clerk and the cabinets and drawers of files...
JE: I still have a stapler on my desk that is a crucial tool in my filing system. It's not going anywhere until I'm sure I have something better.
APS: I am paper-centric myself. That's the reality of the software: it mimics paper filing, which is actually a little more difficult than people give it credit for. When you get into engineering or health care, your number schemas and filing schemas get quite specialized. It's not a database.
JE: Right, there are also digital assets, warranty, rights and contract assets, all sorts of stuff. Before I came to DM Review I covered the waterfront, including unstructured content, and was watching vendors like Interwoven, Vignette, FileNet or Documentum where the focus was on workflow. Has that changed?
APS: It's changed but remains the same as well. That landscape still exists and it will never go away. The day lawyers stop dealing with complex cases that deal with hundreds of emails and documents will never arrive. There's no reason for that to go away because it works pretty well. The other traditional landscape is the imaging market. High speed scanning of checks or insurance or medical claims forms is staggeringly boring but hugely profitable. It's also quite difficult to do, so that landscape still exists but is a bit harder to do these days in the sense that when you scan something these days you don't just create a TIFF image, you actually have software intelligent enough to break down and read the page and extract data for various bits of workflows, so it's sort of moved on a bit. But the imaging and the document management worlds are still there.
JE: The content management space back then aligned itself with the early enterprise portal companies to pursue collaboration and knowledge management as well as workflow.
APS: That's coming back, but a little differently now. I hate to use the word collaboration, but look at Microsoft SharePoint for example. I think 80 million seats of SharePoint have been shipped; it's hugely successful. It's just basic sharing of office documents and has become a huge market in its own right.
JE: You just carve out a workspace, gather a few folks and have at it, maybe with some analytics to boot.
APS: Right, you build a mini-portal and you and your team work there. It's easy to use, whereas less than 10 years ago that was a hugely expensive thing to do.
JE: Email has to be getting huge in the unstructured space.
APS: Yes, that's the other world that's coming up, and this actually touches on the structured data world more than people realize in email archiving and email management. That is just about to explode as a market and there really isn't a large company in the world that isn't looking at that. If you think about what an email archive is, well, a small one might be 40 terabytes, a large one might be three or four petabytes, just one huge database at the end of the day, and it's sort of falling into the enterprise content management [ECM] technology world but it probably shouldn't. It's really more of a back end data center task. It's sort of hovering between the two.
JE: And people seem more concerned about compliance than being constructive with email.
APS: That's right, so it splits into those two things. There's email management, which is the compliance thing, and hence ECM vendors are all trying to get involved. But then you've got the simple archiving: I can't operate with 20 terabytes of old emails on my Exchange server. It doesn't work but I can't get rid of it, so I'll back it up, but that doesn't work, I've got the compliance thing, so I'm actually going to have to archive it and search it in an intelligent way with e-discovery tools at a later date. That's an emerging market but it's emerging at a heck of a pace.
(end of interview)
I was just about ready to jump into Web versus desktop applications when I realized I'd gone too far to stay fair to our readers. I'm going to carve out some space at DMReview.com to move down this road, and if you've read this column to the end, you're someone who might be interested. If you are, drop me a line at jim.ericson@sourcemedia.com.
Jim Ericson is editorial director of DM Review, a SourceMedia publication. You can reach him at Jim.Ericson@sourcemedia.com.