When the Going Gets Tough, the Tough Think Globally

Article published in DM Review Magazine, September 2003 Issue
By Michael Keilen

As the world economy begins its climb out of recession, IT executives are beginning to look past budget cuts to what the future may hold for their organizations. In addition, organizations are focusing on deriving more value from the huge investments they've already made in customer relationship management (CRM), supply chain management (SCM), business intelligence and data warehousing solutions to further help the company's bottom line.

This increased drive for business growth and profitability has created a shift in the way organizations think about doing business on a global basis and how their IT investments can best support those efforts. No longer is the term "multinational" reserved for the largest of the large corporations. As more organizations extend their reach around the globe, however, they quickly realize that they have an increased need to integrate customer, supplier and operational data across their corporate computing systems -- regardless of what language that data exists in.

This new challenge of global data integration (GDI) is caused by a number of business and economic forces, including:

  • A desire for greater customer intimacy. Companies that conduct business on a global scale are requiring that their enterprise computing applications support their global strategies as well. Organizations want to better understand the multitude of relationships their subsidiaries around the globe have established with their customers' various operating units. The ultimate goal: achieving a single view of each customer and enabling companies to fully leverage relationships through up-selling and customer retention strategies.
  • Expansion into recently developed and developing markets. The rising spending power of consumers in developing countries, combined with the growth in global e-commerce, has led companies to expand into new markets. This results in globally diverse organizations that benefit from significantly improved return on investment by spreading product and service R&D costs over a greater number of customers.
  • Cost-saving measures. With an eye toward reducing the costs of storing and processing data, more global companies are consolidating their regional data processing centers in an effort to streamline their IT investments and better manage mission-critical data sets.

Yet the task of GDI raises significant IT issues, including ensuring a reasonable level of information quality and managing information globally for decision support, CRM, business intelligence, data warehousing, enterprise resource planning (ERP) and other activities that are built from data gathered and utilized locally.

That's especially true given the complexity of accurately interpreting data stored in multiple languages and writing systems across multiple platforms and systems. Fortunately, a standard called Unicode has been created. Unicode addresses the issue of integrating global data sets to accurately interpret data across platforms and systems. With Unicode as a foundation, companies can now develop a GDI strategy to address the many issues inherent in converging data from a multitude of sources where the data is represented by different language and writing system combinations -- one that takes into account the various nuances that need to be proactively considered when data sets are converged.

Utilizing Unicode

Unicode defines a universally accepted standard for encoding the characters of virtually every writing system. Originally, "character sets" or "encodings" were created to limit the number of characters a computer system had to consider. Some modern-day examples of these encodings include ASCII (the American Standard Code for Information Interchange), Shift-JIS (Shift Japanese Industrial Standards, a widely used encoding for Japanese text) and VISCII (the Vietnamese Standard Code for Information Interchange).

Early on, it would have been impossible for computers to handle the tens of thousands of characters required to represent the majority of the world's writing systems; therefore, character encodings were developed for a specific purpose, often divided along dimensions such as spoken language, geographic region of use and computer architecture.

Enter Unicode, a universal index of characters. It provides a unique number for every character, regardless of computer platform, software program or language. In essence, Unicode makes it possible for organizations to combine and share global data. Unicode allows data to be transported through many different computer systems without corruption. Additionally, it allows a single software product or Web site to be used across multiple platforms, languages and countries. By incorporating Unicode into client/server or multitiered applications, Web sites and other areas of an IT infrastructure, companies can obtain significant cost savings and combine global data sets to better market and sell products and services.
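The idea of "a unique number for every character" can be seen directly in any Unicode-aware language. The following is a minimal Python sketch, not part of the original article; the sample characters and the helper name code_point are illustrative assumptions.

```python
# Illustrative sketch: every character has exactly one Unicode code point,
# regardless of which writing system it comes from.
# The sample characters and the helper name are assumptions for this example.
samples = ["A", "Я", "日", "ก"]  # Latin, Cyrillic, Han (Kanji), Thai


def code_point(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"


labels = {ch: code_point(ch) for ch in samples}

for ch, label in labels.items():
    print(ch, label)
```

The same number identifies the character on any platform; how that number is stored as bytes is a separate question that depends on the encoding chosen.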

Unicode is best viewed as an enabler. While it doesn't in itself perform the functions of integrating global data sets into an organization's corporate computing environments, it makes it easier to perform these functions. For example, Unicode does not translate language, nor does it localize interfaces or automatically convert cultural or country-specific values such as date formats or currency -- but it can make these tasks easier.

Unicode is sometimes confused with "double-byte," a term that refers to double-byte character sets used for encoding some Asian writing systems such as Kanji. These writing systems required character sets that needed two bytes of information to build a comprehensive index. In fact, "double-byte" has increasingly become more of a generic term for these types of encoding systems as well as for Unicode, which is actually a multibyte character indexing system.
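To make the distinction concrete, the sketch below compares how many bytes a character occupies in a legacy double-byte encoding (Shift-JIS) and in two common Unicode encoding forms (UTF-8 and UTF-16). It is an illustration only; the function name byte_widths and the choice of encodings are assumptions, and it relies on the codecs that ship with standard Python.

```python
# Illustrative sketch: the number of bytes per character depends on the
# encoding, which is why "double-byte" is not an accurate label for Unicode.
# byte_widths is a hypothetical helper name chosen for this example.
ENCODINGS = ("shift_jis", "utf-8", "utf-16-le")


def byte_widths(ch: str) -> dict:
    """Map each encoding name to the byte length of ch, or None if the
    encoding cannot represent the character at all."""
    widths = {}
    for enc in ENCODINGS:
        try:
            widths[enc] = len(ch.encode(enc))
        except UnicodeEncodeError:
            widths[enc] = None
    return widths


for ch in ("A", "日", "\U0001F600"):
    print(repr(ch), byte_widths(ch))
```

A Latin letter, a Kanji character and a character outside the original 16-bit range each come out with different widths, and the legacy encoding cannot represent the last one at all.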

Data Quality Considerations in GDI

Because Unicode support does not imply correct interpretation of a specific writing system, it shouldn't be viewed as a panacea for all the data quality and interpretation issues companies face. Specialized software that recognizes and provides for the conventions of different languages and writing systems used by countries and cultures must be created. These conventions can be very complex and require considerable expertise to ensure that desired results are achieved.

In order to build and maintain the relationships between the data stored across an entire organization, a strategy to ensure the quality of the data needs to be as carefully considered as how the data will be exchanged and stored in the systems. Some things that strategy must address include:

  • What processes and procedures does the organization have in place to minimize data errors stemming from "bad hand-offs"? Some common examples include recording a mispronounced word, misspellings, typos or keying errors and incorrect data mappings or transformations as data moves through systems.
  • Is the organization prepared with contingencies for "context"? Customer-centric data errors and their correction are context specific. This means that what is considered an error and how to correct it is often determined by the environment in or purpose for which it was created.
  • How will a particular writing system and cultural context affect the entry of data into an organization's information systems? For example, a Japanese consumer placing an order into a U.S.-centric Web site would be entering that information in a "Romanized-Japanese" writing system -- i.e., Japanese information recorded in a Latin writing system. At the same time, an order placed by a Japanese consumer through a Japan-based call center would have that information entered in the cultural and language context of the Japanese version of the Kanji/Kana writing systems.
  • How can an organization ensure accurate interpretations between data represented by a phonetic writing system versus an ideographic (picture- or image-based) system? Many words or commonly used phrases found in phonetic-based writing systems such as Latin don't have an exact match to characters in an ideographic writing system. Thus, the context in which a word or ideographic symbol exists must be clearly understood and accounted for to ensure accurate interpretation.
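As one small illustration of capturing writing-system context at the point of entry, the sketch below tags a name field with the scripts it contains, so that a Romanized-Japanese entry and a Kanji/Kana entry can be routed to different interpretation rules. This is a rough heuristic under stated assumptions, not a method described in the article: the function name scripts_used is hypothetical, it uses Python's standard unicodedata module, and it labels all CJK ideographs as "Kanji" even though the same characters are shared with Chinese and Korean text.

```python
import unicodedata


def scripts_used(value: str) -> set:
    """Return a coarse set of script labels found in a data field.

    Heuristic only: it classifies characters by the prefix of their
    official Unicode character name and ignores whitespace.
    """
    found = set()
    for ch in value:
        if ch.isspace():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            found.add("Kanji")  # assumption: Han characters in a Japanese context
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            found.add("Kana")
        elif name.startswith("LATIN"):
            found.add("Latin")
        else:
            found.add("Other")
    return found


for field in ("Yamada Taro", "山田 太郎", "山田たろう"):
    print(field, sorted(scripts_used(field)))
```

Knowing which scripts a value contains does not interpret the value, but it preserves a piece of context that is hard to recover once the data has moved deeper into back-office systems.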

It is important to note that these examples only cover a fraction of the issues and decisions that an organization must consider before finalizing a GDI strategy and its required quality components. Given these types of issues, it's easy to see that there is no one solution -- no single piece of technology, no single language/writing system combination -- that will fit every organizational need. Each GDI strategy will be unique and a changing entity as an organization grows.

While considering Unicode as a standard backbone of an organization's global computing systems, it is critical to address the very context-specific nature of data quality. A solid data quality strategy ensures that technologies and processes are deployed at the points where the context of information is still clear -- from the front office where data elements are collected to back-office systems where those elements drive business decisions. In the case of global data, we must make certain that cultural and regional writing system nuances can be accounted for and understood. Otherwise, as data moves deeper through an organization's systems, increasingly less context exists to determine its original meaning.

Setting a Strategy for GDI

Given the hundreds of languages and writing systems in existence, organizations need to evaluate what country/language/writing system combinations they need to support now and in the future to achieve their critical business goals. Then, a comprehensive GDI strategy can be developed allowing for the addition of plug-and-play encoding modules that accurately interpret data from the standpoint of cultural and regional writing system nuances.

At the same time, it is important to keep in mind that many regions of the world may require an organization to support multiple language/writing system combinations. For example, complete "practical support" of Japan requires support of the Latin writing system for data collected in Western Europe or the Americas, as well as data collected within Japan that could be Kanji, Kana or Latin, or a combination of these systems.

A person new to GDI may jump to the conclusion that every system has to be converted to run solely on Unicode data, in effect creating a Y2K scenario. That is not the case. Characters from other character encodings can be mapped to Unicode and back to the legacy character sets as needed. In fact, the 128 ASCII characters occupy the first 128 Unicode code points in the same order, which makes it extremely easy to move ASCII characters in and out of a Unicode character set. While that's not true of all character sets, Unicode conversion libraries come standard with mapping functions.
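Both points can be checked in a few lines. The sketch below, which is an illustration rather than anything from the original article, uses Python's built-in codecs to show that ASCII values carry over to Unicode unchanged and that a legacy encoding such as Shift-JIS can be mapped to Unicode and back without loss. The sample strings are assumptions.

```python
# Illustrative sketch using Python's built-in codecs; sample data is assumed.

# 1. ASCII is a subset of Unicode: byte values equal code points, and the
#    UTF-8 encoding of ASCII text is byte-for-byte identical to the original.
ascii_bytes = b"Order 42"
text = ascii_bytes.decode("ascii")
ascii_matches_code_points = [ord(c) for c in text] == list(ascii_bytes)
ascii_unchanged_in_utf8 = text.encode("utf-8") == ascii_bytes

# 2. A legacy encoding maps to Unicode and back through the codec's
#    mapping tables, even though its byte values differ from the code points.
legacy_bytes = "東京".encode("shift_jis")       # stand-in for data from a legacy system
as_unicode = legacy_bytes.decode("shift_jis")   # legacy -> Unicode
round_trip = as_unicode.encode("shift_jis")     # Unicode -> legacy

print(ascii_matches_code_points, ascii_unchanged_in_utf8)
print(round_trip == legacy_bytes)
```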

The application of these functions allows organizations to take a stepwise approach, converting some systems and leaving local legacy systems untouched. The key is to understand where and when to map to a different character encoding. Using conversion mappings and Unicode as a foundation allows an organization to create an informational infrastructure where it can think globally and act locally or, more specifically, integrate data globally and process it locally.
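A minimal sketch of "integrate globally, process locally" might look like the following. Everything named here is hypothetical: the system names, the SOURCE_ENCODINGS registry and the integrate and export helpers are assumptions made for illustration, and a real deployment would need to decide how to handle characters a legacy system cannot represent.

```python
# Hypothetical registry: which encoding each local system uses.
SOURCE_ENCODINGS = {
    "us_billing": "ascii",
    "paris_crm": "latin-1",
    "tokyo_orders": "shift_jis",
}


def integrate(system: str, raw: bytes) -> str:
    """Map a local system's bytes to Unicode at the integration boundary."""
    return raw.decode(SOURCE_ENCODINGS[system])


def export(system: str, text: str) -> bytes:
    """Map Unicode text back to a local system's encoding.

    Raises UnicodeEncodeError if the local encoding cannot represent the
    text, which is the point where a policy decision is needed.
    """
    return text.encode(SOURCE_ENCODINGS[system])


# Assumed sample records, as each legacy system would store them.
records = [
    ("us_billing", b"Smith"),
    ("paris_crm", "Lef\u00e8vre".encode("latin-1")),
    ("tokyo_orders", "山田".encode("shift_jis")),
]

unified = [integrate(system, raw) for system, raw in records]
print(unified)
```

The legacy systems keep their own encodings; only the boundary code knows about them. Exporting a Japanese name to the ASCII-only system fails loudly rather than silently corrupting the data, which makes the mapping decision explicit.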

Quality Relationships Require a Continuous Process

It is understandable that organizations have difficulty appreciating all the aspects that go into building a global data integration strategy. Because the complexities of understanding Unicode itself can overshadow the need to determine which combinations of languages and writing systems affect an organization's corporate computing environments, building a strategy needs to be a continuous process.

Ultimately, the goal is simple: An organization wants to build and maintain relationships with its customers. The data it collects in every aspect of its business can play an integral role in the success of that organization. Organizations that understand the critical link between how data is integrated across the globe and how its quality is ensured will, in the end, be the most successful.



Michael Keilen is the director of solutions management at Firstlogic, Inc. He has more than 15 years of experience in software development and design, management and marketing. For specific information on information quality issues related to global data integration and Unicode, please contact him directly at (608) 782-5000, ext. 2244 or mikek@firstlogic.com.
