Microsoft BI Conference 2008 October 6-8 – Seattle, WA

Registration is now open for the 2008 BI Conference here in Seattle. I was able to attend last year and to say it was worth it would be an understatement. Definitely attend if the opportunity presents itself.
 
With SQL PASS also taking place in Seattle, it’s quite a busy few months in the SQL Server world here.
 
Register by August 8th and you’ll get the $995 price; after the 8th it goes up to $1,295.
 
 
If anyone needs tips on where to stay in the Seattle area (I live a seven minute walk from the convention center), let me know.
 
 
Posted in Uncategorized | Leave a comment

AMO – Delete All Partitions From a Database

From time to time I need to recreate the dev/beta environment, and I have to do that from production. I want an exact copy of production minus all of the partitions that have been created, with the exception of the template partitions.

 

At first I would go in and change the XMLA script manually to remove those partitions. However, with a lot of measure groups (50-100), this could take upwards of an hour. Because it was such a pain, I probably didn’t keep dev and beta as up to date as I should have.

 

An Analysis Services stored procedure is perfect for this task. It works beautifully, and what used to take me an hour now takes 30 seconds.

 

I decided to use arrays of object names within the procedure even though you could write the proc without them; dropping partitions directly inside the For Each loops would mean modifying a collection while enumerating it.

 

 

 

*objDatabase is a database object I’ve set in another function
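For context, here’s a minimal sketch of how objDatabase might get set, assuming AMO with placeholder server/database names (the Sub name is hypothetical as well):

  Imports Microsoft.AnalysisServices

  'Minimal sketch: connect to the SSAS instance and grab the database object.
  '"MYSSASSERVER" and "SalesCubeDatabase" are placeholders.
  Private objServer As Server
  Private objDatabase As Database

  Public Sub ConnectToDatabase()
      objServer = New Server()
      objServer.Connect("Data Source=MYSSASSERVER")
      objDatabase = objServer.Databases.FindByName("SalesCubeDatabase")
  End Sub

If you compile this into an Analysis Services assembly, you can kick off the cleanup from an MDX window with something like CALL MyAssembly.DeleteAllPartitions() (assembly name hypothetical).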

 

 

  Public Sub DeleteAllPartitions()

      Dim oCube As Cube
      Dim oMeasureGroup As MeasureGroup
      Dim oPartition As Partition
      Dim i As Integer
      Dim j As Integer
      Dim k As Integer
      Dim AryCubes As String()
      Dim AryCubesSize As Integer
      Dim AryMeasureGroups As String()
      Dim AryMeasureGroupsSize As Integer
      Dim AryPartitions As String()
      Dim AryPartitionsSize As Integer

      'Create the cube array (AryCubes)
      AryCubesSize = (objDatabase.Cubes.Count - 1)
      ReDim AryCubes(AryCubesSize)

      'Loop through each of the cubes in the database and throw the cube names in an array
      i = 0
      For Each oCube In objDatabase.Cubes
          AryCubes(i) = oCube.Name
          i = i + 1
      Next oCube

      'Loop through the cube array
      For i = 0 To AryCubesSize

          'Set the cube object to the current item in AryCubes
          oCube = objDatabase.Cubes.GetByName(AryCubes(i))

          'Create the measure group array (AryMeasureGroups)
          AryMeasureGroupsSize = (oCube.MeasureGroups.Count - 1)
          ReDim AryMeasureGroups(AryMeasureGroupsSize)

          'Set j back to zero for the current cube
          j = 0

          'Loop through each of the measure groups in the cube and throw the names in an array
          For Each oMeasureGroup In oCube.MeasureGroups
              AryMeasureGroups(j) = oMeasureGroup.Name
              j = j + 1
          Next oMeasureGroup

          'Loop through the measure group array
          For j = 0 To AryMeasureGroupsSize

              'Set the measure group object to the current item in AryMeasureGroups
              oMeasureGroup = oCube.MeasureGroups.GetByName(AryMeasureGroups(j))

              'If the measure group is linked then don't delete partitions
              If oMeasureGroup.IsLinked = False Then

                  'Create the partition array (AryPartitions)
                  AryPartitionsSize = (oMeasureGroup.Partitions.Count - 1)
                  ReDim AryPartitions(AryPartitionsSize)

                  'Set k back to zero for the current measure group
                  k = 0

                  'Loop through each of the partitions in the measure group and throw the names in an array
                  For Each oPartition In oMeasureGroup.Partitions
                      AryPartitions(k) = oPartition.Name
                      k = k + 1
                  Next oPartition

                  'Loop through the partition array and drop the partition if it's not the template partition
                  '(adjust the pattern to match your template partition naming convention)
                  For k = 0 To AryPartitionsSize
                      If Not (AryPartitions(k) Like "template*") Then
                          oPartition = oMeasureGroup.Partitions.GetByName(AryPartitions(k))
                          oPartition.Drop()
                      End If
                  Next k

              End If

          Next j

      Next i

  End Sub

Posted in Uncategorized | 1 Comment

Star vs. Snowflake in OLAP Land

About six months ago I had a discussion with another guy about what my preferred data warehouse schema was: snowflake or star. Without hesitation I said snowflake. He looked at me with befuddlement and asked why. I told him that OLAP processes dimensions more efficiently against a snowflaked schema than against a star. We had nearly a twenty-minute discussion about exactly why Analysis Services likes snowflakes better than stars, but I failed to convince him. He firmly believed that the star schema was superior, and anything short of me taking his firstborn hostage wouldn’t change his belief in that. Star vs. snowflake usually initiates that type of steadfastness.
 
To back up my belief I put together a test. I created a dimension with three levels, with each level having two attributes outside of the "Advertiser-Sales Campaign-Banner Ad" hierarchy, for a total of nine attributes in the dimension.
 
1. Advertiser (15k rows)
  • PaymentType (2 rows)
  • Status (2 rows)
2. Sales Campaign (500k rows)
  • Category (16 rows)
  • TargetCountry (225 rows)
3. Banner Ad (12 million rows)
  • Size (8 rows)
  • AdType (4 rows)
An Advertiser has Sales Campaigns and a Sales Campaign has Banner Ads, with Banner Ad being the dimension’s key attribute.
 
-In star schema land this would all be put into a single table nine columns wide.
-In the snowflake world this ends up in nine different tables. There’d be an Advertiser table with three columns (Advertiser/PaymentType/Status), a PaymentType table with a single column PaymentType, a Status table with a single column Status, etc. (generically speaking; see the sketch below).
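Roughly, the two layouts look like this (the DDL is an illustrative sketch, not the actual schema):

  -- Star: one wide dimension table, one row per banner ad (~12 million rows)
  CREATE TABLE dbo.DimBannerAd_Star (
         Advertiser VARCHAR(100), PaymentType VARCHAR(20), Status VARCHAR(20),
         SalesCampaign VARCHAR(100), Category VARCHAR(50), TargetCountry VARCHAR(50),
         BannerAd VARCHAR(100), Size VARCHAR(20), AdType VARCHAR(20))

  -- Snowflake: each attribute normalized into its own table, e.g.
  CREATE TABLE dbo.PaymentType (PaymentType VARCHAR(20))
  CREATE TABLE dbo.AdType (AdType VARCHAR(20))
  CREATE TABLE dbo.Advertiser (
         Advertiser VARCHAR(100), PaymentType VARCHAR(20), Status VARCHAR(20))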

The real dimension has more levels and more attributes, but I pared it down for simplicity. The test was run on an Intel Xeon 2.8 with 4GB of RAM, with the SQL Server relational engine and Analysis Services on the same box. I created two different dimensions: one based off of a star schema and the other off of a snowflake schema (separate DSVs). Each dimension was processed nine times (three Process Full, six Process Update) and the times were averaged.
 
The star-based dimension averaged 8:35 per process vs. 6:42 for the snowflake-based dimension. Why the big difference?
 
Each attribute runs a SELECT DISTINCT against its source dimension table. Take the AdType attribute under the Banner Ad level: against the star schema that SELECT DISTINCT executes against a table with 12 million rows; against the snowflake schema it executes against a table with only four rows.
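To make that concrete, here’s roughly what those processing queries look like (table names borrowed from the sketch above):

  -- Star: the DISTINCT scan for AdType reads the full 12-million-row table
  SELECT DISTINCT AdType FROM dbo.DimBannerAd_Star

  -- Snowflake: the same attribute reads its own four-row lookup table
  SELECT DISTINCT AdType FROM dbo.AdType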
 
For smaller dimensions this doesn’t matter much. However, if you have large dimensions that are updated frequently, as ours are (hourly), a snowflake schema can make a world of difference.
 
Snowflakes are harder to read and tougher for the ETL guy to write, but dimensions process much faster against them. Also, if you’re building a dimension using a wizard (shame on you!), the wizard will be able to detect natural hierarchies from a snowflake, whereas with a star schema it won’t.
 
Of course this test was done in an afternoon and not under the most scrutinizing conditions so I’m curious as to what others have experienced or think about the subject. What are your thoughts?
 
 
  
OLAP: How to Index Star/Snowflake Schema Data:
http://support.microsoft.com/kb/199132
 
 
Posted in Uncategorized | 5 Comments

Analysis Services Books

Surprisingly, I get a fair number of hits for books related to Analysis Services. Since I’ve bought nearly all of them, I thought I’d give a short opinion on each.
 
1. Applied Microsoft Analysis Services 2005 – Teo Lachev

This was the first book released on Analysis Services 2005 and it’s pretty decent. It can’t be used as a reference book since it’s not deep enough; it’s more of a tutorial book.
 
If you’re looking to learn Analysis Services 2005, I’d start with this book. It has easy-to-follow exercises and isn’t overwhelming (for a beginner) like a few of the other books.
 
2. Microsoft SQL Server 2005 Analysis Services – Edward Melomed, Irina Gorbach, Alexander Berger, Py Bateman

This is THE BOOK for SSAS 2005. Excellent book, and the one I use the most. It was written by long-time members of the SSAS team at MS (from before MS bought the technology), and it shows.
 
There are entire chapters devoted to AMO, Memory Management, Query Execution, and the Physical Data Model. These chapters cover subjects you won’t find on MSDN or in BOL. It gets deep into the internals of Analysis Services.
 
Get this book. If I could only have one book on my shelf, it would be this one. I’m currently reading it cover to cover.
 
FYI: The links for the Foodmart 2005 samples are no longer valid, so get them here: http://www.e-tservice.com/downloads.html
 
3. Professional SQL Server Analysis Services 2005 with MDX – Sivakumar Harinath, Stephen Quinn

I’d place this book between Applied Microsoft Analysis Services 2005 (Lachev) and Microsoft SQL Server 2005 Analysis Services (Melomed, Gorbach, Berger, Bateman). Not too deep, not too easy. A good medium.

IMO, Chapter 13 on Performance Optimization is the most worthwhile section of this book.

If you’re looking for a book about MDX, don’t be misled by the title. There are only a couple of chapters discussing MDX, and you’d be better off buying an MDX-specific book.

4. Microsoft SQL Server 2005 Analysis Services Step by Step – Reed Jacobson, Stacia Misner, Hitachi Consulting

 

Don’t have much to say on this one since I don’t own it. During one of my walks downtown to get a cookie at Barnes & Noble (they have the best chocolate chip cookies, BTW) I did get a chance to flip through it. Eh, it’s OK. If you’re just looking to learn the real basics, it works. However, if you’re doing anything outside the scope of building a simple cube, I’d stay away; start with the Teo Lachev book (Applied Microsoft Analysis Services 2005) instead. It’s probably a perfect book to study for the MS certification tests.

Posted in Uncategorized | Leave a comment

SQL Server 2008 Exam Changes

I had a few minutes this afternoon so I decided to watch the SQL Server 2008 Certification Live Meeting.

 

The most interesting part, at least to me, was the number of people who had passed the various certifications:

 

  • MCDBA 2000 – 149,590
  • MCTS: SQL Server 2005 (70-431) – 24,939
  • MCITP: Database Administrator (70-443, 70-444) – 4,006
  • MCITP: Database Developer (70-441, 70-442) – 1,596
  • MCTS: Business Intelligence (70-445) – 801
  • MCITP: Business Intelligence Developer (70-446) – 364
  • MCA Database (on campus in Redmond) – 10

From what I gathered, not much is going to change with the release of the 2008 certs. It will still take only two tests to receive your MCITP in BI. One thing I was hoping for was independent SSAS, SSRS, and SSIS tests. Not this time around.

70-445: SQL Server 2005, Business Intelligence Implementation and Maintenance is being replaced by 70-448: SQL Server 2008, Business Intelligence Development and Maintenance. My guess is that the exams will differ about as much as the titles do. If you have the time, study for 70-445 now; no need to put it off for another six months. Beta exams roll out in June, with public availability in August. No mention of the replacement for 70-446.

They also spoke about the MCA program quite a bit. With only ten people currently holding the MCA and it costing $25k a pop, it seems more of a novelty than anything.

Trika’s blog:  http://blogs.msdn.com/trika/
Gerry’s blog:  http://blogs.msdn.com/gerryo

Posted in Uncategorized | Leave a comment

OLAPQueryLog StartTime is Wrong

I recently saw a post on the newsgroup about the StartTime in the OlapQueryLog being incorrect. We have the same problem here (StartTime is eight hours ahead), and it’s something I had accepted but never looked into.
 
It appears the insert uses GetUTCDate, not GetDate (GetDate returns the time in the OS time zone). I haven’t found a way to change this behavior.
 
You could add another column to store the local value; I just wrote a view on top of the table instead. From what I can tell, this should handle daylight savings time correctly (at least it does right now); hardcoding the hour difference would not.
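You can see the current offset (the same value the view’s DATEADD uses) with a quick check:

SELECT GETDATE() AS LocalTime,
       GETUTCDATE() AS UtcTime,
       DATEDIFF(HOUR, GETUTCDATE(), GETDATE()) AS OffsetHours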
 
 

CREATE VIEW dbo.OlapQueryLogLocal   --view name is illustrative
AS
SELECT
       MSOLAP_Database,
       MSOLAP_ObjectPath,
       MSOLAP_User,
       Dataset,
       --Shift the UTC StartTime by the server's current UTC offset
       DATEADD(HOUR, DATEDIFF(HOUR, GetUTCDate(), GetDate()), StartTime) AS StartTime,
       Duration
FROM
       OlapQueryLog

Posted in Uncategorized | Leave a comment

Calling MDX from a SQL Server Stored Procedure

 
I work for an online media company which allows our customers to log in to a website and check their numbers. For the last several years all of their activity reports have been pointed at SQL Server aggregated tables. While this worked, it wasn’t the ideal solution: reports often timed out, and the database server carried a high load throughout the day, especially early in the month when everyone was checking month-end numbers.
 
Eventually my boss approached me about speeding up the reports and possibly pointing them at a cube. However, there could be no changes to the report files or to how the data was being called (no front-end changes); all the modifications had to be done in the stored procedure.
 
Initial testing showed that pointing at cubes rather than aggregated tables sped up most of the reports by a factor of 5-10x. A definite improvement and well worth the work involved.
 
Here are the steps I had to take in order for it to happen:
 
1. Read http://support.microsoft.com/kb/931192 and install the hotfix on your dev box or desktop. You don’t have to do this if you’ll be working directly on the server.
 
2. Follow the steps here: http://jesseorosz.spaces.live.com/blog/cns!E322FD91218E57CF!329.entry. Double hop authentication is funky with SSAS.
 
3. Create a linked server on the same server where the stored procedure resides. Use the fully qualified domain name (this is not optional).
Example:

/****** Object:  LinkedServer [MYCUBESERVER_SalesCubeDatabase]    Script Date: 02/12/2008 13:17:57 ******/
EXEC master.dbo.sp_addlinkedserver @server = N'MYCUBESERVER_SalesCubeDatabase', @srvproduct=N'cubes', @provider=N'MSOLAP', @datasrc=N'MYCUBESERVER.IPADDY.COM', @location=N'MYCUBESERVER.IPADDY.COM', @catalog=N'Sales Cube Database'

/* For security reasons the linked server remote logins password is changed with ######## */
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'MYCUBESERVER_SalesCubeDatabase', @useself=N'False', @locallogin=NULL, @rmtuser=NULL, @rmtpassword=NULL
GO

EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'collation compatible', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'data access', @optvalue=N'true'
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'dist', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'pub', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'rpc', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'rpc out', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'sub', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'connect timeout', @optvalue=N'0'
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'collation name', @optvalue=NULL
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'lazy schema validation', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'query timeout', @optvalue=N'0'
GO
EXEC master.dbo.sp_serveroption @server=N'MYCUBESERVER_SalesCubeDatabase', @optname=N'use remote collation', @optvalue=N'true'
GO
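Once the linked server is created, a quick smoke test from Management Studio might look like this (the cube name matches the example in step 5):

SELECT * FROM OPENQUERY(MYCUBESERVER_SalesCubeDatabase,
       'SELECT [Measures].DefaultMember ON COLUMNS FROM [Sales]')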

 

 

4. If your database server is running SQL Server 2000 and you are pointing at a 2005 OLAP database (which was our case), you’ll need to install the 9.0 OLAP provider on that box.

 
5. Here’s a generic example of what the stored procedure looks like (error trapping, nocounts, etc. removed):
 

CREATE PROCEDURE [dbo].[MsrsWebsiteAdCampaignHitsPerDay]
  @CustomerID INT,
  @BeginDate DATETIME,
  @EndDate DATETIME
AS

--Generate the dynamic SQL
--Note: the CONVERT style (101 here) is an assumption; it must match how your
--Date dimension members are named
DECLARE @sqlcode VARCHAR(5000)
SELECT @sqlcode =
'SELECT
       "[Date].[Date].[Day].[MEMBER_CAPTION]" AS ActivityDate
       ,' + CAST(@CustomerID AS VARCHAR(10)) + ' AS CustomerID
       ,"[Sales].[Customer to AdCampaign].[AdCampaign].[MEMBER_CAPTION]" AS AdCampaign
       ,CAST(CONVERT(FLOAT, "[Measures].[Hits]") AS INT) AS Hits
       ,CAST(CONVERT(FLOAT, "[Measures].[Cost]") AS MONEY) AS Cost
FROM OPENQUERY(MYCUBESERVER_SalesCubeDatabase,
       ''SELECT NON EMPTY { [Measures].[Hits], [Measures].[Cost] } ON COLUMNS,
       NON EMPTY { { { DESCENDANTS( [Sales].[Customer to AdCampaign].[Customer].&[' + CAST(@CustomerID AS VARCHAR(10)) + '], [Sales].[Customer to AdCampaign].[AdCampaign] ) } * { [Date].[Date].[Day].[' + CONVERT(VARCHAR(10), @BeginDate, 101) + ']:[Date].[Date].[Day].[' + CONVERT(VARCHAR(10), @EndDate, 101) + '] } } } ON ROWS
       FROM [Sales]'')'

--Create the temp table to hold the data produced by the MDX
CREATE TABLE #AdCampaignHitsPerDay
(
       ActivityDate NTEXT,
       CustomerID INT,
       AdCampaign NTEXT,
       Hits BIGINT,
       Cost MONEY NULL
)

--Insert the data
INSERT #AdCampaignHitsPerDay EXEC(@sqlcode)

--Select for the MSRS report
SELECT
       CAST(CAST(ActivityDate AS VARCHAR(100)) AS SMALLDATETIME) AS ActivityDate,
       CAST(CAST(MONTH(CAST(CAST(ActivityDate AS VARCHAR(100)) AS DATETIME)) AS VARCHAR(100)) + '/' + CAST(DAY(CAST(CAST(ActivityDate AS VARCHAR(100)) AS DATETIME)) AS VARCHAR(100)) AS VARCHAR(61)) AS [Date],
       YEAR(CAST(CAST(ActivityDate AS VARCHAR(100)) AS DATETIME)) AS [Year],
       MONTH(CAST(CAST(ActivityDate AS VARCHAR(100)) AS DATETIME)) AS [Month],
       CAST(AdCampaign AS VARCHAR(50)) AS AdCampaign,
       SUM(Hits) AS Visitors,
       SUM(Cost) AS Cost
FROM
       #AdCampaignHitsPerDay
WHERE
       Hits > 0
GROUP BY
       CAST(AdCampaign AS VARCHAR(50)),
       YEAR(CAST(CAST(ActivityDate AS VARCHAR(100)) AS DATETIME)),
       MONTH(CAST(CAST(ActivityDate AS VARCHAR(100)) AS DATETIME)),
       CAST(CAST(ActivityDate AS VARCHAR(100)) AS SMALLDATETIME),
       CAST(CAST(MONTH(CAST(CAST(ActivityDate AS VARCHAR(100)) AS DATETIME)) AS VARCHAR(100)) + '/' + CAST(DAY(CAST(CAST(ActivityDate AS VARCHAR(100)) AS DATETIME)) AS VARCHAR(100)) AS VARCHAR(61))
ORDER BY
       AdCampaign, ActivityDate

DROP TABLE #AdCampaignHitsPerDay

exit_procedure:
RETURN
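And a sample call (the parameter values are made up):

EXEC dbo.MsrsWebsiteAdCampaignHitsPerDay
       @CustomerID = 12345,
       @BeginDate = '2008-01-01',
       @EndDate = '2008-01-31'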

 
 
6. That’s really about it. 90% of the time was spent just trying to get the double hop authentication working.
 
 
Posted in Uncategorized | 1 Comment