Wednesday, March 21, 2012

Optimization gurus: Help with varchar vs. text fields decision

Hi, I'm trying to improve performance for a db that stores messages. The message is currently stored in an ntext field, but when I look at all the records from the past 3 months, I found that 88% are less than 1000 characters, and 97% are less than 3000 characters.

I don't want to actually limit the message size, but it seems like I might get much better performance using a varchar(3000) field to hold most of the messages, and a separate text field just used for those 3% that really are long. Is this a good idea? If so, is it better to put the Message and LongMessage fields in the same table; or, have a separate table to hold the long messages? If it is in a separate table, it would need to be left joined with the message table each time messages are retrieved.

Also -- I am getting about 700 new messages daily, and right now have over 150,000 messages stored. The vast majority of activity involves new messages. Is this a good situation to look at using horizontal partitioning?

Thanks for any help, I don't really have anyone to discuss this with and it is really helpful to get some other views!!

Are you able to upgrade to SQL2005? VARCHAR(max) would be a simple solution to your problem.

|||

Yay, I'm already on SQL server 2005, so I could use varchar(max) -- now that I've heard of it! Are there performance issues to be aware of with max? Any drawback to a design where the row size will vary wildly from row to row??

|||

Celestine:

when I look at all the records from the past 3 months, I found that 88% are less than 1000 characters, and 97% are less than 3000 characters.

I calculated wrong -- it is an ntext field, so each char is two bytes, not one byte. Meaning 97% of the messages are actually less than 1500 characters, not 3000. All the messages are in English, so I'm not going to continue using ntext or nvarchar.

|||

Varchar(max) allows rows to span physical blocks, hence no row length restriction. Why not look it up on Books-On_line BOL?

|||

I understand that there is no length restriction, I looked it up right away; thanks for making me aware of the max option. What I am asking about is whether there are performance implications to consider with using varchar(max), when you are hoping to get multiple records to fit on a data page.


|||

With varchar(max) most of the records will be fetched with a single read whereas for ntext, two reads will be required for every record irrespective of its size.

No comments:

Post a Comment