SPEC OSG SPECmail2009 Benchmark
Workload Characterization for SPECmail_Ent2009 Metric

Mike Abbott, Yun-seng Chao

December 2008


 

Summary

This document summarizes studies of mail server workloads collected from multiple university and corporate sources, using a variety of IMAP4 clients. The analyzed workloads consist of both SMTP and IMAP4 requests. Each request is described by parameters that fully characterize its behavior. The proposed models, obtained by analyzing these parameters, are able to reproduce the behavior of the mail server workloads.

Document Organization

The report is organized as follows. We start with a description of the measurements and of the parameters considered in our studies. We then present the models characterizing the mail server workloads and we briefly describe how to use these models.

 

SPECmail2009 Additions/Changes

Much of the document discusses the workload changes between the new SPECmail2009 benchmark and the original SPECmail2008 benchmark workload. Many of the internal distributions were updated with complete message and folder profiles provided by Apple, Inc. in 2008. Most of this data replaces the original message and mailbox composition distributions. The SMTP traffic levels have been incorporated into the recipient and message size distributions.

One workload addition not discussed in this document is the ability to test using encrypted TCP connections. The reason lies in where this encryption incurs its cost. The e-mail clients issue commands according to user or programmatic directives, regardless of the network connection's encryption mode. Empirical data shows that both the SUT and the e-mail clients require extra computing and/or memory resources when encryption is enabled. Therefore, the benchmark's Secure metric influences the number of concurrent network sessions and interarrival times but not the actual command sequences. The two SPECmail2009 metrics show the effects of encrypted network connections on the SUT.

Measurements and Parameters

The measurements analyzed in our studies come from different sources. The measurements related to SMTP and IMAP4 have been provided by four companies and by two universities. The collected sessions were divided into five IMAP4 and two SMTP groups. The sessions within each group form the basis for all of the parameters that define the Enterprise User Profile, emulated by the SPECmail2009 benchmark.

 

IMAP Information Sources – Enterprise

Data Source                     | Total Number of Users | Number of IMAP Users  | Data Source Type             | Network Type
--------------------------------+-----------------------+-----------------------+------------------------------+--------------------
Mirapoint                       | 223                   | 223                   | Small company                | LAN
Openwave                        | 2500                  | 500                   | Medium company               | WAN
Sun                             | 147                   | 147                   | Medium workgroup             | LAN
Apple                           | 39,970                | ~30,000               | Large corporation            | LAN/WAN
University of Wollongong        | Unknown               |                       | Medium University            | LAN
Purdue University               | Unknown               |                       | Medium University            | LAN
SPECmail2009 (Enterprise Model) | 42,000+ (250 Minimum) | 32,000+ (250 Minimum) | Enterprise (Small to Large)  | LAN/MAN (0% dialup)
SPECmail2008 (Enterprise Model) | 250 (Minimum)         | 250 (Minimum)         | Enterprise (Small to Medium) | LAN/MAN (1% dialup)
SPECmail2001 (Dialup ISP Model) | 10,000                | 10,000                | Consumer                     | Dialup (98% dialup)



Mailbox and Message Structures

The IMAP4 protocol allows email clients to create and maintain any number of folders and subfolders, in addition to the standard Inbox folder used in the SPECmail2001 POP3 user profile. The IMAP4 command set also allows email clients to ask the server to describe these structures. This information is independent of the delivery or retrieval protocols and so is treated outside of any specific protocol and/or server context.
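As an illustration of how a client-side tool might turn the server's description of the folder structure into hierarchy levels, the following minimal sketch parses IMAP4 LIST response lines and counts folders per level. The function names, the sample lines, and the convention that a top-level folder is level 0 are assumptions made for this example; they are not taken from the benchmark's own tooling.

```python
import re
from collections import Counter

# One IMAP4 LIST response line has the form: (flags) "delimiter" mailbox-name
LIST_RE = re.compile(r'\((?P<flags>[^)]*)\) (?:"(?P<delim>[^"]*)"|NIL) (?P<name>.+)')

def folder_level(list_line: str) -> int:
    """Return 0 for a top-level folder, 1 for its children, and so on."""
    match = LIST_RE.match(list_line)
    if match is None:
        raise ValueError(f"not a LIST response line: {list_line!r}")
    delim = match.group("delim")
    name = match.group("name").strip('"')
    # A NIL or empty delimiter means the namespace is flat.
    return name.count(delim) if delim else 0

def level_histogram(list_lines):
    """Count how many folders sit at each hierarchy level."""
    return Counter(folder_level(line) for line in list_lines)

# Hypothetical server output, for illustration only.
SAMPLE = [
    r'(\HasChildren) "/" "INBOX"',
    r'(\HasNoChildren) "/" "INBOX/Work"',
    r'(\HasNoChildren) "/" "INBOX/Work/2008"',
    r'(\HasNoChildren) "/" "Sent"',
]

if __name__ == "__main__":
    print(sorted(level_histogram(SAMPLE).items()))
```

The sketch ignores quoting edge cases such as a delimiter that itself needs escaping, which a production parser would have to handle.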

Multipurpose Internet Mail Extension (MIME) Profile

MIME is an internet attachment scheme, defined as a formal standard by RFCs 1521, 1522, and 1523. The Sun and Apple data sets provided detailed information about mailbox and message structure. Thus they form the basis for the following probability distribution tables used in the benchmark.

The initial processing of all message sizes distinguished between single part sizes and multipart sizes. The IMAP4 benchmark prioritizes individual MIME part size over the global message size distribution.

Single Part messages (Sun: 76% of total, Apple: 47% of total)

  1. Use “Content-type: text/plain” or no content-type at all in message headers
  2. Use subpart content size distribution

 

Multipart Message (Sun: 24% of total, Apple: 53% of total)

  1. Use “Content-Type: multipart/mixed; boundary="xxxxxxxxx-counter"” or “Content-Type: multipart/alternative; boundary="xxxxxxxxx-counter"” in message headers
  2. Use distributions for message part width and depth to help establish the set of multipart message bodies.
  3. Categorize MIME messages to fall into one of these pre-defined multipart buckets.
  4. Use subpart content size distribution to define the sub-part sizes in the fixed pool of pre-defined multipart messages.
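The two construction rules above can be sketched with Python's standard `email` package. This is only an illustration under stated assumptions: the helper names `single_part` and `multipart`, the filler body of repeated characters, and the way the counter is substituted into the boundary string are all hypothetical and do not describe how the benchmark itself generates its message pool.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def single_part(size: int) -> MIMEText:
    """Single-part message: text/plain body of `size` filler characters."""
    return MIMEText("x" * size, "plain", "us-ascii")

def multipart(part_sizes, counter: int, subtype: str = "mixed") -> MIMEMultipart:
    """Multipart message with one text/plain sub-part per requested size."""
    # The boundary follows the "xxxxxxxxx-counter" pattern named in the list above.
    msg = MIMEMultipart(subtype, boundary=f"xxxxxxxxx-{counter}")
    for size in part_sizes:
        msg.attach(single_part(size))
    return msg

if __name__ == "__main__":
    # Show the generated headers of a two-part multipart/mixed message.
    print(multipart([256, 1024], counter=1).as_string()[:200])
```

Passing `subtype="alternative"` yields the multipart/alternative variant; nested structures would be built by attaching one `multipart(...)` result to another.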

Below are the distributions used in constructing messages compliant with the MIME standard.

MIME Part size (bytes) vs. Probabilities Distribution

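Part Size | Probability (Sun) | Probability (Apple) | Part Size | Probability (Sun) | Probability (Apple) | Part Size | Probability (Sun) | Probability (Apple)
----------+-------------------+---------------------+-----------+-------------------+---------------------+-----------+-------------------+--------------------
0         | N/A               | 0.04%               | 256       | 10.5%             | 2.28%               | 128 KB    | 0.7%              | 1.88%
1         | N/A               | < 0.001%            | 512       | 15.6%             | 6.37%               | 256 KB    | 0.4%              | 1.21%
2         | 0.6%              | < 0.01%             | 1 KB      | 13.6%             | 9.22%               | 512 KB    | 0.3%              | 0.68%
4         | 0.1%              | < 0.01%             | 2 KB      | 13.9%             | 18.00%              | 1 MB      | 0.2%              | 0.45%
8         | 0.4%              | < 0.01%             | 4 KB      | 13.4%             | 28.97%              | 2 MB      | 0.1%              | 0.27%
16        | 0.8%              | < 0.01%             | 8 KB      | 8.5%              | 11.37%              | 4 MB      | N/A               | 0.19%
32        | 1.8%              | 0.05%               | 16 KB     | 4.3%              | 6.46%               | 8 MB      | N/A               | 0.10%
64        | 4.1%              | 0.31%               | 32 KB     | 2.3%              | 3.91%               | 16 MB     | N/A               | 0.03%
128       | 7.2%              | 5.18%               | 64 KB     | 1.2%              | 3.02%               | 32 MB     | N/A               | 0.01%
          |                   |                     |           |                   |                     | 64 MB     | N/A               | < 0.01%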

 

MIME Distribution Chart
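To show how a load generator could draw part sizes from such a table, here is a minimal sketch that samples from the Apple column with weighted random selection. Several points are assumptions of this example rather than statements about the benchmark: that KB and MB mean powers of 1024, that each listed size can stand in for its whole bucket, and that the entries given only as upper bounds ("< 0.01%", "< 0.001%") may be left out.

```python
import random

KB, MB = 1024, 1024 * 1024  # assumed binary units

# (part size in bytes, probability in percent), Apple column of the table above.
# Buckets listed only as "< 0.01%" or "< 0.001%" are omitted.
APPLE_PART_SIZES = [
    (0, 0.04), (32, 0.05), (64, 0.31), (128, 5.18), (256, 2.28), (512, 6.37),
    (1 * KB, 9.22), (2 * KB, 18.00), (4 * KB, 28.97), (8 * KB, 11.37),
    (16 * KB, 6.46), (32 * KB, 3.91), (64 * KB, 3.02), (128 * KB, 1.88),
    (256 * KB, 1.21), (512 * KB, 0.68), (1 * MB, 0.45), (2 * MB, 0.27),
    (4 * MB, 0.19), (8 * MB, 0.10), (16 * MB, 0.03), (32 * MB, 0.01),
]

def sample_part_sizes(n: int, rng: random.Random = None):
    """Draw n part sizes; random.choices normalizes the weights itself."""
    rng = rng or random.Random()
    sizes = [size for size, _ in APPLE_PART_SIZES]
    weights = [weight for _, weight in APPLE_PART_SIZES]
    return rng.choices(sizes, weights=weights, k=n)

if __name__ == "__main__":
    print(sample_part_sizes(10, random.Random(2009)))
```

Passing a seeded `random.Random` instance makes a run reproducible, which is useful when comparing generated workloads.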

 


The following table shows the distribution of the number of MIME parts at the top level (without regard to nesting). It reflects the count of multipart/mixed parts immediately “attached” to the main message. It does not count multipart/alternative parts (i.e. text/plain and text/html, alternative formats of the same text), nor does it reflect the MIME attachment depths (“attachments” to “attachments” or forwarded messages).

 

MIME Top-Level Part Counts Distribution

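Part Count | Probability (Sun) | Probability (Apple) | Part Count | Probability (Sun) | Probability (Apple) | Part Count | Probability (Sun) | Probability (Apple)
-----------+-------------------+---------------------+------------+-------------------+---------------------+------------+-------------------+--------------------
0          | N/A               | 46.69%              | 3          | 1.99%             | 2.51%               | 6          | N/A               | 0.06%
1          | 75.76%            | 3.77%               | 4          | 0.24%             | 0.29%               | 7          | N/A               | 0.07%
2          | 21.91%            | 46.20%              | 5          | 0.09%             | 0.26%               | 8+         | N/A               | 0.15%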

 

MIME Parts Chart
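A quick arithmetic check relates this table to the single-part and multipart shares quoted earlier (Sun: 76% and 24%, Apple: 47% and 53%). The sketch below assumes that the lowest listed part count in each data set stands for single-part messages, which would mean Sun recorded them as count 1 and Apple as count 0; that reading is our inference from the numbers, not something the sources state.

```python
# Top-level part count distributions, in percent, copied from the table above.
SUN = {1: 75.76, 2: 21.91, 3: 1.99, 4: 0.24, 5: 0.09}
APPLE = {0: 46.69, 1: 3.77, 2: 46.20, 3: 2.51, 4: 0.29,
         5: 0.26, 6: 0.06, 7: 0.07, 8: 0.15}  # key 8 stands for "8+"

def total(dist):
    """Sum of all probabilities, rounded to two decimals."""
    return round(sum(dist.values()), 2)

def single_part_share(dist):
    """Share of the lowest part count, assumed to mean single-part messages."""
    return dist[min(dist)]

if __name__ == "__main__":
    for name, dist in (("Sun", SUN), ("Apple", APPLE)):
        single = single_part_share(dist)
        # Prints: total percent, single-part share, multipart share (rounded).
        print(name, total(dist), round(single), round(100 - single))
```

Under that assumption both columns sum to within rounding of 100%, and the rounded shares agree with the percentages given for single-part and multipart messages.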

 

 

The next table shows the distribution of nested MIME part levels that occur within a given message from the sample of MIME parts. It generally reflects messages or attachments that are forwarded multiple times, each forward adding another depth level to the resulting message.

 

Distribution of MIME Part Depths

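Part Depth | Probability (Sun) | Probability (Apple) | Part Depth | Probability (Sun) | Probability (Apple) | Part Depth | Probability (Sun) | Probability (Apple)
-----------+-------------------+---------------------+------------+-------------------+---------------------+------------+-------------------+--------------------
0 or 1     | 91.24%            | 90.18%              | 3          | 0.87%             | 0.62%               | 5          | 0.03%             | 0.01%
2          | 7.73%             | 9.14%               | 4          | 0.13%             | 0.04%               | 6+         | N/A               | < 0.01%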

 

MIME Depth Chart
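The two structural measures, top-level part count and nesting depth, can be made concrete with a short sketch based on Python's standard `email` package. The definitions used here are one possible reading of the text: the top-level count covers only parts attached directly to a multipart/mixed message, and depth is the number of nested container levels, with 0 for a single-part message. The function names and the example message are hypothetical.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def top_level_parts(msg: Message) -> int:
    """Parts attached directly to a multipart/mixed message, else 0."""
    if msg.get_content_type() != "multipart/mixed":
        return 0
    return len(msg.get_payload())

def mime_depth(msg: Message) -> int:
    """Number of nested container levels; 0 for a single-part message."""
    if not msg.is_multipart():
        return 0
    return 1 + max((mime_depth(part) for part in msg.get_payload()), default=0)

def nested_example() -> MIMEMultipart:
    """Hypothetical message: an alternative text body plus one attachment."""
    body = MIMEMultipart("alternative")
    body.attach(MIMEText("hello", "plain"))
    body.attach(MIMEText("<p>hello</p>", "html"))
    outer = MIMEMultipart("mixed")
    outer.attach(body)
    outer.attach(MIMEText("attached notes", "plain"))
    return outer

if __name__ == "__main__":
    msg = nested_example()
    print(top_level_parts(msg), mime_depth(msg))
```

Because `is_multipart()` is also true for a parsed message/rfc822 part, a forwarded message adds one level under this definition, in line with the description of forwarding above.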

 

The following tables show the distribution of primary MIME Content Type (not including subtype) of all the parts in the entire sample.

 

MIME Content Type Distribution

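Content type | Probability (Sun) | Probability (Apple) | Content type | Probability (Sun) | Probability (Apple)
-------------+-------------------+---------------------+--------------+-------------------+--------------------
TEXT         | 92.193%           | 86.584%             | IMAGE        | 0.888%            | 5.943%
APPLICATION  | 4.265%            | 6.971%              | AUDIO        | 0.016%            | 0.018%
MESSAGE      | 2.633%            | 0.465%              | VIDEO        | 0.004%            | 0.019%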

 

MIME Types Chart
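A tally of primary content types like the one in this table could be produced along the following lines. This is a sketch under assumptions: multipart containers are skipped because the table lists no MULTIPART category, and the helper names and example message are made up for illustration.

```python
from collections import Counter
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def primary_type_counts(msg) -> Counter:
    """Tally the primary content type of every non-container part."""
    return Counter(
        part.get_content_maintype().upper()
        for part in msg.walk()  # walk() visits the message and all sub-parts
        if part.get_content_maintype() != "multipart"
    )

def typed_example() -> MIMEMultipart:
    """Hypothetical message with two text parts and one binary attachment."""
    msg = MIMEMultipart("mixed")
    msg.attach(MIMEText("see attachment", "plain"))
    msg.attach(MIMEText("<p>see attachment</p>", "html"))
    msg.attach(MIMEApplication(b"\x00\x01\x02"))
    return msg

if __name__ == "__main__":
    print(dict(primary_type_counts(typed_example())))
```

Summing such counters over a whole mail store and dividing by the total number of parts would give percentages comparable to the table.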

After Sun's values were reviewed, a former employee noted that the Unix company that provided the MIME distributions tended to use more text messages. Other companies have more and larger MIME parts with richer, non-textual content such as word processor documents, presentations, spreadsheets, web pages, calendar events, images, audio, and both rich and simple alternate MIME structures. The major effect of this shift is to increase the overall message sizes and to decrease the share of the Text content type in favor of the other categories.

However, an increase in Alternate structures does not eliminate the Text portion's counts; it just increases the counters of the other content types. Also, the IMAP server is not required to interpret the content of the MIME parts. It must extract the MIME part(s) and send the content, as is, to the IMAP4 client, which performs the interpretation. Therefore, the shift in Content Type distribution affects the MIME structure of the messages that the benchmark delivers to the SUT. The SUT still must deconstruct these MIME structures, but not the actual content.

 


Messages Per Folder

The following tables show the distribution of messages in folders at the first five levels.

Level by Level Message Probability Distributions - Mirapoint, Openwave, Sun

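Top Level           | Level 1             | Level 2             | Level 3             | Level 4
Width | Probability | Width | Probability | Width | Probability | Width | Probability | Width | Probability
------+-------------+-------+-------------+-------+-------------+-------+-------------+-------+------------
0     | 16.4%       | 0     | 8.1%        | 0     | 6.1%        | 0     | 6.8%        | 0     | 1.0%
1     | 21.5%       | 1     | 31.9%       | 1     | 48.1%       | 1     | 49.5%       | 1     | 81.4%
2     | 3.4%        | 2     | 4.6%        | 2     | 3.2%        | 2     | 3.2%        | 2     | 1.0%
3     | 2.8%        | 3     | 2.9%        | 3     | 2.1%        | 3     | 3.2%        | 3     | 1.0%
4     | 2.1%        | 4     | 2.4%        | 4     | 2.7%        | 4     | 2.2%        | 5     | 1.0%
5     | 2.1%        | 5     | 2.0%        | 5     | 1.5%        | 5     | 2.0%        | 6     | 2.9%
6     | 1.7%        | 6     | 1.7%        | 6     | 2.3%        | 6     | 1.8%        | 20    | 4.9%
7     | 1.2%        | 7     | 1.6%        | 7     | 1.6%        | 7     | 1.8%        | 30    | 2.0%
8     | 1.5%        | 8     | 1.1%        | 8     | 1.5%        | 9     | 2.0%        | 40    | 1.0%
9     | 1.5%        | 9     | 1.1%        | 9     | 1.2%        | 10    | 1.1%        | 80    | 2.0%
20    | 7.3%        | 10    | 1.3%        | 20    | 7.8%        | 20    | 10.3%       | 200   | 2.0%
30    | 5.2%        | 20    | 7.8%        | 30    | 3.8%        | 30    | 4.1%        |       |
40    | 3.0%        | 30    | 5.3%        | 40    | 3.1%        | 40    | 3.1%        |       |
50    | 2.0%        | 40    | 3.8%        | 50    | 2.1%        | 50    | 1.3%        |       |
60    | 1.9%        | 50    | 2.6%        | 60    | 1.2%        | 70    | 1.8%        |       |
70    | 1.4%        | 60    | 1.8%        | 80    | 1.6%        | 100   | 1.3%        |       |
80    | 1.3%        | 70    | 1.6%        | 100   | 1.3%        | 200   | 2.2%        |       |
90    | 1.0%        | 80    | 1.3%        | 200   | 2.5%        | 600   | 1.4%        |       |
200   | 5.6%        | 90    | 1.1%        | 300   | 1.2%        | 3000  | 1.1%        |       |
300   | 3.0%        | 200   | 5.9%        | 500   | 1.2%        |       |             |       |
400   | 1.3%        | 300   | 1.9%        | 800   | 1.0%        |       |             |       |
500   | 1.0%        | 400   | 1.5%        | 2000  | 3.0%        |       |             |       |
600   | 1.1%        | 500   | 1.2%        |       |             |       |             |       |
1000  | 2.2%        | 700   | 1.6%        |       |             |       |             |       |
2000  | 3.1%        | 1000  | 1.2%        |       |             |       |             |       |
3000  | 1.5%        | 2000  | 1.5%        |       |             |       |             |       |
4000  | 2.3%        | 5000  | 1.2%        |       |             |       |             |       |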

 

Level by Level Message Probability Distributions - Apple

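Top Level           | Level 1             | Level 2             | Level 3             | Level 4
Width | Probability | Width | Probability | Width | Probability | Width | Probability | Width | Probability
------+-------------+-------+-------------+-------+-------------+-------+-------------+-------+------------
0     | 0.84%       | 0     | 32.83%      | 0     | 15.35%      | 0     | 10.21%      | 0     | 9.45%
1     | 2.10%       | 1     | 6.79%       | 1     | 8.21%       | 1     | 11.06%      | 1     | 9.64%
2     | 0.66%       | 2     | 3.96%       | 2     | 5.70%       | 2     | 6.40%       | 2     | 7.93%
3     | 0.47%       | 3     | 2.94%       | 3     | 4.31%       | 3     | 4.83%       | 3     | 6.06%
4     | 0.80%       | 4     | 2.31%       | 4     | 3.52%       | 4     | 4.05%       | 5     | 9.74%
5     | 0.87%       | 5     | 2.03%       | 5     | 2.97%       | 5     | 3.54%       | 6     | 4.02%
6     | 0.77%       | 6     | 1.74%       | 6     | 2.56%       | 6     | 2.94%       | 20    | 25.41%
7     | 0.95%       | 7     | 1.50%       | 7     | 2.22%       | 7     | 2.76%       | 30    | 6.47%
8     | 0.75%       | 8     | 1.35%       | 8     | 2.01%       | 9     | 4.61%       | 40    | 4.52%
9     | 0.6%        | 9     | 1.26%       | 9     | 1.85%       | 10    | 1.97%       | 80    | 6.90%
20    | 6.07%       | 10    | 1.16%       | 20    | 12.57%      | 20    | 12.82%      | 200   | 9.88%
30    | 4.10%       | 20    | 7.82%       | 30    | 6.28%       | 30    | 6.94%       |       |
40    | 3.75%       | 30    | 4.57%       | 40    | 4.07%       | 40    | 4.22%       |       |
50    | 3.01%       | 40    | 3.13%       | 50    | 2.97%       | 50    | 2.96%       |       |
60    | 2.83%       | 50    | 2.40%       | 60    | 2.26%       | 70    | 3.97%       |       |
70    | 2.62%       | 60    | 1.84%       | 80    | 3.44%       | 100   | 3.39%       |       |
80    | 2.08%       | 70    | 1.48%       | 100   | 2.37%       | 200   | 5.25%       |       |
90    | 2.14%       | 80    | 1.29%       | 200   | 6.10%       | 600   | 7.04%       |       |
200   | 14.91%      | 90    | 1.09%       | 300   |             | 3000  | 1.07%       |       |
300   |             | 200   | 6.54%       | 500   | 5.11%       |       |             |       |
400   |             | 300   |             | 800   | 2.55%       |       |             |       |
500   | 17.52%      | 400   |             | 2000  | 3.58%       |       |             |       |
600   |             | 500   | 5.18%       |       |             |       |             |       |
1000  | 11.03%      | 700   |             |       |             |       |             |       |
2000  | 8.22%       | 1000  | 2.77%       |       |             |       |             |       |
3000  |             | 2000  | 1.92%       |       |             |       |             |       |
4000  | 12.91%      | 5000  | 2.09%       |       |             |       |             |       |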

Message Distribution Chart 1

Message Distribution Chart 2

Message Distribution Chart 3

Message Distribution Chart 4

Message Distribution Chart 5

Here is the same Apple data, re-bucketed so that each bucket contains roughly five percentage points of probability. These are the actual values used in the benchmark.
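One simple way to obtain buckets of roughly equal probability is a greedy merge over a fine-grained histogram, sketched below. This is a guess at a plausible procedure, not the method actually used to produce the benchmark values; the function name, the choice to label each bucket with its largest width, and the input data are all hypothetical.

```python
def rebucket(histogram, target=5.0):
    """Greedily merge consecutive (width, percent) entries.

    Each emitted bucket is labeled with its largest width and closes as soon
    as the accumulated probability reaches `target` percentage points.
    `histogram` must be sorted by width.
    """
    buckets = []
    acc = 0.0
    for width, pct in histogram:
        acc += pct
        if acc >= target:
            buckets.append((width, round(acc, 2)))
            acc = 0.0
    if acc > 0:
        # Remaining tail that never reached the target.
        buckets.append((histogram[-1][0], round(acc, 2)))
    return buckets

# Hypothetical fine-grained input, for illustration only.
EXAMPLE = [(0, 3.0), (1, 2.5), (2, 1.0), (3, 4.5), (4, 6.0), (5, 2.0)]

if __name__ == "__main__":
    print(rebucket(EXAMPLE))
```

A greedy merge can overshoot the target when a single entry is large, which may be why some buckets in a table of this kind hold noticeably more than five percentage points.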

Level by Level Message Probability Distributions - Apple

Top Level

Level 1

Level 2

Level 3

Level 4

Width

Probability

Width

Probability

Width

Probability

Width

Probability

Width

Probability

0

0.84%

0

32.83%

0

15.35%

0

10.21%

0

9.45%

5

4.90%

1

6.79%

1

8.21%

1

11.06%

1

9.64%

12

5.00%

3

6.89%

2

5.70%

2

6.40%

2

7.93%

22

5.07%

6

6.08%

4

7.83%

4

8.88%

3

6.06%

35

5.19%

10

5.27%

6

5.53%

6

6.48%

4

5.14%

51

5.01%

16

5.28%

9

6.08%

8

5.31%

6

8.62%

70

5.15%

25

5.03%

13

5.73%

11

5.71%

8

6.25%

95

5.16%

40

5.21%

18

5.19%

15

5.86%

11

6.61%

127

5.10%

65

5.01%

25

5.09%

20

5.28%

14

5.47%

165

5.09%

111

5.00%

35

5.07%

27

5.15%

19

5.95%

212

5.01%

212

5.01%

51

5.06%

38

5.27%

26

5.51%

274

5.02%

524

5.00%

77

5.06%

56

5.15%

36

5.12%

356

5.05%

2577

5.00%

126

5.05%

91

5.06%

55

5.11%

466

5.03%

3000+

1.60%

239

5.00%

169

5.01%

104

5.05%

623

5.02%

 

 

654

5.00%

462

5.00%

359

5.01%

855

5.01%

 

 

2000+

5.05%

1000+

4.17%

500+

3.08%

1232

5.01%

 

 

 

 

 

 

 

 

1922

5.00%