.. _baseline_measurements:

**********************************************************
Running Baseline Phase For the First Time With Your Cloud
**********************************************************

.. _prepare_baseline:

This section assumes that CBTOOL is already started and has successfully connected with your cloud.

Setting Up Parameters
=====================

In the baseline phase, application instances for the two workloads, K-Means and YCSB, are created five times. That is, instances are provisioned, data is generated, the load generator is run, data is deleted, and then the instances are deleted. This is controlled by the following parameters::

    iteration_count: 5
    run_count: 1
    destroy_ai_upon_completion: true

Thus, a total of 35 and 30 instances are created and destroyed for the YCSB and K-Means workloads, respectively (seven instances per YCSB application instance and six per K-Means application instance, times five iterations). Creation of data, running of the load generator, and deletion of data comprise a run, which is controlled by the ``run_count`` parameter. If a tester knows that in their cloud the baseline results will be worse than the elasticity phase results (due to performance isolation, etc.), they must set ``run_count`` to five or higher before starting a compliant run. For a compliant run, ``iteration_count`` must be 5 and ``destroy_ai_upon_completion`` must be true.

Cloud Name
----------

Please make sure that the cloud name in ``osgcloud_rules.yaml`` matches the cloud name in the CBTOOL configuration::

    cloud_name: MYOPENSTACK

The CBTOOL configuration file is present at ``~/osgcloud/cbtool/configs/*_cloud_definitions.txt``.

YCSB Baseline Measurement
=========================

Preparation
-----------

Set the appropriate thread count for YCSB in the ``osgcloud_rules.yaml`` file (for both CentOS and Ubuntu images), e.g.::

    thread_count: 8

For CentOS images, also uncomment the ``cassandra_conf_path`` line under the ``cassandra`` section, so that::

    #uncomment this for centos images
    #cassandra_conf_path: /etc/cassandra/conf/cassandra.yaml

becomes::

    #uncomment this for centos images
    cassandra_conf_path: /etc/cassandra/conf/cassandra.yaml

The tester will have to measure the appropriate thread count for their cloud. The default thread count is 8. In general, the higher the thread count, the higher the throughput (the application instance will reach its capacity at some number of threads). Consequently, the scalability results of the cloud under test may be higher, provided there is no drastic decrease in the elasticity measurements.

Running
-------

The YCSB baseline script parameter description is as follows::

    usage: osgcloud_ycsb_baseline.py [-h]
                                     [--console_log_level CONSOLE_LOG_LEVEL]
                                     [--runrules_yaml RUNRULES_YAML]
                                     [--flush_log FLUSH_LOG] [--version]
                                     --exp_id EXP_ID

It is run as follows::

    python osgcloud_ycsb_baseline.py --exp_id SPECRUNID

where ``SPECRUNID`` indicates the run id that will be used across the baseline and elasticity + scalability phases.

By default, the script logs the run to a file. If you would like to show the run output on the console, type the following::

    python osgcloud_ycsb_baseline.py --exp_id SPECRUNID --console_log_level DEBUG

By default, the results for this experiment are placed in::

    ~/results/SPECRUNID/perf/

If five iterations are run (as required for a compliant run), the tester should expect to find five directories starting with ``SPECRUNIDYCSB`` in the ``~/results/SPECRUNID/perf`` directory.
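As a quick sanity check (a hypothetical helper command, not part of the harness), the result directories can be listed and counted from the shell; ``SPECRUNID`` below is a placeholder for your actual run id::

    # list the YCSB baseline result directories and count them; expect 5
    ls -d ~/results/SPECRUNID/perf/SPECRUNIDYCSBBASELINE*
    ls -d ~/results/SPECRUNID/perf/SPECRUNIDYCSBBASELINE* | wc -l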
The following files and directories will be present in that directory; the date/time in the file and directory names will match the date/time of your run::

    baseline_SPECRUNID.yaml
    osgcloud_ycsb_baseline_SPECRUNID-20150811233732UTC.log
    SPECRUNIDYCSBBASELINE020150811233732UTC
    SPECRUNIDYCSBBASELINE120150811233732UTC
    SPECRUNIDYCSBBASELINE220150811233732UTC
    SPECRUNIDYCSBBASELINE320150811233732UTC
    SPECRUNIDYCSBBASELINE420150811233732UTC

K-Means Baseline Measurement
============================

Preparation
-----------

The following parameters may be changed in ``osgcloud_rules.yaml`` depending on how Hadoop was set up in the instance image. The default values of the parameters are shown below::

    centos images:
        java_home: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85-2.6.1.2.el7_1.x86_64
        hadoop_home: /usr/local/hadoop
        dfs_name_dir: /usr/local/hadoop_store/hdfs/namenode
        dfs_data_dir: /usr/local/hadoop_store/hdfs/datanode

    ubuntu images:
        java_home: /usr/lib/jvm/java-7-openjdk-amd64
        hadoop_home: /usr/local/hadoop
        dfs_name_dir: /usr/local/hadoop_store/hdfs/namenode
        dfs_data_dir: /usr/local/hadoop_store/hdfs/datanode

Running
-------

The K-Means baseline script parameter description is as follows::

    usage: osgcloud_kmeans_baseline.py [-h]
                                       [--console_log_level CONSOLE_LOG_LEVEL]
                                       [--runrules_yaml RUNRULES_YAML]
                                       [--flush_log FLUSH_LOG] [--version]
                                       --exp_id EXP_ID

It is run as follows::

    python osgcloud_kmeans_baseline.py --exp_id SPECRUNID

where ``SPECRUNID`` indicates the run id that will be used across the baseline and elasticity + scalability phases.

By default, the script logs the run to a file. If you would like to show the run output on the console, type the following::

    python osgcloud_kmeans_baseline.py --exp_id SPECRUNID --console_log_level DEBUG

By default, the results for this experiment are placed in::

    ~/results/SPECRUNID/perf/

If five iterations are run (as required for a compliant run), the tester should expect to find five directories starting with ``SPECRUNIDKMEANS`` in the ``~/results/SPECRUNID/perf`` directory. The following files and directories will be present in that directory; the date/time in the file and directory names will match the date/time of your run::

    baseline_SPECRUNID.yaml
    osgcloud_kmeans_baseline_SPECRUNID-20150811233302UTC.log
    SPECRUNIDKMEANSBASELINE020150811233302UTC
    SPECRUNIDKMEANSBASELINE120150811233302UTC
    SPECRUNIDKMEANSBASELINE220150811233302UTC
    SPECRUNIDKMEANSBASELINE320150811233302UTC
    SPECRUNIDKMEANSBASELINE420150811233302UTC

Configuring Supporting Evidence Collection
==========================================

Make sure that the supporting evidence parameters are set correctly in the ``osgcloud_rules.yaml`` file::

    support_evidence:
        instance_user: cbuser
        instance_keypath: HOMEDIR/osgcloud/cbtool/credentials/cbtool_rsa
        support_script: HOMEDIR/osgcloud/driver/support_script/collect_support_data.sh
        cloud_config_script_dir: HOMEDIR/osgcloud/driver/support_script/cloud_config/
        ###########################################
        # START instance support evidence flag is
        # true for public and private clouds. host
        # flag is true only for private clouds or
        # for those clouds where host information
        # is available.
        ###########################################
        instance_support_evidence: true
        host_support_evidence: false
        ###########################################
        # END
        ###########################################

The ``instance_user`` parameter indicates the Linux user that is used to SSH into the instance. It is also set in the cloud configuration text file for CBTOOL.

``instance_keypath`` indicates the SSH key that is used to SSH into the instance. Please make sure that the permissions of this file are set to 400 (``chmod 400 KEYFILE``).
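For example, assuming that ``HOMEDIR`` in the snippet above stands for your home directory, the key permissions can be set and verified as follows::

    # restrict the key to read-only for the owner and confirm the result
    chmod 400 $HOME/osgcloud/cbtool/credentials/cbtool_rsa
    ls -l $HOME/osgcloud/cbtool/credentials/cbtool_rsa   # should show -r-------- (i.e., 400)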
``support_script`` indicates the path of the script that is used to gather supporting evidence.

``cloud_config_script_dir`` indicates the path where scripts relevant to gathering cloud configuration are present. These scripts differ from one cloud to another.

``instance_support_evidence`` indicates whether to collect supporting evidence from instances. This flag is ignored for simulated clouds. For testing of the baseline phase, it is recommended to set this flag to false.
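To confirm that these settings will allow supporting evidence to be collected, SSH access to a running instance can be verified manually. This is an optional, hypothetical check; ``INSTANCE_IP`` is a placeholder for the address of one of your instances, and the user and key path are the values configured above (again assuming ``HOMEDIR`` is your home directory)::

    # log in as the configured instance user with the configured key
    ssh -i $HOME/osgcloud/cbtool/credentials/cbtool_rsa cbuser@INSTANCE_IP 'hostname; uptime'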