#
# Copyright (c) 2014 NetApp, Inc., All Rights Reserved
# Any use, modification, or distribution is prohibited
# without prior written consent from NetApp, Inc.
#

package NACL::MTask::DataSet;

use Moose;
use MTF::MooseCompatibleException qw(:all);
use Data::Dumper;

use NACL::Exceptions::NoElementsFound;
use NACL::C::VolumeDirectory;
use NACL::C::VolumeFile;
use NACL::MTask::VolumeFileIterator;

=head1 NAME

NACL::MTask::DataSet

=head1 SYNOPSIS

    my $qtree_basic_dataset = NACL::MTask::DataSet::QtreeBasic->create(
        volume  => $volume,
        clients => [ @clients ],
    );

    $qtree_basic_dataset = NACL::MTask::DataSet::QtreeBasic->find(
        volume  => $volume,
        clients => [ @clients ],
    );

    $qtree_basic_dataset->validate();

    $qtree_basic_dataset->purge();

=head1 DESCRIPTION

There are so many variations on datasets across the company that we can not
expect a single dataset to address all needs. Additionally, there are certain
kinds of data that are not compatible with certain features. For example, a
test of UNIX security style volumes will have trouble with SMB style ACLs; as
of FS.0, file ZAPIs are unable to handle file names with characters outside of
the English alphabet; the on-box file clone command may not be capable of using
a file name with quote characters. These exclusions point to a need for modular
datasets that can be easily selected based on need.

Although we can't please everyone with a universal dataset, we can package up
our datasets in ways that can be shared across groups. A common location makes
different datasets easy to find, and a common interface reduces the mental
burden to start using an unfamiliar dataset. These individual datasets can be
combined into well-known and well-maintained collections.

A dataset doesn't have to be completely black-box nor opaque to the user. There
is no reason that a dataset can't be modeled by an object instance in the same
way that we model volumes with NACL components.
Method calls on the object instance would allow tests to use particular files
of interest from the dataset.

At the same time that we start using these datasets in automation, we would
like to do so without an increase in testcase execution time. Many groups are
already using XCDR, FlexClones, and Volume Copy to do this, but a standardized
interface would make this functionality available for all datasets.

Improvements in execution time must not come at the expense of maintainability
or functionality. Certain datasets can't be copied, such as those containing
locked files, corrupt files, or iofenced files. Old copies of datasets should
not be used when their maintainers have improved upon them with newer versions.

A dataset management framework will allow us to write tests in a way where
coverage can be ratcheted up as more datasets become shared. The centralization
of common features and a standardized interface will help teams to leverage
other efforts to improve their testing while reducing the required time
investment. At the same time, the aggregation of several disparate datasets
into named composite collections can reduce maintenance costs associated with
updated population code causing existing tests to break.

=cut

=head1 ATTRIBUTES

=head2 volume

A NACL::C::Volume instance to create or find the DataSet on.

=head2 clients

An array reference of NACL::C::Client instances to be used for create and
validation.

=head2 version

(Private) The version number of the DataSet instance. The version number is
used to help a library identify that the data on disk was created with an older
version of the library. DataSets can either refuse to use the older data or may
provide reduced functionality.

=head2 instance

(Private) A string that helps to uniquely identify a DataSet if multiple copies
have been created on the same volume.

=head2 manifest_file

(Private) A NACL::C::VolumeFile instance for the DataSet manifest metafile.

=head2 directory_names

(Private) An array reference of directory names relative to the volume root.
These are the directories that hold the contents of the DataSet.

=head2 fast_mode

(Private) Set to 1 if the DataSet should operate in a minimal mode suitable for
library regression testing.

=head2 original_creator

(Private) This is populated with information about who originally created the
DataSet. It may be useful for tracing long-lived datasets that are being
provisioned via XCDR, restored via backup, or cloned from other volumes.

=cut

has 'volume' => (
    is  => 'rw',
    isa => 'Object',
);

has 'clients' => (
    is      => 'ro',
    isa     => 'ArrayRef[Object]',
    default => sub { return []; },
);

has 'version' => (
    is      => 'rw',
    isa     => 'Int',
    default => 1,
);

has 'instance' => (
    is      => 'ro',
    isa     => 'Str',
    default => sub {
        my $instance = time()."_".int(rand(1000000));
        return $instance;
    },
);

has 'manifest_file' => (
    is  => 'rw',
    isa => 'Object',
);

sub _file_extension {
    return '.dataset-manifest';
}

has 'directory_names' => (
    is      => 'rw',
    isa     => 'ArrayRef[Str]',
    default => sub { return []; },
);

has 'fast_mode' => (
    is      => 'ro',
    isa     => 'Int',
    default => 0,
);

has 'original_creator' => (
    is      => 'ro',
    isa     => 'HashRef',
    default => sub {
        return {
            logdir   => param('LOGDIR'),
            username => getlogin || getpwuid($<) || 'Unknown User',
        };
    },
);

=head1 METHODS

=cut

=head2 new

    my $dataset = PkgName->new(
        volume  => $volume_instance,
        clients => [ $unix_client_instance, $windows_client_instance ],
    );

(static method) Create an object modeling a dataset without putting any data on
disk. This method is typically only used internally by the find() method and
should be avoided unless there is a good reason to call it. Either use create
or find instead.

=over

=item Options

=over

=item C<< volume=>$volume >>

(Required) A NACL::C::Volume instance for the volume to be used with the
DataSet.

=item C<< clients=>[$unix_client,$windows_client] >>

(Usually Required) An array ref of NACL::C::Client instances to be used with
the DataSet. There may be DataSets that don't require clients, but many
DataSets will require at least one Windows and one Linux client.

=item C<< directory_names >>

(Private, Optional) directory_names is passed by find when it is deserializing
a DataSet found on disk. Callers should typically not specify this option.

=item C<< fast_mode >>

(Private, Optional) Set this to 1 when running regression tests. Individual
DataSet modules use this attribute to run in a faster mode to enable quick
testing. This will typically only be passed from the library regression tests.

=item C<< instance >>

(Private, Optional) The instance is an identifier that helps to uniquely
identify a DataSet if multiple copies are created on a volume. This is passed
by find when deserializing a DataSet found on disk. Callers should typically
not specify this option.

=item C<< manifest_file >>

(Private, Optional) A NACL::C::VolumeFile instance representing the DataSet
manifest metafile. find passes the NACL::C::VolumeFile from which it read the
manifest information.

=back

=back

=over

=item Returns

Instance of DataSet library.

=back

=cut

=head2 create

    $dataset->create();

    my $dataset = PkgName->create(
        volume  => $volume_instance,
        clients => [ $unix_client_instance, $windows_client_instance ],
    );

(static or instance method) The create method is part of the standard DataSet
interface. It must always work with the minimum of volume and clients as
arguments and may not require additional arguments. It is possible that
individual DataSets may take additional optional arguments, but they will not
be required.

=over

=item Options

=over

=item C<< volume=>$volume >>

(Required) A NACL::C::Volume instance for the volume to be used with the
DataSet.

=item C<< clients=>[$unix_client,$windows_client] >>

(Usually Required) An array ref of NACL::C::Client instances to be used with
the DataSet.
There may be DataSets that don't require clients, but many DataSets will
require at least one Windows and one Linux client.

=back

=back

=cut

# We define a basic create method here that will be overridden by a DataSet.
# This allows a DataSet author to get up and running without even defining a
# create method.
sub create {
    my ($self_or_pkg, %args) = @_;

    my $self = $self_or_pkg->_new_if_required(%args);
    $self->_write_manifest_file();
    return $self;
}

# This method makes it easy to provide create as either a static or instance
# method.
sub _new_if_required {
    my ($self_or_pkg, %args) = @_;

    if (!ref $self_or_pkg) {
        return $self_or_pkg->new(%args);
    }
    return $self_or_pkg;
}

=head2 purge

    $self->purge();

(instance method) Purge the dataset by deleting all data in the directories
associated with the DataSet and the DataSet manifest metafile.

=cut

sub purge {
    my ($self) = @_;

    foreach my $dir_name (@{$self->directory_names()}) {
        $Log->comment("Will purge $dir_name");
        my $systemshell = $self->volume()->command_interface()->apiset(
            category  => 'Node',
            interface => 'CLI',
            set       => 'Systemshell',
        );
        $systemshell->rm(
            force     => 1,
            recursive => 1,
            paths     => "/clus/".$self->volume()->vserver()."/".
                $self->volume()->state()->junction_path()."/".
                $dir_name,
            'privilege-level' => 'root',
        );
    }

    $self->_purge_manifest_file();
    return $self;
}

sub _purge_manifest_file {
    my ($self) = @_;

    # burt786482 workaround, normally we would call
    # $self->manifest_file()->delete(apiset_must => { interface => 'ZAPI' });
    NACL::C::VolumeFile->delete(
        command_interface => $self->manifest_file()->command_interface(),
        vserver           => $self->manifest_file()->vserver(),
        paths             => [ $self->manifest_file()->path() ],
        determine_appropriate_nodescope_node => 1,
    );
}

=head2 validate

    $self->validate();

(instance method) Validate the dataset according to whatever validation is
provided by the DataSet.

=cut

sub validate {
    # Provide an empty implementation in case the DataSet doesn't provide
    # validation support.
}

# Take the NACL::C::VolumeFile that represents a manifest file stored on disk
# and turn it back into a Perl hash reference.
sub _manifest_from_file {
    my ($self, $manifest_file) = @_;

    my $content = $manifest_file->read(
        determine_appropriate_nodescope_node => 1,
    );

    # The file is serialized by writing the output of Data::Dumper to disk and
    # then evaling that file to get it back into memory
    my $manifest = eval "my ".$content;
    if ($@) {
        NATE::BaseException->throw("Got error of $@ while trying to eval ".
            $manifest_file->path()
        );
    }
    return $manifest;
}

=head2 find

    @all_datasets = NACL::MTask::DataSet->find(
        volume  => $volume_instance,
        clients => [ $unix_client_instance, $windows_client_instance ],
    );

    @all_qtree_datasets = NACL::MTask::DataSet::QtreeBasic->find(
        volume  => $volume_instance,
        clients => [ $unix_client_instance, $windows_client_instance ],
    );

(static method) Find existing DataSets that have been created on a volume. The
clients are passed in just in case a DataSet needs them for a validate method
or additional mutations to be performed on the DataSet. It is possible that
DataSets will not need clients, but the caller should expect that they may be
required in certain cases. In list context all matching DataSets are returned;
in scalar context only the first match is returned.

=over

=item Options

=over

=item C<< volume=>$volume >>

(Required) A NACL::C::Volume instance for the volume to be used with the
DataSet.

=item C<< clients=>[$unix_client,$windows_client] >>

(Usually Required) An array ref of NACL::C::Client instances to be used with
the DataSet. There may be DataSets that don't require clients, but many
DataSets will require at least one Windows and one Linux client.

=item C<< allow_empty=>1 >>

(Optional) When true, find returns an empty result instead of throwing
NACL::Exceptions::NoElementsFound if no matching DataSets exist. Defaults to 0.

=back

=back

=cut

sub find {
    my ($self, %args) = @_;

    my $volume      = $args{volume};
    my $clients     = $args{clients};
    my $allow_empty = $args{allow_empty} || 0;

    my @datasets;

    my $volume_root_dir = NACL::C::VolumeDirectory->new(
        command_interface => $volume->command_interface(),
        vserver           => $volume->vserver(),
        path              => $volume->construct_dblade_path(),
    );

    my @directory_files = $volume_root_dir->find_files(allow_empty => 1);
    my @manifest_files;
    if (@directory_files) {
        my $extension = $self->_file_extension();
        # Quote the extension so that its leading '.' is matched literally.
        @manifest_files = grep { $_->path() =~ /\Q$extension\E$/ } @directory_files;
    }

    foreach my $manifest_file (@manifest_files) {
        my $manifest = $self->_manifest_from_file($manifest_file);
        if (!$manifest) {
            next;
        }
        my $dataset_manifest = $manifest->{dataset_manifest};
        my $pkg      = $dataset_manifest->{pkg};
        my $new_args = $dataset_manifest->{new_args};

        # We should only find things that are of the same type as this package.
        # Skip anything where the type does not match. Resolve the class name
        # first so that this also works when find is called on an instance.
        my $self_type = ref($self) || $self;
        if (!$pkg || !$pkg->isa($self_type)) {
            next;
        }

        my $dataset = $pkg->new(
            # Make an inline file w/ xdir links
            # Make a regular file w/ xdir links
            volume  => $volume,
            clients => $clients,
            %$new_args,
            # directory_names => $dataset_manifest->{directory_names},
            # instance => $dataset_manifest->{instance},
            manifest_file => $manifest_file,
        );

        if ($dataset->can('thaw')) {
            my $thaw_manifest = $manifest->{thaw_manifest} || {};
            $dataset->thaw($thaw_manifest);
        }

        push @datasets, $dataset;
    }

    if (!@datasets && !$allow_empty) {
        NACL::Exceptions::NoElementsFound->throw(
            "Unable to find any DataSets matching the request"
        );
    }

    return wantarray ? @datasets : $datasets[0];
}

=head1 INTERNAL METHODS

The methods documented here may be called by individual DataSet modules. There
are other methods that are defined, but should not necessarily be called from
an individual DataSet module.
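
As a rough illustration of the manifest round trip that find depends on, the
sketch below serializes a hash reference with Data::Dumper and reads it back
with a string eval, following the same approach as _write_manifest_file and
_manifest_from_file. The package name, instance string, and directory name are
made-up values, and no NACL calls are involved; only the layout of the hash
mirrors the real manifest.

```perl
    use strict;
    use warnings;
    use Data::Dumper;

    # Hypothetical manifest contents; only the key layout follows the module.
    my $manifest = {
        dataset_manifest => {
            pkg      => 'My::Example::DataSet',
            new_args => {
                instance        => '1400000000_42',
                version         => 1,
                directory_names => ['MyExampleDataSet-1400000000_42'],
            },
        },
        thaw_manifest => {},
    };

    # Dumper emits Perl source of the form '$VAR1 = { ... };'.
    my $content = Dumper($manifest);

    # Prefixing "my " declares $VAR1 inside the eval, so the text compiles
    # under strict. The value of the assignment is the hash reference.
    my $restored = eval "my " . $content;
    die "Unable to eval manifest: $@" if $@;

    print $restored->{dataset_manifest}{pkg}, "\n";    # prints "My::Example::DataSet"
```

Because the manifest is evaluated as Perl source, a manifest file should only
be read from a volume whose contents are trusted.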

=cut

=head2 _register_data_directory_name

    $self->_register_data_directory_name('dataset_dir_instance');

(instance method) Registers the specified directory name as being part of this
dataset. These directory names are tracked so they can be purged. This should
only be used if a DataSet is creating directories with a private implementation
or doing something like creating a qtree that should be deleted.

=over

=item Options

=over

=item C<< $dir_name >>

(Required) The name of the directory, relative to the root of the volume.

=back

=back

=cut

sub _register_data_directory_name {
    my ($self, $dir_name) = @_;

    push @{$self->directory_names()}, $dir_name;
    return;
}

=head2 _create_dataset_directory

    $self->_create_dataset_directory();

(instance method) Create and register a directory to contain the files for the
DataSet. The caller of this method must not register the directory again since
it has already been registered.

=over

=item Returns

NACL::C::VolumeDirectory instance representing the directory created for the
DataSet to use.

=back

=cut

sub _create_dataset_directory {
    my ($self) = @_;

    my $dir_name = (ref $self).'-'.$self->instance();
    $dir_name =~ s/://g;

    $self->_register_data_directory_name($dir_name);
    my $directory = $self->volume()->directory_create(
        dirname => $dir_name,
        perm    => '777',
    );
    return $directory;
}

=head2 _get_dataset_directories

    my $directory = $self->_get_dataset_directories();

(instance method) Get the directory objects for any registered DataSet
directories.

=over

=item Returns

NACL::C::VolumeDirectory instances representing the registered DataSet
directories. Returns the first registered directory in scalar context and the
full list otherwise.

=back

=cut

sub _get_dataset_directories {
    my ($self) = @_;

    my @dirs;
    foreach my $dir_name (@{$self->directory_names()}) {
        push @dirs, NACL::C::VolumeDirectory->new(
            command_interface => $self->volume()->command_interface(),
            vserver           => $self->volume()->vserver(),
            path              => $self->volume()->construct_dblade_path()."/$dir_name",
        );
    }
    return wantarray ? @dirs : $dirs[0];
}

# Friends of this class should not call this method.
sub _get_client_of_type {
    my ($self, $client_type) = @_;

    my @desired_clients;
    foreach my $client (@{$self->clients()}) {
        ## support for lanforge-client , burt951104
        if (($client->type() =~ /$client_type/)
            || ($client_type =~ /unix/
                && $client->command_interface->hostrec->isa('Hostrec::LanForge::Fire')))
        {
            push @desired_clients, $client;
        }
    }

    if (scalar(@desired_clients) == 0) {
        NACL::Exceptions::NoElementsFound->throw("No $client_type clients " .
            "found!"
        );
    }
    else {
        return pop @desired_clients;
    }
}

=head2 _get_windows_client

    my $windows_client = $self->_get_windows_client();

(instance method) Returns a NACL::C::Client object for a windows client.

=cut

sub _get_windows_client {
    my $self = $_[0];
    return $self->_get_client_of_type("windows");
}

=head2 _get_unix_client

    my $unix_client = $self->_get_unix_client();

(instance method) Returns a NACL::C::Client object for a unix client.

=cut

sub _get_unix_client {
    my $self = $_[0];
    return $self->_get_client_of_type("unix");
}

# Generates a file name for the manifest file. This file must be unique and be
# discoverable using find. Currently, we use a file extension to help find()
# discover these files.
# Example return:
#   /vol/myvolume/NACLMTaskDataSetQtreeBasic-<instance>.dataset-manifest
sub _manifest_file_path {
    my ($self) = @_;

    my $manifest_name = $self->volume()->construct_dblade_path."/".(ref $self).'-'.
        $self->instance().$self->_file_extension();
    $manifest_name =~ s/://g;
    return $manifest_name;
}

=head2 _write_manifest_file

    $self->_write_manifest_file();

(instance method) Writes out a metafile for the current DataSet object
instance. A few things are packed into the object that are stored for all
DataSet objects, such as the package name, instance number, version of the
package, and the directory names associated with DataSet data.

This method will call a freeze method if one has been defined. This gives the
individual DataSet implementation the opportunity to save additional data that
can't easily be regenerated.

When the metafile is written, the NACL::C::VolumeFile instance will be
registered in the manifest_file attribute for this object instance.

This method should be called any time the state of a DataSet is updated in a
way that requires persistence to a metafile.

=cut

sub _write_manifest_file {
    my ($self) = @_;

    my $volume = $self->volume();
    my $metafile = NACL::C::VolumeFile->new(
        command_interface => $volume->command_interface(),
        vserver           => $volume->vserver(),
        path              => $self->_manifest_file_path(),
    );

    my $thaw_manifest = {};
    if ($self->can('freeze')) {
        $thaw_manifest = $self->freeze();
    }

    my $manifest = {
        dataset_manifest => {
            pkg => (ref $self),
            # Place any arguments here that should be passed to $self->new
            new_args => {
                instance         => $self->instance(),
                version          => $self->version(),
                directory_names  => $self->directory_names(),
                original_creator => $self->original_creator(),
            },
        },
        thaw_manifest => $thaw_manifest,
    };

    $metafile->write(
        content => Dumper($manifest),
    );

    $self->manifest_file($metafile);
    return $self;
}

=head2 _fixup_permissions

    $self->_fixup_permissions();

(instance method) This method opens up the permissions of any files inside of
the DataSet, including the metafile. This could be called at the end of create
after files have been made, but the permissions might not be suitable for doing
I/O from a client.
Files created via ZAPI or mkfile typically don't have permissions that would
allow a client to use them; this method fixes that. Ideally, the DataSet should
create the files with permissions that let the test clients use them, but if
that is not possible the DataSet can call this method to fix things up.

=cut

sub _fixup_permissions {
    my ($self) = @_;

    my $volume = $self->volume();

    my $manifest_file = $self->manifest_file();
    $manifest_file->set_permission(permission => '777');

    my @dataset_directories = $self->_get_dataset_directories();
    foreach my $dir (@dataset_directories) {
        my $volume_relative_dir_path = $dir->get_file_or_dir_name_from_path(
            path => $dir->path()
        );
        my $iterator = NACL::MTask::VolumeFileIterator->new(
            volume => $volume,
            path   => "/$volume_relative_dir_path",
        );
        while ($iterator->has_next()) {
            my $file = $iterator->next();
            $file->set_permission(permission => '777');
        }
    }
}

=head1 WRITING A DATASET

=head2 About your DataSet

We don't have to provide the option to access specific elements of the dataset,
but it does allow a dataset to be more useful for a functional test. Each of
these attributes or methods to access data within a DataSet becomes a published
interface and should be maintained for backwards compatibility.

When providing methods to access files within the dataset, the author must take
care to ensure that they are usable in both the populate and find cases. The
individual library can do this by either hardcoding file names, or by packaging
some information that will be serialized into the dataset-manifest file.

=head2 Regression Test Suite

Each DataSet library must have a regression test suite. There is a minimum set
of regression tests that each library must pass, but DataSets may require
additional regression tests if they have optional methods to access or model
the data.

A regression test that uses the standard suite of tests can easily be created
by copying an existing thpl and reworking from there.

    use data_mobility::ondemand::lib::TCDFramework qw(:all);
    use NACL::UnitTest::MTask::DataSetStandardTests qw(:all);

    our ($PKG, $Framework);

    $PKG = 'NACL::MTask::DataSet::QtreeBasic';
    $Framework = data_mobility::ondemand::lib::TCDFramework->framework_instance();
    $Framework->set_topology('data_mobility::ondemand::lib::LightningRegressionTestTopology');

    # Several standard tests are pulled in by DataSetStandardTests.
    # Additional tests specific to this DataSet module are defined here.
    #
    sub framework_specific_test {
        my $topology = $Framework->topology();
        my $volume = $topology->volume();

        # DataSet specific test here

        return $TCD::PASS;
    }

    main();

=head2 Required methods to implement

=item create

Create is one of the only methods needed to make a DataSet library. If the
library author wants a traditional opaque dataset that doesn't provide the user
any insight into the data, they can just write a create method. The create
method can make use of any of the documented internal methods of DataSet.pm,
such as _get_unix_client. It also must arrange to have _write_manifest_file
called at some point.

=head2 Optional methods to implement

=item validate

A DataSet can support validation that existing data is still healthy. The
DataSet could check that locks are still locked, checksums still match, sparse
files are still sparse, or anything else of interest.

Validation implies a number of obvious read-only checks, but the method does
not have to limit itself to this. A block maximum reference count validation
routine could add another duplicate block to take the file to maxrefcnt+1,
truncate to maxrefcnt-1, then grow the file back to maxrefcnt. A hardlink
validation could shake the B+ I2P tree by doing a rename w/ victim, check the
I2P database, then create another link and check the I2P database again. If a
DataSet decides to do read-write validation it must first make sure that the
DataSet lives on a read-write volume.

Another possibility for DataSets is that they may end up on another volume if
the volume is mirrored, moved, renamed, or restored from tape. Validate should
throw exceptions if the DataSet has not been preserved by the transformation.
VolumeMove should preserve locks, but a NAS side copy will not preserve
sparseness. It is the calling test's responsibility to determine if the
validation failure is reasonable given the transformations made to the DataSet.
The DataSet is responsible for making sure that any serialized information
doesn't unnecessarily bind the DataSet to the present location.

=item freeze

An opaque black-box dataset won't have to implement freeze, but if the DataSet
library provides modeling of the data there may be state required to be saved
into the metafile. This method is called by _write_manifest_file and is
expected to return a hash reference. The return value of this method will be
passed into thaw. If freeze is not implemented an empty hash reference is
passed to thaw.

If a DataSet created a random file name with 6 streams that is accessible via a
public method on the DataSet it may use freeze/thaw to arrange preservation of
that attribute.

=item thaw

This method is called by find after new'ing the object. The hash reference
returned from the prior freeze call is passed as an argument to this method.
The thaw method is responsible for recreating any local attribute state.

It is possible that file names are hardcoded relative to the DataSet's
directory. In this case we might not need any data to be returned from freeze.

=item version

This attribute can be set to allow versioning of the on disk data. This could
be used to identify when the on disk data was created by a different version of
the library than the one being executed.

=item purge

The DataSet.pm library provides a generic purge method that should be
sufficient to delete most data, but each implementor has the option of creating
their own custom purge method.
If making a custom purge method, then arrange to have _purge_manifest_file
called to eliminate traces of the DataSet from the volume.

=cut

=head1 SEE ALSO

https://wikid.netapp.com/w/NACL/Tasks/DataSet

=cut

1;
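
=head1 FREEZE AND THAW SKETCH

The freeze and thaw contract described under "Optional methods to implement"
can be sketched without any NACL dependencies. Everything below is
hypothetical: My::Sketch::DataSet is a plain Perl stand-in rather than a real
NACL::MTask::DataSet subclass, and the stream_file attribute and its value are
invented for illustration. The calling code only mimics what
_write_manifest_file and find do around freeze and thaw.

```perl
    use strict;
    use warnings;

    package My::Sketch::DataSet;    # hypothetical stand-in class

    sub new {
        my ($class, %args) = @_;
        return bless { %args }, $class;
    }

    # Return only the state that cannot be regenerated after find.
    sub freeze {
        my ($self) = @_;
        return { stream_file => $self->{stream_file} };
    }

    # Restore local state from the hash reference that freeze returned.
    sub thaw {
        my ($self, $thaw_manifest) = @_;
        $self->{stream_file} = $thaw_manifest->{stream_file};
        return $self;
    }

    package main;

    # "create" side: the returned hash would be stored as thaw_manifest.
    my $created = My::Sketch::DataSet->new(stream_file => 'six_streams_x7Qp');
    my $thaw_manifest = $created->can('freeze') ? $created->freeze() : {};

    # "find" side: a fresh object is built first and then thawed.
    my $found = My::Sketch::DataSet->new();
    $found->thaw($thaw_manifest) if $found->can('thaw');

    print $found->{stream_file}, "\n";    # prints "six_streams_x7Qp"
```

Keeping the frozen hash small and free of volume-specific paths makes it less
likely to bind the DataSet to its present location.

=cut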