Hybrid fault tolerant cost aware mechanism for scientific workflow in cloud computing

ABSTRACT


INTRODUCTION
The objective of "computing as a utility" has been realized via cloud computing, a new platform for distributed computing [1]. The cloud was created by combining two significant computing technologies, cluster computing and grid computing [2]. Cloud computing provides on-demand access to trustworthy resources and pay-per-use customization of computing environments [3]. It offers a range of dynamically scaled, virtualized, abstracted, and adaptable computing resources and services [4]. Networks, storage, servers, and applications are all made available as cloud resources in a subscription-based paradigm. The three main architectural elements of cloud services are infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). Owing to the on-demand availability of high-speed internet connectivity, these services are accessible to clients outside the organization.
Research and commercial applications make extensive use of cloud computing's capabilities and services [5]. Business applications are organized according to best practices and are task-oriented; to manage such operations, organizations adopt commercial platforms such as Amazon EC2. In contrast, scientific applications are created according to scientific principles and are data-driven. Systems such as the Pegasus workflow management system (WMS) are used to coordinate scientific activities. According to Faragardi et al. [6], scientific applications are data-intensive and need a substantial amount of computing power and storage space for assessment and computation. Scientific workflow applications are collections of computing activities structured in various ways; even a single scientific application, when organized as a scientific workflow, comprises several networked computing tasks. Each phase of a scientific study requires a large amount of data and computational resources. Such workflows apply across numerous academic disciplines, including astronomy, biology, gravitational physics, and earthquake science. Workflow scheduling is designed to optimize multiple cloud models; nevertheless, customers submitting workflow applications prioritize quality of service (QoS) satisfaction, including execution cost and deadline. In addition, the increasing demand for computing and services in scientific workflow applications raises issues of energy consumption, time constraints, makespan optimization, and cost reduction. In the directed acyclic graph (DAG) model of a workflow, nodes represent jobs and edges represent the interactions between tasks [7]; workflows are therefore expressed as DAGs.
A fault-tolerance model that relies on a single backup in the event of a failure was enticing as a novel concept, but the complexity of scientific workflows renders it incapable of withstanding multiple failures. It is therefore necessary to develop and execute a fault-tolerance model that can withstand repeated failures and increase reliability; cost optimization is also crucial, as an optimal cost reflects the model's effectiveness [8]. There are two kinds of fault-handling methodologies for managing the various types of failure: proactive and reactive. A reactive fault-tolerance method is intended to mitigate a fault after it occurs and to enhance the efficacy of the fault-tolerance technique; while it incurs execution costs, it offers system dependability [9]. The goal there was to decrease response time and energy usage, and by using replication, that strategy reduces the work lost to system failures; yet cost optimization remains its largest challenge. Moreover, Shahidinejad and Barshandeh [10] created fault-tolerant dynamic scheduling (FTDS), a fault-tolerance technique intended to overcome the limits of static scheduling: FTDS recognizes a processor fault and then attempts to efficiently reschedule the halted operations.
Scientific workflows contain numerous activities and tasks with various constraints. To manage them, various workflow management techniques are used to plan and execute scientific workflows on the intended resources [11], [12]. When managing and scheduling scientific workflows, several factors can degrade performance, involving the creation, organization, and control of the workflows as well as task management, resource management, scheduling policies, and fault-tolerance procedures [13]. Additionally, some workflows are too large to transfer across nodes without incurring additional costs. Five practical scientific workflows for various scientific applications were thoroughly analyzed from the start [14]. These workflows apply in the following fields: i) seismic research with CyberShake; ii) biological research with the sRNA identification protocol using high-throughput technology (SIPHT); iii) astronomy with Montage; iv) genetic research with Epigenomics; and v) gravitational physics with the laser interferometer gravitational-wave observatory (LIGO).
That research details each scientific workflow's organizational, data, and technological needs. Additionally, terms from the computing industry are used to illustrate the many structural and functional aspects of scientific workflows; examples of these characteristics are pipelining, data parallelism, data distribution and redistribution, data aggregation, and compositions of scientific procedures. A WMS is a scalable workflow management system used for automating research, as characterized by Fan et al. [15]: distributed computing infrastructures are adapted to accommodate abstract models of scientific workflows. In our research, we analyze and comprehensively address the aforementioned issues by presenting a data-driven, fault-tolerant model.
The most significant contributions of this study are: i) this research designs and develops a hybrid cost-aware fault-tolerant (HCFT) mechanism for workflow scheduling in a cloud-computing environment; ii) HCFT minimizes cost through a novel clustering and optimal balancing mechanism, which reduces the overall cost; fault tolerance is achieved through the clustering approach, and parallel processing is used to further optimize the process; and iii) HCFT is evaluated on scientific workflows to demonstrate the model's efficiency in terms of execution cost for workflow variants such as CyberShake, Montage, SIPHT, and Inspiral.
This paper is organized as follows: the first section covers the background of cloud computing and workflow scheduling, proceeds with the challenges and the need for a fault-tolerant model in workflow scheduling, and ends with the research contributions. The second section discusses existing fault-tolerant techniques with respect to cost reduction; the third section presents the mathematical modeling of the HCFT model, which is evaluated in the fourth section.

RELATED WORK
Five real scientific workflows with various scientific applications were thoroughly examined, starting with Fan et al. [15]: Epigenomics is used to investigate genetics, Montage to study astronomy, CyberShake to analyze seismic activity, SIPHT to study biology, and LIGO to study gravitational physics. The organizational, data, and computing needs of each scientific workflow are covered in the article. Additionally, certain structural and functional features of scientific workflows are examined computationally, including data aggregation, data parallelism, and data distribution and redistribution, together with works that employ each scientific workflow. According to research by Ahmad et al. [16], Pegasus is a scalable WMS for the automation of research; distributed computing systems are constructed from abstract models of scientific procedures.
Dynamic scheduling of bag-of-tasks-based workflows (DSB) is a scheduling technique for scientific workflows that tries to cut costs while still meeting user-specified deadlines. The technique breaks workflows down into bags of tasks (BoTs), optimizes their distribution, and then schedules them in line with priority restrictions and data dependencies; it satisfies the workflow's deadline and significantly lowers the cost of workflow computing. Planning scientific work requires methods that utilize resources quickly and efficiently. According to previous research [17]-[19], adaptive data-aware scheduling (ADAS) is a scheduling technique that prioritizes resource use and workflow completion time; it combines task management and data placement for a range of jobs in the cloud. Although that scheduling approach is efficient, it does not consider fault tolerance, which is an essential component of workflow scheduling. The budget-driven algorithm for generating high-quality schedules (BAGS) was developed in [20] to optimize process execution time while respecting budgetary limits. BAGS distributes the budget to activities before making dynamic resource-provisioning and scheduling choices in response to environmental changes. Even though it is effective for scheduling scientific activities under financial constraints, the approach still needs improvement in terms of time constraints and fault-tolerance measures. The dynamic benefit-weighted scheduling (DBWS) method takes both time and financial constraints into consideration when organizing scientific workflows.
Activities that require a great deal of data and computation are thought to consume the most energy. To use less energy, Juarez et al. [21] offered a real-time dynamic scheduling system for the effective execution of task-based applications. The authors developed a polynomial-time solution that satisfies the requirements of low energy consumption and fast execution by combining a resource-allocation method with a set of heuristic criteria; however, they fail to address the significant problem of fault tolerance. Reference [22] argues that cloud-computing systems collapse when fault-tolerant practices are foregone to maintain profitability or save costs. A fault-tolerant technique for scientific workflow systems was subsequently approved for use in scientific operations. Fault-tolerant techniques are required for scientific workflows to run properly, since the failure of a bottleneck node renders the entire operation meaningless [23]: if a job does not finish successfully during execution, it is promptly resubmitted to the same resource or another. Along these lines, FASTER, a dynamic fault-tolerant scheduling approach for real-time scientific workflows, was created.
Scientific workflow applications involve detailed, multi-level computation. Since activities at the same level demand the same services, this sort of computing is well suited to fault-tolerant clustering (FTC) systems. The FTC method, developed in [24]-[26], is useful for many scientific projects. That study offers three techniques: dynamic clustering (DC), selective re-clustering (SR), and dynamic re-clustering (DR). DC maintains a clustering factor that is dynamically adjusted to the rate of job failure. SR repeatedly re-executes the failed tasks within a single job. DR combines the first two strategies: it gives failing tasks within a job a second chance while also dynamically maintaining the clustering factor based on the failure rate of recognized activities. An enhanced data-oriented scheduling strategy with a dynamic clustering fault-tolerant technique (EDS-DC) was presented in [27]; EDS-DC integrates data-oriented scheduling with a dynamic clustering method offering fault tolerance. Its results on workflow-modeled processes were compared to three well-known scheduling policies: minimum completion time with dynamic clustering (MCT-DC), max-min-DC, and min-min-DC. Simulations show that EDS-DC significantly decreased cost and makespan compared to these traditional methods. For scientific activities, Chakraborty et al. [28] propose a QoS-aware fault-tolerant workflow management system (QFWMS). A cluster-based, fault-tolerant, data-intensive (CFD) approach is given in [29] for cloud-based scientific applications; CFD provides a precise strategy for obtaining outcomes from scientific data, and the reported figures show that, on the Montage workflow, CFD outperformed the alternatives [30]. Beyond pipelining, parallelism, integration, and disintegration, scientific workflow applications have other distinctive features: they are computation- and data-intensive, so they require data-centric workflow management and fault-tolerant scheduling systems with high data storage and processing capacities. Scientific workflows must be gathered, categorized, and managed using a data-oriented, scheduling-based, energy-efficient WMS, since they are composed of several data- and computation-intensive applications; at a bottleneck node or level a large number of workflow activities are performed, and if even one of them fails, the execution as a whole is meaningless, so a fault-tolerant system is required. Given that workflow activities at various levels have comparable service and resource needs, scientific operations can be carried out using a cluster-based scheduling and fault-tolerant approach. These constraints motivate a fault-tolerant, data-centric, and energy-efficient system for coordinating and scheduling scientific activities.

PROPOSED METHOD
Workflow scheduling on IaaS clouds has been extensively utilized by the scientific community to perform computational tasks; moreover, fault tolerance is one of the major requirements on a cloud platform, and it directly affects the cost per task and per workflow. This research aims at designing a hybrid fault-tolerant mechanism that reduces cost, as shown in Figure 1. Data is fetched from each user, then reviewed and executed; a user here refers to the individual from whom the data is collected, and each user submits a collection of various types of data. Once execution completes, the corresponding output is generated. The application interface (AI) is responsible for connecting the user to the proposed method: the user feeds data to the AI for execution, and the AI transfers the data to the next component of the model, the workflow model ƥ, which captures the workflow's size, nodes, and edges. The AI serves as the interface for one or more users and for the workflows submitted to the proposed model. The DAG workflow component, depicted as ƥ, receives the data from the interface for the functioning and evaluation of various applications; for the data generated, ƥ builds a DAG, producing a reliably synthetic workflow from the data collected across workflows. The DAG structure is represented as shown in (1).
$$G_{DAG} = (S_{wf}, DF) \tag{1}$$

where $S_{wf} = \{S_{wf1}, S_{wf2}, S_{wf3}, \ldots, S_{wfn}\}$ denotes the set of iterations involved and $DF = \{(S_{wf(x)}, S_{wf(y)}) \mid S_{wf(x)}, S_{wf(y)} \in S_{wf}\}$ denotes the dependencies between the iterations. A dependency pair $(S_{wf(x)}, S_{wf(y)})$ captures the parameters passed between the iterations $S_{wf(x)}$ and $S_{wf(y)}$, and $S_{wf(y)}$ is termed the successor of the iteration $S_{wf(x)}$. Any step without a predecessor is termed an initiation step, and the final step is termed the termination step, as shown in (2). A user can submit input and receive an output; ƥ receives data containing the multiple inputs supplied by the user and produces, for the subsequent workflows, a DAG that is passed on to ϒ.
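To make the structure concrete, the following is a minimal Java sketch of this DAG representation; the class and method names (WorkflowDag, addDependency, initiationSteps) are illustrative choices mirroring the notation above, not part of the proposed system.

```java
import java.util.*;

// Minimal sketch of the DAG workflow model G_DAG = (S_wf, DF).
// Names are illustrative, mirroring the notation in equation (1).
class WorkflowDag {
    // S_wf: the set of workflow steps (nodes), each mapped to its successor list
    private final Map<String, List<String>> successors = new HashMap<>();

    void addStep(String stepId) {
        successors.putIfAbsent(stepId, new ArrayList<>());
    }

    // DF: a dependency (x, y) means step x must finish before step y starts
    void addDependency(String fromStep, String toStep) {
        addStep(fromStep);
        addStep(toStep);
        successors.get(fromStep).add(toStep);
    }

    // Initiation steps: steps with no predecessors (entry nodes of the DAG)
    List<String> initiationSteps() {
        Set<String> hasPredecessor = new HashSet<>();
        successors.values().forEach(hasPredecessor::addAll);
        List<String> roots = new ArrayList<>();
        for (String step : successors.keySet()) {
            if (!hasPredecessor.contains(step)) roots.add(step);
        }
        return roots;
    }
}
```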

Optimal-workflow execution
The optimal-workflow executor, denoted ϒ, is fed the multiple workflows and rebuilds each of them to enhance performance. It integrates the iterations of a workflow into a single job, so that the per-iteration overhead is minimized; after ϒ, a job therefore comprises the execution of several iterations. The iterations are arranged according to their dependencies and executed in a parallel, integrated sequence. At this point, the parameters for each workflow are provided by the data for the output. Additionally, ϒ is responsible for producing the data from the various jobs and estimating the storage resources from the various inputs transferred to the next model. ϒ then transmits all the jobs to the next component.
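As a sketch of how such an executor might order iterations by their dependencies and run independent ones in parallel, the following uses Kahn-style topological levelling over a thread pool; all identifiers and the fixed pool size are assumptions for illustration, not the authors' implementation.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch: execute DAG levels in dependency order, running the
// independent iterations of each level in parallel, as ϒ is described to do.
class WorkflowExecutor {
    void execute(Map<String, List<String>> successors, Map<String, Runnable> tasks)
            throws InterruptedException {
        // Count predecessors (in-degree) of every step in the DAG
        Map<String, Integer> inDegree = new HashMap<>();
        for (String s : successors.keySet()) inDegree.putIfAbsent(s, 0);
        for (List<String> succ : successors.values())
            for (String s : succ) inDegree.merge(s, 1, Integer::sum);

        ExecutorService pool = Executors.newFixedThreadPool(4);   // pool size is arbitrary
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());         // initiation steps

        while (!ready.isEmpty()) {
            // Run one dependency level in parallel, then wait for it to finish
            CountDownLatch latch = new CountDownLatch(ready.size());
            for (String step : ready) {
                pool.submit(() -> { tasks.get(step).run(); latch.countDown(); });
            }
            latch.await();
            // Release every successor whose last dependency just completed
            List<String> next = new ArrayList<>();
            for (String done : ready)
                for (String succ : successors.getOrDefault(done, List.of()))
                    if (inDegree.merge(succ, -1, Integer::sum) == 0) next.add(succ);
            ready = next;
        }
        pool.shutdown();
    }
}
```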

Resource allocator
The resource allocator, denoted ϑ, receives the jobs from ϒ and allocates resources to them. The allocated resources are drawn from the cloud and handled separately by the proposed technique. ϑ allocates the resources in a manner that optimizes cost: a set of iterations is assigned to the resources, and the resource allocated to each iteration is chosen to minimize the time to transfer its data. Each iteration is considered against all the resources, so every task is allocated to a resource within the optimized time limit. Here $res_{cost(x)}$ denotes the cost of resource $x$ and $DT_{time}$ denotes the data-transfer time of a step on resource $x$. The resource list is traversed entirely, assigning to each step the resource with the least data-transmission time.
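A minimal sketch of this greedy traversal follows, assuming each step's transfer time to every resource is known up front; the Resource class and dtTime map are hypothetical stand-ins for $res_{cost(x)}$ and $DT_{time}$.

```java
import java.util.*;

// Sketch of cost-aware allocation: each step is mapped to the resource
// with the smallest data-transfer time, per the traversal described above.
class ResourceAllocator {
    static class Resource {
        final String id;
        final double resCost;   // res_cost(x): cost of using this resource
        Resource(String id, double resCost) { this.id = id; this.resCost = resCost; }
    }

    // dtTime.get(step).get(resource) = DT_time for that step on that resource
    Map<String, Resource> allocate(List<String> steps, List<Resource> resources,
                                   Map<String, Map<Resource, Double>> dtTime) {
        Map<String, Resource> mapping = new HashMap<>();
        for (String step : steps) {
            Resource best = null;
            double bestTime = Double.MAX_VALUE;
            for (Resource r : resources) {           // traverse the entire list
                double t = dtTime.get(step).get(r);
                if (t < bestTime) { bestTime = t; best = r; }
            }
            mapping.put(step, best);
        }
        return mapping;
    }
}
```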

The usage of each resource is classified into three bands, on which the optimal workload balancer of the next subsection relies:

$$Resource_{usage}\big(res_{cost(x)}\big) = \begin{cases} bottom\_peak, & 0 \le usage \le 0.4 \\ Mid\_peak, & 0.4 < usage \le 0.8 \\ Top\_peak, & 0.8 < usage \le 1 \end{cases}$$
Fault-tolerant system (τ)
The fault-tolerant system, denoted τ, executes the iterations allocated through ϒ; once an iteration is executed completely, the interface is used to return the results to the user. τ initiates the fault-tolerant approach and handles the failure of subsequent iterations. The model produces results upon successful execution; once an iteration fails, τ invokes Automate_clustering(). A 5% failure rate is assumed over the selected jobs, and hence τ() is initiated in each period. The optimal workload balancer is responsible for cost optimization: it ranks resource utilization in ascending order and disperses the payload of nodes across the available nodes. Resource utilization is classified into three levels, i.e., Top_peak, bottom_peak, and Mid_peak.
Here bottom_peak denotes the lowest band of usage and Top_peak the highest. Resources are made available to take over the workload of bottom_peak resources until those resources reach null usage, at which point the null-usage resources are switched off. Overall, data is fed via the AI, which supplies data from two to three workflows to the proposed model. The input data passes to ƥ and then to ϒ, the next key component, which converts the workflow model into an appropriate resource initialization based on the iterations of the allocated work. The jobs and tasks are then transferred to ϑ, which schedules the resources based on the scheduling policy. τ receives the corresponding iterations and, via the AI, returns the outcome to the user. When iterations fail to execute, the cost-aware optimal balancer is launched to initiate effective, cost-efficient optimization: resources are used in ascending order of utilization, distributing the workload throughout the nodes with minimal consumption on the other nodes. ϑ receives the jobs through ϒ and schedules them by allocating cloud resources, after ϒ converts the jobs into the essential tasks. The resources are fetched from the cloud, and the proposed system plans and manages the tasks so as to spend a minimal amount of money on completing them: from the lists of tasks and resources, a mapping list is generated by allocating the most efficient resource, in the shortest period, to each iteration. The automated clustering integrated into the proposed method provides the fault-tolerance mechanism: the iterations belonging to one job of a workflow are combined into a cluster and executed on a single resource. The tasks remaining uncompleted in a cluster are automatically re-clustered, depending on the completion state of the given job, and executed repeatedly. Equations (6) and (7) depict the mechanism of automatic clustering:
$$Ex\Big(\bigcup_{y=1}^{n} S_{wf(y)}\Big) \tag{6}$$

$$AutoClus\Big(\bigcup_{y=1}^{n} S_{wf(y)}\Big) \tag{7}$$

Equation (6) clusters $n$ similar tasks and executes them, while (7) automatically re-clusters a failed job for execution. To reduce the workload, the proposed mechanism evaluates the resources, arranges them in ascending order of utilization, and then automatically clusters and transfers the workload so that resources are utilized with the least impact on other nodes. Usage is categorized into three segments, i.e., Top_peak, bottom_peak, and Mid_peak: the workload is transferred from the bottom_peak to the Mid_peak so that bottom_peak resources reach null usage, and the null-usage resources are then switched off. Algorithm 1 depicts the algorithm designed for minimizing the cost.
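The following sketch combines the two mechanisms just described, re-clustering of failed tasks and three-band workload balancing, under the stated 5% failure assumption; it is one illustrative reading of Algorithm 1, with hypothetical names and a bounded retry count, not the authors' code.

```java
import java.util.*;
import java.util.function.Predicate;

// Sketch of HCFT's fault-tolerant balancing: failed tasks in a cluster are
// re-clustered and re-executed together; resources are then classified into
// bottom/mid/top peak bands by utilization and bottom-peak load is drained.
class FaultTolerantBalancer {
    enum Peak { BOTTOM, MID, TOP }

    // Classify by the piecewise bands given earlier: [0,0.4], (0.4,0.8], (0.8,1]
    static Peak classify(double usage) {
        if (usage <= 0.4) return Peak.BOTTOM;
        if (usage <= 0.8) return Peak.MID;
        return Peak.TOP;
    }

    // AutoClus: gather the tasks that failed, cluster them, and retry as a group
    static List<String> autoCluster(List<String> tasks, Predicate<String> runTask) {
        List<String> failed = new ArrayList<>();
        for (String t : tasks) {
            if (!runTask.test(t)) failed.add(t);    // ~5% assumed to fail
        }
        // Re-execute the failed cluster together, with a bounded retry count
        for (int attempt = 0; attempt < 3 && !failed.isEmpty(); attempt++) {
            failed.removeIf(runTask);               // drop tasks that now succeed
        }
        return failed;                              // tasks still failing, if any
    }

    // Drain bottom-peak resources into mid-peak ones, then switch them off.
    // Usage refresh after each transfer is omitted for brevity.
    static void balance(Map<String, Double> usage, Map<String, Deque<String>> queues) {
        List<String> order = new ArrayList<>(usage.keySet());
        order.sort(Comparator.comparingDouble(usage::get));   // ascending utilization
        for (String res : order) {
            if (classify(usage.get(res)) != Peak.BOTTOM) continue;
            String target = order.stream()
                .filter(r -> classify(usage.get(r)) == Peak.MID)
                .findFirst().orElse(null);
            if (target == null) break;
            queues.get(target).addAll(queues.get(res));  // transfer the workload
            queues.get(res).clear();                     // resource now null: power off
        }
    }
}
```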

RESULTS
In this section, the proposed model is evaluated with respect to cost optimization. The run-time is evaluated by allocating jobs in batches of 30, 50, 100, and 1,000, and the expected outcomes are depicted as graphs that take cost optimization into account. The proposed model is tested with the CyberShake dataset using the CloudSim simulator. The model is deployed on a 64-bit Windows 11 operating system with 16 GB RAM and an Intel® Core™ i7 processor with a 3.20 GHz CPU; the proposed model is developed in the Eclipse Neon.3 IDE, and the code is written in Java.
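For orientation, a minimal CloudSim-style scenario of the kind such an experiment requires is sketched below, assuming the CloudSim 3.x API: one datacenter, one VM, and cloudlets standing in for workflow jobs. The numeric parameters are placeholders, not the paper's configuration.

```java
import java.util.*;
import org.cloudbus.cloudsim.*;
import org.cloudbus.cloudsim.core.CloudSim;
import org.cloudbus.cloudsim.provisioners.*;

// Minimal CloudSim scenario sketching the standard broker/VM/cloudlet wiring.
// All capacities and lengths below are illustrative placeholders.
public class HcftSimulation {
    public static void main(String[] args) throws Exception {
        CloudSim.init(1, Calendar.getInstance(), false);

        // One host with a single 3200-MIPS core, echoing the test machine's CPU
        List<Pe> peList = List.of(new Pe(0, new PeProvisionerSimple(3200)));
        Host host = new Host(0, new RamProvisionerSimple(16384),
                new BwProvisionerSimple(10000), 1_000_000, peList,
                new VmSchedulerTimeShared(peList));
        DatacenterCharacteristics ch = new DatacenterCharacteristics(
                "x86", "Linux", "Xen", List.of(host), 10.0, 3.0, 0.05, 0.001, 0.0);
        new Datacenter("dc0", ch, new VmAllocationPolicySimple(List.of(host)),
                new LinkedList<>(), 0);

        DatacenterBroker broker = new DatacenterBroker("broker0");
        Vm vm = new Vm(0, broker.getId(), 1000, 1, 1024, 1000, 10000,
                "Xen", new CloudletSchedulerTimeShared());
        broker.submitVmList(List.of(vm));

        // Cloudlets stand in for workflow tasks (e.g., 30 CyberShake jobs)
        List<Cloudlet> jobs = new ArrayList<>();
        UtilizationModel full = new UtilizationModelFull();
        for (int i = 0; i < 30; i++) {
            Cloudlet c = new Cloudlet(i, 40000, 1, 300, 300, full, full, full);
            c.setUserId(broker.getId());
            jobs.add(c);
        }
        broker.submitCloudletList(jobs);

        CloudSim.startSimulation();
        CloudSim.stopSimulation();
        broker.getCloudletReceivedList()
              .forEach(c -> System.out.println("Cloudlet " + c.getCloudletId()
                      + " finished at " + c.getFinishTime()));
    }
}
```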

Dataset description
The performance of the proposed model is evaluated across different scientific areas. CyberShake is used to characterize earthquake hazards from synthetic seismograms and is categorized as a data-centric workflow with wide memory requirements and basic CPU demands. Custom mosaics of the sky are produced by the Montage application from the input images; its tasks are characterized as I/O-intensive with little CPU processing activity. Gravitational waves are detected using the astrophysics LIGO workflow, which is composed mostly of CPU-intensive tasks with high memory requirements. Several genome-sequencing processes are automated using the CPU-intensive Epigenomics workflow, which is used in the bioinformatics industry. In bioinformatics, SIPHT is used to automate the search for sRNA-encoding genes; the majority of SIPHT workloads are CPU-heavy but I/O-light. Examples of the design of these workflows, as well as their full description and characterization, are available in the literature. Using the Pegasus workflow generator, five real-world scientific workflows were generated for the experiments (CyberShake, Montage, LIGO, Epigenomics, and SIPHT), with 50, 100, 200, 500, and 1,000 jobs of varying sizes; the proposed model is compared to the existing model, and all prices are estimated in dollars.
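These Pegasus-generated workflows are distributed as DAX (XML) files; as a small illustration, the sketch below counts jobs and dependency edges in one such file using the standard <job> and <child>/<parent> DAX elements. The file name is a placeholder.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch: count jobs and dependency edges in a Pegasus DAX file.
// DAX describes tasks as <job> nodes and dependencies as
// <child><parent/></child> elements; the file path is a placeholder.
public class DaxStats {
    public static void main(String[] args) throws Exception {
        Document dax = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse("CyberShake_30.xml");
        int jobs = dax.getElementsByTagName("job").getLength();
        int edges = dax.getElementsByTagName("parent").getLength();
        System.out.println(jobs + " jobs, " + edges + " dependency edges");
    }
}
```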

Cost evaluation
CyberShake approach
The CyberShake method, which is utilized to generate seismic-hazard probability curves for numerous areas in Southern California, takes an average of 8 hours and 51 minutes to execute; Figure 2 depicts its cost comparison. It creates a significant quantity of data and has a high number of jobs, and processing such a vast quantity of data demands a great deal of energy, rendering the model inefficient. The proposed model efficiently minimizes energy usage in consideration of ecosystem adoption of practices over time (EAPT). To further evaluate the new model, it is contrasted with the existing mechanism.
Considering the four variants of the CyberShake method, Table 1 compares the total expenses of the existing system with the proposed system. The existing model for CyberShake_30 costs $22,232.6165, whereas the proposed model costs $17,497.7745; similarly, the existing model for CyberShake_50 costs $24,669.0295, while the proposed model costs $19,036.5655. The existing model for CyberShake_100 requires $30,138.901, but the proposed model requires only $22,674.6285. The existing model costs $204,102 to perform the CyberShake_1000 job, whereas the proposed method costs $145,983.8115. In Table 1, PS stands for proposed system and ES for existing system.

Laser interferometer gravitational wave observatory approach
The LIGO method is used to generate and analyze gravitational waves from data collected on merging compact binary systems. This process has four separate DAX files. The existing model for Inspiral_30 costs $22,951.4585, but the proposed model costs $12,673.1235. For Inspiral_50 and Inspiral_100, the proposed model costs $22,519.9115 and $40,281.976, respectively, compared to the existing model's $40,788.6025 and $72,942.9245. For Inspiral_1000, the existing model costs $789,741.133 while the proposed model costs $436,056.0895. Table 2 compares these values, and Figure 3 illustrates the comparison of the existing and proposed costs graphically.

CONCLUSION
This research develops the HCFT mechanism for cost reduction; HCFT comprises a novel clustering and optimal resource-utilization mechanism for fault tolerance, which reduces workflow cost. The HCFT mechanism is evaluated on various scientific workflows and their variants. Considering cost as the evaluation parameter, HCFT shows a clear improvement over the existing model. In the case of CyberShake_30, CyberShake_50, CyberShake_100, and CyberShake_1000, the HCFT mechanism observes improvements of 23.83%, 25.77%, and 28.26%, respectively. Furthermore, considering the Inspiral workflow, the HCFT mechanism observes improvements of 57.704%, 57.713%, 57.692%, and 57.707% for Inspiral_30, Inspiral_50, Inspiral_100, and Inspiral_1000, respectively. Finally, the HCFT mechanism observes improvements of 57.872%, 57.885%, 57.883%, and 57.891% for SIPHT_30, SIPHT_60, SIPHT_100, and SIPHT_1000. HCFT thus achieves these improvements; however, considering the complexity of scientific workflows, future work should focus on implementing a machine-learning approach to predict resources efficiently.

Figure 1. Hybrid fault tolerance workflow

Figure 2. Cost comparison for CyberShake method

Figure 3. Graphical comparison of LIGO method

Montage approach
With the NASA-developed Montage approach, several images are fed as input and used to create one-of-a-kind mosaics. The Montage technique requires four different DAX files for the cost comparison. The existing model records costs of $846.8255 and $1,872.66 for Montage_25 and Montage_50, whereas the proposed model records costs of $482.8945 and $1,062.9835 for Montage_25 and Montage_50, respectively.

Figure 4 displays the cost comparison for the Montage method, and Table 3 lists the corresponding values. The existing model for Montage_100 and Montage_1000 costs $3,944.9945 and $41,426.565, but the proposed model costs $2,231.8825 and $23,398.8515, respectively.

Figure 4. Cost comparison on Montage method

Table 1. Cost comparison for CyberShake method

Table 2. Cost comparison for LIGO method

Table 3. Cost comparison on Montage technique

Table 4. Cost comparison of SIPHT method