Here we have compiled the most frequently asked DataStage interview questions and answers, suitable for both freshers and experienced professionals.


 

1. Define Data Stage?

Ans:

DataStage is basically a tool used to design, develop, and execute various applications that fill multiple tables in a data warehouse or data marts. It is a program for Windows servers that extracts data from databases and changes it into data warehouse form. It has become an essential part of the IBM WebSphere Data Integration suite.

2. Explain how a source file is populated?

Ans:

We can populate a source file in many ways, such as by creating a SQL query in Oracle or by using the Row Generator stage.
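For example, a simple extraction query in Oracle might look like this (the table and column names are illustrative):

  SELECT customer_id, customer_name FROM customers WHERE region = 'WEST';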

3. Name the command line functions to import and export the DS jobs?

Ans:

To import the DS jobs, dsimport.exe is used and to export the DS jobs, dsexport.exe is used.
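For illustration, typical invocations look like the following (the host, project, and file names are placeholders, and the exact flags vary by DataStage version, so treat this as a sketch):

  dsimport.exe /D=services_host:9080 /H=engine_host /U=dsadm /P=secret MyProject C:\export\MyJobs.dsx
  dsexport.exe /D=services_host:9080 /H=engine_host /U=dsadm /P=secret /JOB=MyJob MyProject C:\export\MyJob.dsx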

4. What is the difference between Datastage 7.5 and 7.0?

Ans:

In Datastage 7.5 many new stages were added for more robustness and smoother performance, such as the Procedure stage, the Command stage, and Generate Report.

5. In Datastage, how can you fix the truncated data error?

Ans:

The truncated data error can be fixed by setting the environment variable APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS, which makes the engine reject records whose string fields exceed their declared length.

6. Define Merge?

Ans:

Merge means to join two or more tables. The tables are joined on the basis of the primary key columns in both tables.

7. Differentiate between data file and descriptor file?

Ans:

As the names imply, the data file contains the data, and the descriptor file contains the description/information about the data in the data file.

8. Differentiate between datastage and informatica?

Ans:

In Datastage, there is a concept of partitioning and parallelism for node configuration, while Informatica has no concept of partitioning and parallelism in node configuration. On the other hand, Informatica is more scalable than Datastage, and Datastage is more user-friendly than Informatica.

9. Define Routines and their types?

Ans:

Routines are basically collections of functions defined in the DS Manager. They can be called via the Transformer stage. There are three types of routines: parallel routines, mainframe routines, and server routines.

10. How can you write parallel routines in datastage PX?

Ans:

We can write parallel routines in C or C++. Such routines are also created in the DS Manager and can be called from the Transformer stage.
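As an illustration, a parallel routine is an ordinary C function compiled into a shared library; the function name, file names, and compiler flags below are assumptions for a typical Linux setup:

  /* my_routines.c - a minimal parallel-routine sketch (illustrative names) */
  int double_value(int input)
  {
      return input * 2;   /* the transformation logic applied per row */
  }

Compile it into a shared library, e.g. gcc -shared -fPIC -o libmyroutines.so my_routines.c, then define a Parallel Routine object in the Designer that points to the library path and function name, and call it from a Transformer derivation such as double_value(InLink.quantity).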



 

11. What is the method of removing duplicates, without the remove duplicate stage?

Ans:

Duplicates can be removed by using the Sort stage with the option Allow Duplicates set to false.

12. What steps should be taken to improve Datastage jobs?

Ans:

In order to improve the performance of Datastage jobs, we should:

  • First establish baselines.
  • Avoid using only one flow for performance testing.
  • Work in increments.
  • Evaluate data skew.
  • Isolate and solve the problems, one by one.
  • Distribute the file systems to remove bottlenecks, if any.
  • Not include the RDBMS at the start of the testing phase.
  • Understand and assess the available tuning knobs.

13. Differentiate between Join, Merge and Lookup stage?

Ans:

The three stages differ in how they use memory, the input requirements they place on the data, and how they treat unmatched records. Join and Merge need less memory than Lookup, because the Lookup stage loads its reference data into memory.

14. Explain Quality stage?

Ans:

The Quality stage, also known as the Integrity stage, assists in integrating different types of data from various sources.

15. Define Job control?

Ans:

Job control can best be performed by writing a job control routine (job control code on the job's Job Control tab). This facility is used to execute multiple jobs simultaneously, without using any kind of loop.

16. Differentiate between Symmetric Multiprocessing and Massive Parallel Processing?

Ans:

In Symmetric Multiprocessing (SMP), the hardware resources are shared among the processors, which run a single operating system and communicate through shared memory. In Massively Parallel Processing (MPP), each processor accesses the hardware resources exclusively. MPP is also known as 'shared nothing', since nothing is shared between the processors, and it is faster than SMP.

17. What are the steps required to kill the job in Datastage?

Ans:

To kill a job in Datastage, we have to kill the respective processing ID.

18. Differentiate between validated and Compiled in the Datastage?

Ans:

In Datastage, compiling a job means the engine checks the syntax and verifies that all the required properties are provided, producing an executable. Validating a job goes a step further: the engine runs the job without processing data to verify that the given properties (connections, file paths, and so on) are actually valid.

19. How to manage date conversion in Datastage?

Ans:

We can use the date conversion function for this purpose, i.e. Oconv(Iconv(FieldName, "Existing Date Format"), "Another Date Format").
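For example, to convert a date from YYYY/MM/DD to DD-MM-YYYY (a sketch using standard BASIC date conversion codes):

  Oconv(Iconv("2023/12/31", "D/YMD[4,2,2]"), "D-DMY[2,2,4]")   returns "31-12-2023"

Iconv() first converts the external string into Datastage's internal date value, and Oconv() then formats that internal value back out in the new format.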

20. Why do we use exception activity in Datastage?

Ans:

All the stages after the exception activity in Datastage are executed when an unknown error occurs while executing the job sequencer.




 

21. Define APT_CONFIG in Datastage?

Ans:

APT_CONFIG_FILE is the environment variable that identifies the configuration (*.apt) file in Datastage. The configuration file stores the node information, disk storage information, and scratch disk information.
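For illustration, a minimal single-node configuration file might look like this (the host name and paths are placeholders):

  {
      node "node1"
      {
          fastname "etl_server"
          pools ""
          resource disk "/opt/ds/data" {pools ""}
          resource scratchdisk "/opt/ds/scratch" {pools ""}
      }
  }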

22. Name the different types of Lookups in Datastage?

Ans:

There are two types of lookups in Datastage, i.e. normal lookup and sparse lookup. In a normal lookup, the reference data is loaded into memory first and the lookup is performed there. In a sparse lookup, the SQL is fired directly against the database for each input row. A sparse lookup is therefore faster than a normal lookup only when the number of input rows is much smaller than the size of the reference table.

23. How can a server job be converted to a parallel job?

Ans:

We can convert a server job into a parallel job by using the IPC stage and the Link Collector.

24. Define Repository tables in Datastage?

Ans:

In Datastage, the repository is another name for a data warehouse. It can be centralized as well as distributed.

25. Define OConv () and IConv () functions in Datastage?

Ans:

In Datastage, the Oconv() and Iconv() functions are used to convert data from one format to another, i.e. conversions of Roman numerals, time, date, radix, numeral ASCII, etc. Iconv() is basically used to convert formats into the internal format the system understands, while Oconv() is used to convert the internal format into formats users understand.

26. Explain Usage Analysis in Datastage?

Ans:

In Datastage, Usage Analysis takes only a few clicks: launch the Datastage Manager, right-click the job, and select Usage Analysis.

27. How do you find the number of rows in a sequential file?

Ans:

To find the number of rows in a sequential file, we can use the system variable @INROWNUM.
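For example, a Transformer output derivation can expose the running row number (the link and column names are illustrative):

  OutLink.RowNum = @INROWNUM

The value on the last row equals the total number of rows read. Note that in a parallel job @INROWNUM counts rows per partition, so run the stage sequentially (or sum across partitions) for a file-wide count.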

28. Differentiate between Hash file and Sequential file?

Ans:

The difference between a hash file and a sequential file is that a hash file saves data using a hash algorithm and a hash key value, while a sequential file has no key value for saving data. Thanks to this hash key feature, searching a hash file is faster than searching a sequential file.

29. How to clean the Datastage repository?

Ans:

We can clean the Datastage repository by using the Clean Up Resources functionality in the Datastage Manager.

30. How a routine is called in Datastage job?

Ans:

In Datastage, routines are of two types, i.e. before subroutines and after subroutines. We can call a routine from the Transformer stage in Datastage.


 

31. Differentiate between an Operational Data Store (ODS) and a Data warehouse?

Ans:

We can say an ODS is a mini data warehouse. An ODS doesn't contain information for more than 1 year, while a data warehouse contains detailed information regarding the entire business.

32. NLS stands for what in Datastage?

Ans:

NLS means National Language Support. It can be used to incorporate other languages, such as French, German, and Spanish, into the data, as required for processing by the data warehouse. These languages use the same Latin script as English; multi-byte scripts such as Chinese and Japanese are also supported.

33. How can one drop the index before loading the data into the target in Datastage?

Ans:

In Datastage, we can drop the index before loading the data into the target by using the Direct Load functionality of the SQL*Loader utility.

34. Does Datastage support slowly changing dimensions?

Ans:

Yes. Versions 8.5 and later support this feature.

35. How can one find bugs in job sequence?

Ans:

We can find bugs in job sequence by using DataStage Director.

36. How are complex jobs implemented in Datastage to improve performance?

Ans:

In order to improve performance in Datastage, it is recommended not to use more than 20 stages in a single job. If you need more than 20 stages, it is better to use another job for the remaining stages.

37. Name the third party tools that can be used in Datastage?

Ans:

The third-party tools that can be used in Datastage are Autosys, TNG, and Event Co-ordinator. I have worked with these tools and have hands-on experience with them.

38. Define Project in Datastage?

Ans:

Whenever we launch the Datastage client, we are asked to connect to a Datastage project. A Datastage project contains Datastage jobs, built-in components, and user-defined components.

39. How many types of hash files are there?

Ans:

There are two types of hash files in Datastage, i.e. the static hash file and the dynamic hash file. The static hash file is used when a limited amount of data is to be loaded into the target database. The dynamic hash file is used when we don't know the amount of data from the source file.

40. Define Meta Stage?

Ans:

In Datastage, MetaStage is used to save metadata that is helpful for data lineage and data analysis.



 

41. Have you ever worked in a UNIX environment, and why is it useful in Datastage?

Ans:

Yes, I have worked in a UNIX environment. This knowledge is useful in Datastage because sometimes one has to write UNIX programs, such as batch programs, to invoke batch processing.

42. Differentiate between Datastage and Datastage TX?

Ans:

Datastage is an ETL (Extract, Transform and Load) tool, while Datastage TX is an EAI (Enterprise Application Integration) tool.

43. What do transaction size and array size mean in Datastage?

Ans:

Transaction size means the number of rows written before committing the records to a table. Array size means the number of rows written to or read from the table at a time.

44. How many types of views are there in a Datastage Director?

Ans:

There are three types of views in a Datastage Director i.e. Job View, Log View and Status View.

45. Why do we use a surrogate key?

Ans:

In Datastage, we use a surrogate key instead of a unique natural key. The surrogate key is mostly used for retrieving data faster, using an index to perform the retrieval operation.

46. How are rejected rows managed in Datastage?

Ans:

In Datastage, the rejected rows are managed through constraints in the Transformer. We can either direct the rejected rows through the properties of a transformer or create temporary storage for rejected rows with the help of the REJECTED keyword.

47. Differentiate between ODBC and DRS stage?

Ans:

The DRS stage is faster than the ODBC stage because it uses native database drivers for connectivity.

48. Define Orabulk and BCP stages?

Ans:

The Orabulk stage is used to load a large amount of data into one target table of an Oracle database. The BCP stage is used to load a large amount of data into one target table of Microsoft SQL Server.

49. Define DS Designer?

Ans:

The DS Designer is used to design jobs in the work area, adding stages and the links between them.

50. Why do we use Link Partitioner and Link Collector in Datastage?

Ans:

In Datastage, the Link Partitioner is used to divide data into different parts using certain partitioning methods. The Link Collector is used to gather data from the various partitions/segments into a single data flow and save it in the target table.




 

51. What is Datastage?

Ans:

Datastage is an ETL tool provided by IBM which uses a GUI to design data integration solutions. It was one of the first ETL tools to introduce the concept of parallelism.

It is available in the following 3 editions:

  • Server Edition
  • Enterprise Edition
  • MVS Edition

52. Highlight the main features of Datastage?

Ans:

The main features of Datastage are highlighted below:

  • It is the data integration component of IBM InfoSphere Information Server.
  • It is a GUI-based tool. We just need to drag and drop the Datastage objects, and the tool converts them to Datastage code.
  • It is used to perform the ETL operations (Extract, Transform, Load).
  • It provides connectivity to multiple sources and multiple targets at the same time.
  • It provides partitioning and parallel processing techniques, which enable Datastage jobs to process huge volumes of data quite fast.
  • It has enterprise-level connectivity.

53. What are the primary usages of Datastage tool?

Ans:

Datastage is an ETL tool which is primarily used for extracting data from source systems, transforming that data, and finally loading it into target systems.

54. What are the main differences you have observed between 7.x and 8.x version of DataStage?

Ans:

Here are the main differences between the two versions:

  • 7.x was platform dependent, whereas 8.x is platform independent.
  • 7.x has a 2-tier architecture, where Datastage is built on top of a UNIX server. 8.x has a 3-tier architecture: a UNIX server database at the bottom, then the XMETA database which acts as the repository, and Datastage on top.
  • 7.x has no concept of parameter sets. In 8.x we have parameter sets, which can be used anywhere in the project.
  • In 7.x, Designer and Manager were two separate clients. In 8.x, the Manager client was merged into the Designer client.
  • In 7.x, we had to manually search for jobs. In 8.x, we have a quick find option in the repository, where we can search easily for jobs.

55. Can you highlight the main features of IBM Infosphere information server?

Ans:

Below are the main features of the IBM InfoSphere Information Server suite:

  • It provides a single platform for data integration. It has the capability to connect to multiple source systems as well as write to multiple target systems.
  • It is based on centralized layers. All the components of the suite can share the baseline architecture of the suite.
  • It has layers for the unified repository, for integrated metadata services, and for the common parallel engine.
  • It provides tools for analysis, cleansing, monitoring, transforming, and delivering data.
  • It has massively parallel processing capabilities, which makes processing very fast.

56. What are the different layers in the information server architecture?

Ans:

Below are the different layers of information server architecture

  • Unified user interface
  • Common services
  • Unified parallel processing
  • Unified Metadata
  • Common connectivity

57. What could be a data source system?

Ans:

It could be a database table, a flat file, or even an external application like PeopleSoft.

58. On which interface you will be working as a developer?

Ans:

As a Datastage developer, we work on the Datastage client interface known as the Datastage Designer, which needs to be installed on the local system. In the backend, it is connected to the Datastage server.

59. What are the different common services in Datastage?

Ans:

Below is the list of common services in Datastage

  • Metadata services
  • Unified service deployment
  • Security services
  • Logging and reporting services.

60. How do you start developing a Datastage project?

Ans:

The very first step is to create a Datastage job on the Datastage server. All the Datastage objects that we create are stored in the Datastage project. A Datastage project is a separate environment on the server for jobs, tables, definitions, and routines.


 

61. What is a DataStage job?

Ans:

The Datastage job is simply DataStage code that we create as developers. It contains different stages linked together to define the data and process flow.
Stages are nothing but the functionalities that get implemented.
For example: let's assume that I want to sum the sales amount. This can be a 'group by' operation that will be performed by one stage.
Now, I want to write the result to a target file. This operation will be performed by another stage. Once I have defined both stages, I need to define the data flow from my 'group by' stage to the target file stage. This data flow is defined by DataStage links.

62. What are DataStage sequences?

Ans:

A Datastage sequence connects the Datastage jobs in a logical flow.

63. If you want to use the same piece of code in different jobs, how will you achieve this?

Ans:

This can be done by using shared containers. We have shared containers for reusability. A shared container is a reusable job element consisting of stages and links. We can call a shared container from different Datastage jobs.

64. Where do the Datastage jobs get stored?

Ans:

The Datastage jobs get stored in the repository. We have various folders in which we can store the Datastage jobs.

65. Where do you see different stages in the designer?

Ans:

All the stages are available within a window called the 'Palette'. It has various categories depending upon the kind of function the stages provide.
The various categories of stages in the Palette are General, Data Quality, Database, Development, File, Processing, etc.

66. What are the Processing stages?

Ans:

The processing stages allow us to apply the actual data transformations.
For example, the 'Aggregator' stage under the Processing category allows us to apply all the 'group by' operations. Similarly, we have other stages in Processing, like the 'Join' stage that allows us to join together the data coming from two different input streams.

67. What are the steps needed to create a simple basic Datastage job?

Ans:

Click on File -> New -> select Parallel Job and hit OK. A parallel job window will open up. In this parallel job, we can put together different stages and define the data flow between them. The simplest DataStage job is an ETL job.
In this, we first need to extract the data from the source system, for which we can use either a file stage or a database stage, because the source system can be either a database table or a file.
Suppose we are reading data from a text file. In this case, we will drag and drop the 'Sequential File' stage to the parallel job window. Now, we need to perform some transformation on top of this data.
We will use the 'Transformer' stage, which is available under the Processing category. We can write any logic under the Transformer stage.
Finally, we need to load the processed data into some target table. Let's say the target database is DB2. For this, we will select the DB2 Connector stage. Then we will connect these stages with links.
After this, we need to configure the stages so that they point to the correct filesystem or database.
For example, for the Sequential File stage, we need to define the mandatory parameters like the file name, file location, and column metadata.
Then we need to compile the Datastage job. Compiling the job checks the syntax of the job and creates an executable file for the Datastage job that can be executed at run time.

68. Name the different sorting methods in Datastage.

Ans:

There are two methods available

  • Link sort
  • Inbuilt Datastage Sort

69. In a batch, if a job fails in between and you want to restart the batch from that particular job and not from scratch, what will you do?

Ans:

In Datastage, there is an option in the job sequence properties – 'Add checkpoints so sequence is restartable on failure'. If this option is checked, we can rerun the job sequence from the point where it failed.

70. How do you import and export the Datastage jobs?

Ans:

We have the below command-line functions for this:

  • Import: dsimport.exe
  • Export: dsexport.exe


 

71. What are routines in Datastage? Enlist various types of routines.

Ans:

A routine is a set of functions defined in the DS Manager. It is run via the Transformer stage.
There are 3 kinds of routines:

  • Parallel routines
  • Mainframe routines
  • Server routines

72. How do you remove duplicate values in DataStage?

Ans:

There are two ways to handle duplicate values

  • We can use remove duplicate stage to eliminate duplicates.
  • We can use the Sort stage to remove duplicates. The Sort stage has a property called 'Allow Duplicates'; we won't get duplicate values in the sorted output when this property is set to false.

73. What are the different kinds of views available in a Datastage director?

Ans:

There are 3 kinds of views available in Datastage director

  • Log view
  • Status view
  • Job view

74. Distinguish between Informatica & Datastage. Which one would you choose and why?

Ans:

Both Informatica and DataStage are powerful ETL tools.

Below are some points differentiating between both the tools:

 

  • Parallel Processing: Informatica does not support parallel processing. In contrast, Datastage provides a mechanism for parallel processing.
  • Implementing SCDs: It is quite simple to implement SCDs (Slowly Changing Dimensions) in Informatica. However, it is complex to implement SCDs in Datastage, which supports them merely through custom scripts.
  • Version Control: Informatica supports version control through check-in and check-out of objects. This functionality is not available in Datastage.
  • Available Transformations: Fewer transformations are available in Informatica. Datastage offers a greater variety of transformations.
  • Power of lookup: Informatica provides a very powerful dynamic cache lookup. Datastage has nothing similar.

In my personal opinion, I would go with Informatica over Datastage, as I have found Informatica more systematic and user-friendly than DataStage.

Another strong reason is that debugging and error handling are much better in Informatica than in Datastage, so fixing issues becomes easier in Informatica. Datastage does not provide complete error-handling support.

75. Give an idea of system variables.

Ans:

System variables are read-only variables beginning with '@' which can be read by either a transformer stage or a routine. They are used to get system information.

76. What is the difference between passive stage and active stage?

Ans:

Passive stages are utilized for extraction and loading whereas active stages are utilized for transformation.

77. What are the various kinds of containers available in Datastage?

Ans:

We have the below 2 containers in Datastage:

  • Local container
  • Shared container

78. Is the value of staging variable stored temporarily or permanently?

Ans:

Temporarily. It is a temporary variable.

79. What are the different types of jobs in Datastage?

Ans:

We have two types of jobs in Datastage

  • Server jobs (They run in a sequential manner)
  • Parallel jobs (They get executed in a parallel way)

80. What is the use of Datastage director?

Ans:

Through Datastage director, we can schedule a job, validate the job, execute the job and monitor the job.




 

81. What are the various kinds of hash files?

Ans:

We have 2 types of hash files:

  • Static hash file
  • Dynamic hash file

82. What is a quality stage?

Ans:

The Quality stage (also called the Integrity stage) is a stage that aids in combining together data coming from different sources.

83. DataStage Characteristics

Ans:

Criteria and results:

  • Support for Big Data Hadoop: access Big Data on a distributed file system, JSON support, and a JDBC integrator.
  • Ease of use: improved speed, flexibility, and efficacy for data integration.
  • Deployment: on-premise or cloud, as the need dictates.

84. Explain What is IBM DataStage?

Ans:

Datastage is an extract, transform, and load (ETL) tool that is part of the IBM InfoSphere suite. It is used for working with large data warehouses and data marts, creating and maintaining such data repositories.

85. How is a DataStage source file filled?

Ans:

We can develop a SQL query, or we can use a Row Generator stage, through which we can fill the source file in DataStage.

86. How is merging done in DataStage?

Ans:

Merging is done when two or more tables are expected to be combined based on their primary key columns. This is the basis for merging in DataStage.

87. What is a data file and a descriptor file?

Ans:

As the names indicate, these files serve different purposes in DataStage. The descriptor file contains all the information or description, while the data file is the one that just contains the data.

88. How is DataStage different from Informatica?

Ans:

DataStage and Informatica are both powerful ETL tools, but there are a few differences between the two. DataStage has parallelism and partitioning concepts for node configuration, whereas Informatica does not support parallelism in node configuration. DataStage is also simpler to use compared to Informatica.

89. What is a Routine in Data Stage?

Ans:

The DataStage Manager defines a collection of functions within the tool, which is called a routine. There are basically three types of routines in DataStage, namely job control routines, before/after subroutines, and transform functions.

90. What is the process for removing duplicates in DataStage?

Ans:

Duplicates within DataStage can be removed using the sort function. While running the sort function, you need to set the option that allows duplicates to false.


 

91. What is the difference between Join, Merge & Lookup stage?

Ans:

The fundamental difference between these three stages is the amount of memory they take; how they treat the input requirements and the various records is also a differentiating factor. Based on memory usage, Join and Merge use a very small amount of memory, whereas the Lookup stage uses a huge amount of memory because it loads the reference data into it.

92. What is the Quality stage in DataStage?

Ans:

The Quality stage is used for cleansing the data within the DataStage tool. It is a client-server software tool that is provided as part of the IBM Information Server.

93. What is job control in DataStage?

Ans:

Job control is used for controlling a job or executing multiple jobs in a parallel manner. It is deployed using job control code within the IBM DataStage tool.

94. How to do DataStage jobs performance tuning?

Ans:

First, you have to select the right configuration file. Then you need to select the right partitioning and buffer memory. You have to handle the sorting of data and null values properly. Try to use the Modify, Copy, or Filter stages instead of the Transformer. Reduce the propagation of unnecessary metadata between the various stages.

95. What is a repository table in DataStage?

Ans:

The repository is another name for a data warehouse. It can be centralized or distributed. The repository table is used for answering ad hoc, historical, analytical, or complex queries.

96. Compare the massive parallel processing and symmetric multiprocessing?

Ans:

In massively parallel processing, many computers are present in the same chassis, while in symmetric multiprocessing many processors share the same hardware resources. Massively parallel processing is called 'shared nothing' because nothing is shared between the various computers. Massively parallel processing is also faster than symmetric multiprocessing.

97. How can you kill the DataStage job?

Ans:

To kill a DataStage job, you need to kill the individual processing ID; this ensures that the DataStage job is killed.

98. How do you compare the validated OK and Compiled processes in DataStage?

Ans:

The Compiled step ensures that the important stage parameters are mapped and correct, creating an executable job. In the Validated OK step, we make sure that the connections are valid.

99. Explain the feature of data type conversion in DataStage?

Ans:

If you want to do data conversion in DataStage, you can use the data conversion function. For this to execute successfully, you need to ensure that the input and output to and from the operator are the same, and that the record schema is compatible with the operator.

100. What is the significance of the exception activity in DataStage?

Ans:

Whenever an unfamiliar error occurs while executing the job sequencer, all the stages after the exception activity are run. This is what makes the exception activity so important in DataStage.



 

101. What are the various types of Lookups in DataStage?

Ans:

There are different types of Lookups in DataStage. These include the Normal, Sparse, Range and Caseless Lookup in DataStage.

102. When do you use a parallel job and a server job?

Ans:

Using the parallel job or a server job depends on the processing need, functionality, time to implement and the cost. The server job usually runs on a single node, it executes on a DataStage Server Engine and handles small volumes of data. The Parallel job runs on multiple nodes, it executes on a DataStage Parallel Engine and handles large volumes of data.

103. What is usage analysis in DataStage?

Ans:

If you want to know whether a certain job is part of a sequence, right-click the job in the Manager and choose Usage Analysis.

104. How to find the number of rows in a sequential file?

Ans:

For counting the number of rows, we should use the @INROWNUM variable.

105. What is the difference between a Sequential file and a Hash file?

Ans:

The hash file is based on a hash algorithm and can be used with a key value. The sequential file, on the other hand, has no key column value. The hash file can be used as a reference for a lookup, while a sequential file cannot. Due to the presence of the hash key, the hash file is faster to search than a sequential file.

106. How do you clean a DataStage Repository?

Ans:

For cleaning a DataStage repository, go to the DataStage Manager, then to Job in the menu bar, and then to Clean Up Resources. If you want to further remove the logs, go to the respective job and clean up the log files.

107. How do you call a Routine in DataStage?

Ans:

Routines are stored in the Routines branch of the DataStage repository, where you can create, view, or edit them. The routines in DataStage can be of the following types: job control routines, before/after subroutines, and transform functions.

108. What is the difference between an Operational Data Store (ODS) and a Data Warehouse?

Ans:

An Operational Data Store can be considered a staging area for real-time analysis and user processing; thus it is a temporary repository. The data warehouse, in contrast, is used for long-term data storage and holds the complete data of the entire business.

109. What does NLS mean in DataStage?

Ans:

NLS means National Language Support in DataStage. This means you can use the IBM DataStage tool with various languages, including multi-byte character languages like Chinese or Japanese. You can read and write in any language and process it as per the requirement.

110. What are the main features of datastage?

Ans:

DataStage has the following features to aid the design and processing required to build a data warehouse:

  • Uses graphical design tools. With simple point-and-click techniques you can draw a scheme to represent your processing requirements.
  • Extracts data from any number or type of database.
  • Handles all the metadata definitions required to define your data warehouse. You can view and modify the table definitions at any point during the design of your application.
  • Aggregates data. You can modify the SQL SELECT statements used to extract data.
  • Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data, and you can easily extend the functionality by defining your own transforms.
  • Loads the data warehouse.



 

111. What are Stage Variables, Derivations and Constants?

Ans:

Stage Variable: an intermediate processing variable that retains its value during reads and doesn't pass the value into a target column.
Derivation: an expression that specifies the value to be passed on to the target column.
Constant: conditions that are either true or false that specify the flow of data within a link.
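A quick Transformer sketch tying the three together (the link, column, and variable names are illustrative):

  Stage variable:  svTotal = svTotal + InLink.Amount      (retains its value from row to row)
  Constraint:      InLink.Amount > 0                      (only rows meeting the condition flow down the link)
  Derivation:      OutLink.RunningTotal = svTotal         (the value actually written to the target column)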

112. Types of views in Datastage Director?

Ans:

There are 3 types of views in Datastage Director

  • Job View – Dates of Jobs Compiled.
  • Log View – Status of Job last run
  • Status View – Warning Messages, Event Messages, Program Generated Messages.

113. How do you execute datastage job from command line prompt?

Ans:

Using the “dsjob” command as follows:
dsjob -run -jobstatus projectname jobname
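A fuller invocation can also pass job parameters; the project, job, and parameter names below are placeholders:

  dsjob -run -mode NORMAL -param SrcFile=/data/input.txt -jobstatus MyProject MyJob

The -jobstatus option makes dsjob wait for the job to finish and return the job's finishing status as its exit code.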

114. Functionality of Link Partitioner and Link Collector?

Ans:

Link Partitioner:
It splits data into various partitions or data flows using various partitioning methods.
Link Collector:
It collects the data coming from the partitions, merges it into a single data flow, and loads it to the target.

115. What are the types of jobs available in datastage?

Ans:

  • Server Job
  • Parallel Job
  • Sequencer Job
  • Container Job

116. What is the difference between Server Job and Parallel Jobs?

Ans:

  • Server jobs don't support partitioning techniques, but parallel jobs do.
  • Server jobs don't support SMP/MPP, but parallel jobs support SMP and MPP.
  • Server jobs run on a single node, while parallel jobs run on multiple nodes.
  • Prefer server jobs when the volume of source data is low; when the data volume is huge, prefer parallel jobs.

117. What is a project?

Ans:

Datastage projects – a complete project may contain several jobs and user-defined components.

  • A project contains Datastage jobs.
  • Built-in components. These are predefined components used in a job.
  • User-defined components. These are customized components created using the Datastage Manager. Each user-defined component performs a specific task in a job.
  • All work is done inside a project. Projects are created during and after the installation process. You can add projects on the Projects tab of the Administrator.
  • A project is associated with a directory. The project directory is used by DataStage to store jobs and other DataStage objects and metadata.

118. What is sequencer?

Ans:

A sequencer lets you graphically create a controlling job, without using the job control function.

119. What are mainframe jobs?

Ans:

A mainframe job is compiled and run on the mainframe. Data extracted by such jobs is then loaded into the data warehouse.

120. What are parallel jobs?

Ans:

These are compiled and run on the DataStage server in a similar way to server jobs, but they support parallel processing on SMP, MPP, and cluster systems.


 

121. How do you use procedure in datastage job?

Ans:

Use the ODBC plug-in, pass one dummy column, and give the procedure name in the SQL tab.

122. What is odbc stage?

Ans:

A stage that extracts data from or loads data into a database that implements the industry-standard Open Database Connectivity (ODBC) API. It is used to represent a data source, an aggregation step, or a target data table (server jobs only).

123. What is a hash file? What are its types?

Ans:

A hash file is just like an indexed sequential file; the file is internally indexed on a particular key value. There are two types of hash files: the static hash file and the dynamic hash file.

124. What type of hash file is to be used in general in datastage jobs?

Ans:

Static Hash File.

125. What is a stage variable?

Ans:

In the Datastage Transformer, we can define variables and derive their values from the source data.

126. What are constraints and derivations?

Ans:

Constraints are conditions (true or false) that control whether data flows down a link, while derivations are expressions that compute the values passed to target columns. We can create both using Datastage variables.

127. How do you reject records in a transformer?

Ans:

Through a Datastage constraint we can reject records; rows that fail the constraint can be routed to a reject link.

128. Why do you need stage variables?

Ans:

That depends upon the job requirement; through stage variables we can hold intermediate values and filter data.

129. What is the precedence of stage variables,derivations, and constraints?

Ans:

stage variables => constraints => derivations (stage variables are evaluated first, then constraints, then derivations)

130. What are data elements?

Ans:

A specification that describes the type of data in a column and how the data is converted.



 

131. What are routines?

Ans:

In Datastage, a routine is just like a function which we call in a Datastage job. There are in-built routines, and we can also create our own routines.

132. What are transforms, and what is the difference between routines and transforms?

Ans:

Transforms are used to manipulate data within a Datastage job; a transform converts a value from one form to another. A routine, by contrast, is a more general function that can be called from stages or job control code to implement broader logic.

133. What is a datastage macro?

Ans:

In Datastage, macros can be used in expressions, job control routines, and before/after subroutines. The available macros are concerned with ascertaining job status.

134. What is job control?

Ans:

A job control routine provides the means of controlling other jobs from the current job. A set of one or more jobs can be validated, run, reset, stopped, and scheduled in much the same way as the current job.
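A minimal job control sketch in DataStage BASIC (the job and parameter names are illustrative):

  * Attach the target job, set a parameter, run it, and wait for completion
  hJob = DSAttachJob("LoadCustomers", DSJ.ERRFATAL)
  ErrCode = DSSetParam(hJob, "SrcFile", "/data/input.txt")
  ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
  ErrCode = DSWaitForJob(hJob)
  Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
  ErrCode = DSDetachJob(hJob)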

135. How many types of stages are there?

Ans:

There are three basic types of stages:

  • Built-in stages – supplied with DataStage and used for extracting, aggregating, transforming, or writing data. All job types have these stages.
  • Plug-in stages – additional stages that can be installed in DataStage to perform specialized tasks that the built-in stages do not support. Server jobs and parallel jobs can make use of these.
  • Job sequence stages – special built-in stages which allow you to define sequences of activities to run. Only job sequences have these.

136. Difference Between ORAOCI8 and Orabulk?

Ans:

ORAOCI8 – this stage allows connecting to an Oracle database.
Orabulk – the Orabulk plug-in generates control and data files for bulk loading into a single table on an Oracle target database. The files are suitable for loading into the target database using the Oracle command sqlldr.
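The generated files are then loaded with SQL*Loader, for example (the user and file names are placeholders):

  sqlldr userid=scott/tiger control=customers.ctl data=customers.dat log=customers.log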

137. What is Sort plugin?

Ans:

A mainframe processing stage that sorts data on input columns.

138. What is Aggregate stage?

Ans:

A stage type that computes totals or other functions over sets of data.

139. What are the Hashed File stage and the Sequential File stage?

Ans:

The Hashed File stage extracts data from or loads data into a database that contains hashed files; it is commonly used as a lookup reference. The Sequential File stage extracts data from, or writes data to, a text file.

140. What types of flat files have you used? Have you used tab-delimited files?

Ans:

Sequential flat files with comma-separated values; tab-delimited files can be used as well.




 

141. What is the Job control code?

Ans:

Job control code is used in a job control routine to create a controlling job, which invokes and runs other jobs.

142. Define the difference between an Active and a Passive stage?

Ans:

There are two kinds of stages:

  • Passive stages define read and write access to data sources and repositories.
    • Sequential
    • ODBC
    • Hashed
  • Active stages define how data is filtered and transformed.
    • Transformer
    • Aggregator
    • Sort plug-in

143. What are the plugin stages used in your projects?

Ans:

Plug-in stage – a stage that performs specific processing not supported by the standard server job stages.
Plug-ins used – ORAOCI8, Orabulk.

144. What is a container?

Ans:

A group of stages and links in a job design is called a container.
There are two kinds of containers: local containers and shared containers.

  • Local container:
    Local containers only exist within the single job in which they are used. Use them to simplify complex job designs.
  • Shared container:
    Shared containers exist outside of any specific job. They are listed in the Shared Containers branch in the Manager and can be added to any job. Shared containers are frequently used to share a commonly used set of job components.
    A job container contains two unique stages: the Container Input stage is used to pass data into the container, and the Container Output stage is used to pass data out of the container.