Dataflow performance with field usage analysis
by: Rikke
Originally published on salesforceblogger.com


Some of you might have long-running dataflows that you, for one reason or another, want to run faster. But approaching this can be time-consuming. Siva Teja Ghattepally has delivered a brilliant webinar on techniques to optimize dataflow performance. To aid this process, Mohan Chinnappan's analytics plugin provides a great command that allows you to:

  1. Get an overview of your dataflow nodes and their performance,
  2. Get an overview of which fields are not used and can be removed,
  3. Get full insight into a node without the clicks required in the actual dataflow editor.

In this blog, I will walk through how to use this ‘analyze’ command. Please note that you will need the Salesforce CLI and Mohan’s plugin installed to follow along. Please check out this blog for details on how to install or update the plugin.

Note: this blog uses sfdx-mohanc-plugins version 0.0.119. For the latest details on the command, check out GitHub.

The dataflow job analyze command

The main command for this blog is the dataflow job analyze command. Let’s have a look at the options for the command by using the following:

sfdx mohanc:ea:dataflow:jobs:analyze -h

Let’s have a closer look at the options for this command.

Username

Use the -u option to specify a username to use for your command.

--The option
sfdx mohanc:ea:dataflow:jobs:analyze -u <insert username>

--Example
sfdx mohanc:ea:dataflow:jobs:analyze -u [email protected]

Dataflow job id

Use the -j option to specify a dataflow job id to use for your command.

--The option
sfdx mohanc:ea:dataflow:jobs:analyze -u <insert username> -j <insert dataflow job id>

--Example
sfdx mohanc:ea:dataflow:jobs:analyze -u [email protected] -j 03CB000000383oAMAQ

Dataflow id

Use the -d option to specify a dataflow id to use for your command.

--The option
sfdx mohanc:ea:dataflow:jobs:analyze -u <insert username> -j <insert dataflow job id> -d <insert dataflow id>

--Example
sfdx mohanc:ea:dataflow:jobs:analyze -u [email protected] -j 03CB000000383oAMAQ -d 02K3h000000MtyuEAC

The dataflow job list command

To use the dataflow job analyze command we need a dataflow job id, which we can get by using the dataflow job list command. To see the options for this command, enter the following:

sfdx mohanc:ea:dataflow:jobs:list -h

Let’s have a closer look at the options for this command.

Username

Use the -u option to specify a username to use for your command.

--The option
sfdx mohanc:ea:dataflow:jobs:list -u <insert username>

--Example
sfdx mohanc:ea:dataflow:jobs:list -u [email protected]

The dataflow list command

To use the dataflow job analyze command we also need a dataflow id, which we can get by using the dataflow list command. To see the options for this command, enter the following:

sfdx mohanc:ea:dataflow:list -h

Let’s have a closer look at the options for this command.

Username

Use the -u option to specify a username to use for your command.

--The option
sfdx mohanc:ea:dataflow:list -u <insert username>

--Example
sfdx mohanc:ea:dataflow:list -u [email protected]

Analyze the dataflow job

Having looked at the dataflow job analyze command and the supporting commands that give us the dataflow id and dataflow job id, let’s walk through the steps to see how a given dataflow performs.

Note: Before using the commands below you will have to log in to the desired org by using the command sfdx force:auth:web:login, which will launch the login window in a browser.
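For reference, the login step looks like this; the -a alias flag is optional, and the alias name here is my own example:

--Log in; a browser window will open for authentication
sfdx force:auth:web:login

--Optionally give the org an alias you can reuse later
sfdx force:auth:web:login -a myOrg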

Step 1 – use the dataflow:jobs:list command to extract the list of jobs run in the org.

sfdx mohanc:ea:dataflow:jobs:list

Step 2 – define the username for the target org by adding the -u option.

sfdx mohanc:ea:dataflow:jobs:list -u [email protected]

Step 3 – press enter.

Step 4 – find the dataflow run you want to analyze and copy the id. I am saving the id in a text editor. Note that the list shows both dataflow and data sync runs, so it may be long. Essentially this list is identical to what you see in the Data Monitor in the Data Manager.

Step 5 – use the dataflow:list command to extract the list of dataflows in the org.

sfdx mohanc:ea:dataflow:list

Step 6 – define the username for the target org by adding the -u option.

sfdx mohanc:ea:dataflow:list -u [email protected]

Step 7 – press enter.

Step 8 – find the dataflow in question, copy the id, and save it just as you did for the dataflow job id.
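As an alternative to a text editor, you can keep the two ids in shell variables; the variable names below are my own:

--Store the ids for reuse (example values from this walkthrough)
export DF_JOB_ID=03CB000000383oAMAQ
export DF_ID=02K3h000000MtyuEAC

--The variables can then be substituted into the analyze command
sfdx mohanc:ea:dataflow:jobs:analyze -u [email protected] -j $DF_JOB_ID -d $DF_ID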

Step 9 – use the dataflow:jobs:analyze command to analyze a specific job and dataflow.

sfdx mohanc:ea:dataflow:jobs:analyze

Step 10 – define the username for the target org by adding the -u option.

sfdx mohanc:ea:dataflow:jobs:analyze -u [email protected]

Step 11 – define the dataflow job id from previously using the -j option.

sfdx mohanc:ea:dataflow:jobs:analyze -u [email protected] -j 03CB000000383oAMAQ

Step 12 – define the dataflow id from previously using the -d option.

sfdx mohanc:ea:dataflow:jobs:analyze -u [email protected] -j 03CB000000383oAMAQ -d 02KB0000000BRisMAG

Step 13 – press enter.
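To recap, the full sequence of commands looks like this (the ids are the example values from the steps above):

--List the dataflow jobs and note the id of the run to analyze
sfdx mohanc:ea:dataflow:jobs:list -u [email protected]

--List the dataflows and note the id of the dataflow in question
sfdx mohanc:ea:dataflow:list -u [email protected]

--Analyze the chosen job and dataflow
sfdx mohanc:ea:dataflow:jobs:analyze -u [email protected] -j 03CB000000383oAMAQ -d 02KB0000000BRisMAG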

Once the command is done you will see three files being generated:

  1. A JSON file with the dataflow id as its name,
  2. A CSV file with the dataflow id as its name – this is the same file you get when using the dataflow:fieldUsage command, which you can read more about in this blog,
  3. An SVG file with the dataflow job id as its name.

Step 14 – locate the SVG file generated by the command on your computer and open it in your browser.
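If you prefer to stay in the terminal, you can open the file directly; assuming the SVG is named after the dataflow job id as described above, that looks like this:

--macOS
open 03CB000000383oAMAQ.svg

--Linux
xdg-open 03CB000000383oAMAQ.svg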

The SVG file will show:

  • each node from the dataflow,
  • the action type for each node,
  • the parameters for each node,
  • the duration it took to run each node,
  • the input and output rows for each node,
  • for digest nodes, the object used,
  • for computeExpression nodes, the SAQL expression,
  • for register nodes, the fields and their usage count across lenses and dashboards – highlighting the fields that are not used.

Below you can see some examples of how nodes are represented.

You can use the SVG file to analyze how each node is performing as well as to see where fields might be removed because they are not being used. Do remember that date components are created automatically and can only be removed if the date field isn’t used at all. Finally, the visual representation with timings can help you pinpoint which nodes to focus on when optimizing performance. For more details on dataflow optimization, please check out the recorded Learning Days webinar on the subject.

