Merging several files from one folder using Power Query

This post will cover how to merge several files from a folder into one file. Merging files is a frequent task for me and the process that used to be a long and tedious repetition of copy-paste and vlookup, it is now a matter of pushing refresh.

Post1_2017_4FilesTo1File
4 files merged into 1 final one.

 

  • Trademarks Master: this file holds most of the data; all the trademark’s core data such as application number, date, country, agent, owner, etc.
  • Trademarks Goods: this file holds the classes and specifications (the details of goods and services the trademark is registered in).
  • Trademarks Prior Registration: this file is usually much smaller than the others as it holds details of any prior registration number and any seniority – priorities are held in the master document.
  • Trademarks Actions: this is by far the largest file, it holds all the actions registered against each TM (merely registering a trademark is generates 3 to 5 actions – a renewal date, some reminders and a declaration of use for any countries where such formalities are required).

Clean-up

The first task for me is to clean each file. I have created 4 functions in order to handle the clean-up quickly and effectively. The main operations performed when cleaning the files are:

  • Filtering records to only keep active records (trademarks with an active status).

Table.SelectRows(PreviousStep, each ([Active] =”True”))

  • Removing extra columns I won’t need (my master spreadsheet holds more than 140 columns)

Table.SelectColumns (PreviousStep, {“ColumnName1”, “ColumnName2”})

I like using SelectColumn rather than RemoveColumns in case our software provider changes the format of the backup and add extra columns.

  • Picking the relevant type for some columns – this is especially important for dates

This is especially important for the date fields as the dates are not stored in a consistent way in the database. Some dates are stored with a time while others are not – this is mainly due to how the record was created (manually, semi automatically or imported from a previous system). It is a two steps process as some cells contain only a date, while others contain both a date and a time.

Table.TransformColumnTypes(PreviousStep,{{“RecordCreated”, type datetime}, {RecordUpdated”, type datetime}})

Table.TransformColumnTypes(PreviousStep,{{“RecordCreated”, type date}, {RecordUpdated”, type date}})

The previous steps are performed for all files. In addition to the above, there are some specific steps carried out against each file. Ideally my final file should hold the same number of lines as the master trademarks file (one line per trademark). This requires some twisting because on the other three files I am highly likely to have several lines per records. Classes are stored individually so one line per class and except if you are dealing with single-class country there is a lot of chance to have trademarks registered in more than one class. The same scenario might happen for prior registration but is less common in practice. Finally, the trickiest file to handle is the Actions file, this is a list of all actions for each record displayed as one line per action. I am still unsure on the best way to display actions in my final document, it will mainly depend on the goal of the exercise and what is important for the people consuming the report. This point will be covered in a future post.

Post1_2017_TMMaster
Trademarks Master – screenshot of a part of the spreadsheet

 

I now have 4 cleaned tables, I load them in Power Query using the option Create Connection Only – it avoids having duplicate data on your spreadsheet.

Post1_2017_OnlyCreateConnection
Import Data – Only Create Connection

Merging files

Once your files are cleaned and ready they can be merged using the merge option by right-clicking on your query’s name.

Post1_2017_Merge
Drop-down menu in the query pane.

The below window will open, I am selecting the table TM Classes and my matching column is TrademarkID – this column is common to all my files.

Post1_2017_Merge2
Merge window

A new column is created on the table with a ‘magical’ action button.

Post1_2017_ActionButton
TM_Classes has a special arrow, the action button.

Now I click on Merge Queries and merge my additional files.

Post1_2017_Merge_HomeRibbon
Power Query Home Ribbon – Merge Queries

By default everything is selected, I unticked TrademarkID as I already have this field on my table. I also always take the time to untick Use original column name as prefix. If the box is left ticked your column will be named TM_Classes.Classes and you will probably rename it to Classes – if you untick the box the column will be named clases without renaming it.

Post1_2017_ExpandColumn
Action button to expand the columns

All the files are merged, I click name my query TM_Final and click on Close & Load.

The next step in order to speed-up the process would be to transform this query into formula with parameters and a final function merging them all.This will be explored in a future post.