How to Utilize Talend with Microsoft HDInsight

See what’s new in our latest version –

This video shows how to configure the HDInsight components within the Studio.
It is useful for users who wish to set up and configure Talend to run with HDInsight.



Big Data for the Masses – Talend Open Studio for Big Data: Part #2 – Removing duplicate records

On February 29, 2012, Talend announced the availability of Talend Open Studio for Big Data, to be released under the Apache Software License. You can download a preview from…

Talend Open Studio for Big Data is a powerful open source solution for big data integration that addresses the needs of data analysts by providing them with a graphical tool that abstracts the underlying complexities of big data technologies. It provides a palette of easy-to-configure components that automatically generate code for Hadoop Distributed File System (HDFS), Pig, HBase, Sqoop and Hive.
All of this is available under an Apache License, which is fully open source and free to use.

This video demonstration shows how to read an Excel file of records and pipe these to Hadoop. For data quality purposes, you will also learn how to add a “unique row” component to remove any duplicates from the input.
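
As a rough illustration of what a "unique row" step does conceptually, the Python sketch below keeps the first row seen for each key and sets later repeats aside. It is not Talend's generated code or the component's actual implementation; the function name `split_unique_rows`, the `id` key column, and the sample records are all hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def split_unique_rows(
    rows: Sequence[Row], key_columns: Sequence[str]
) -> Tuple[List[Row], List[Row]]:
    """Split rows into first occurrences and later duplicates.

    Two rows count as duplicates when they share the same values
    in every column listed in key_columns (an assumed rule).
    """
    seen = set()
    uniques: List[Row] = []
    duplicates: List[Row] = []
    for row in rows:
        # Build a hashable key from the chosen columns.
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            uniques.append(row)
    return uniques, duplicates


# Hypothetical sample records standing in for rows read from a spreadsheet.
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
    {"id": 1, "name": "Ada"},
]
uniques, duplicates = split_unique_rows(records, ["id"])
```

Only the `uniques` list would be passed on to the next step; keeping the `duplicates` list as well makes it easy to review what was dropped.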

Download Talend’s solutions:…


Introduction to Talend Data Preparation – for free

Download Talend Data Prep for FREE:

Introducing Talend’s new software: Talend Data Preparation! Many people still do their analytics in spreadsheets riddled with complicated formulas and repetitive tasks that they redo again and again, or they depend on colleagues to get the data they need for their daily work. Data Preparation (or Data Prep) gives them the “self-service” capability to access, cleanse, prepare and combine data prior to analysis in a simple, user-friendly way! Line of Business users as well as Business Analysts can now fix data in a friendly and intuitive web interface, without needing any advanced IT skills! IT remains best placed to cope with the complex landscape of on-premise and cloud applications and to run data integration processes at scale, but this tool serves users at every level of comfort with complex data.