https://www.analyticsvidhya.com/blog/2017/01/the-most-comprehensive-data-science-learning-plan-for-2017/#three-one
R for Data Science : http://r4ds.had.co.nz/index.html
Real-time analytics – Software such as Hadoop, Spark, and Kafka
http://nirvacana.com/thoughts/2013/07/08/becoming-a-data-scientist/
https://www.sqlshack.com/10-things-need-know-become-data-scientist/
https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/
https://thomaslarock.com/2018/08/achievement-unlocked-certified-in-artificial-intelligence/
Data Science for DBA
https://www.youtube.com/watch?v=gubfZBdkdpI
http://www.sqlservercentral.com/Books/
https://www.iiitb.ac.in/main-control/uploaded-files/1522236481.pdf
https://www.digitalvidya.com/blog/data-analytics-interview-questions-answers/
https://www.zdnet.com/article/what-is-machine-learning-everything-you-need-to-know/
Map Reduce :
1. http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
HBase : distributed, column-oriented NoSQL database; good for fast reads
Hive : distributed data warehouse
HiveQL : SQL-like query language; each query is converted to a set of Map-Reduce tasks
Pig : high-level language for querying large datasets; used for data ETL, needs fewer lines of code than hand-written Map-Reduce
Spark : supports in-memory processing
HDFS : Hadoop Distributed File System
Map-Reduce : programming model that turns a serial program into a parallel one by splitting the work into map tasks and reduce tasks
Hadoop : mainly used where batch processing is needed; it is not great for real-time processing. Provides both distributed storage and distributed processing of very large data sets.
Flume - Distributed system for collecting and aggregating log data, and writing it to HDFS.
Sqoop - Bulk data transfer in both directions (import and export) between Apache Hadoop and an RDBMS. Supports full and incremental imports.
Kafka - Distributed publish-subscribe messaging system. Hadoop can act as a consumer of Kafka.
Storm - Distributed, computation-based event-processing system. Often referred to as real-time Hadoop. A Storm cluster coordinates through ZooKeeper.
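The Map-Reduce note above can be made concrete with a small word count. This is only a single-machine sketch in plain Python, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a thread pool stands in for the cluster workers that would run map and reduce tasks in parallel.

```python
from collections import defaultdict
from itertools import chain
from multiprocessing.dummy import Pool  # thread pool; stands in for cluster workers


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(item):
    """Reduce: combine the values collected for one key."""
    key, values = item
    return key, sum(values)


def word_count(lines, workers=2):
    """Run map, shuffle, and reduce over a list of text lines."""
    with Pool(workers) as pool:
        mapped = pool.map(map_phase, lines)             # map tasks run in parallel
        groups = shuffle(chain.from_iterable(mapped))   # group pairs by word
        reduced = pool.map(reduce_phase, list(groups.items()))  # reduce tasks in parallel
    return dict(reduced)


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Spark and Hadoop process data"]))
```

Each line is mapped independently and each word is reduced independently, which is what lets a real cluster spread both phases across many machines.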
Transactional data-stores
• Key-value - Oracle NoSQL, DynamoDB, Voldemort, Apache Accumulo
• Document-based - MongoDB, CouchDB
• Column-based - Apache Cassandra, Apache HBase
• Graph-based - Neo4j, InfoGrid
• Relational - Apache Kudu
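To make the categories above easier to tell apart, here is a rough sketch of one made-up user record laid out in each style using plain Python structures. The record, its field names, and the helper city_of_user are hypothetical; none of this reflects the actual storage format or client API of the products listed.

```python
import json

# Key-value: the store sees only an opaque value behind a key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document-based: the store understands the nested structure and can query fields.
document = {"_id": 42, "name": "Ada", "address": {"city": "London"}}

# Column-based: row key -> column family -> column -> value.
column_family = {"42": {"info": {"name": "Ada"}, "geo": {"city": "London"}}}

# Graph-based: entities are nodes, relationships are labeled edges.
graph = {
    "nodes": {42: {"name": "Ada"}, "london": {"kind": "city", "name": "London"}},
    "edges": [(42, "LIVES_IN", "london")],
}

# Relational: fixed columns, one tuple per row.
relational_columns = ("id", "name", "city")
relational_rows = [(42, "Ada", "London")]


def city_of_user(model):
    """Look up the city of user 42 in each layout (illustrative helper)."""
    if model == "key_value":
        return json.loads(key_value["user:42"])["city"]  # client must decode the value
    if model == "document":
        return document["address"]["city"]
    if model == "column_family":
        return column_family["42"]["geo"]["city"]
    if model == "graph":
        target = next(dst for src, label, dst in graph["edges"]
                      if src == 42 and label == "LIVES_IN")
        return graph["nodes"][target]["name"]
    if model == "relational":
        row = next(r for r in relational_rows if r[0] == 42)
        return row[relational_columns.index("city")]
    raise ValueError(f"unknown model: {model}")
```

The same question ("which city does user 42 live in?") is answered by decoding a blob, walking a nested document, addressing a column, following an edge, or selecting a column from a row.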
Sqoop :
https://www.tutorialspoint.com/sqoop/sqoop_installation.htm
https://www.youtube.com/watch?v=r1NLCComQ9Q&list=PL9ooVrP1hQOFrYxqxb0NJCdCABPZNo0pD&index=15
https://www.youtube.com/watch?v=RCtwhK_Jpyc
https://www.youtube.com/watch?v=HMphb8acV4Q
https://www.youtube.com/watch?v=-e6912666lc
https://www.youtube.com/watch?v=hY9nnU4PTFw
Flume :
Hive :
Hadoop :
https://www.youtube.com/watch?v=LgSiVWjTIUg&list=PL9ooVrP1hQOFrYxqxb0NJCdCABPZNo0pD&index=24
https://www.youtube.com/watch?v=Tt2GRh3eFMs&list=PL9ooVrP1hQOFrYxqxb0NJCdCABPZNo0pD&index=28
Data Science for DBA
1. http://codophile.com/2015/05/11/big-data-frameworks-every-programmer-should-know/