edba: Data Science

https://www.analyticsvidhya.com/blog/2017/01/the-most-comprehensive-data-science-learning-plan-for-2017/#three-one

R for Data Science : http://r4ds.had.co.nz/index.html

Real-time analytics: software such as Spark and Kafka (Hadoop on its own is batch-oriented)

http://nirvacana.com/thoughts/2013/07/08/becoming-a-data-scientist/

https://www.sqlshack.com/10-things-need-know-become-data-scientist/

https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/

https://thomaslarock.com/2018/08/achievement-unlocked-certified-in-artificial-intelligence/

Data Science for DBAs

1. http://codophile.com/2015/05/11/big-data-frameworks-every-programmer-should-know/

https://www.youtube.com/watch?v=gubfZBdkdpI

http://www.sqlservercentral.com/Books/

https://www.iiitb.ac.in/main-control/uploaded-files/1522236481.pdf

https://www.digitalvidya.com/blog/data-analytics-interview-questions-answers/

https://www.zdnet.com/article/what-is-machine-learning-everything-you-need-to-know/

MapReduce :

1. http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
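The WordCount program in the tutorial above follows the classic map, shuffle, reduce flow. A minimal single-process Python sketch of the same idea (no Hadoop involved; the function names are hypothetical) could look like this:

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # On a cluster each line (split) could be mapped on a different node;
    # here everything runs sequentially in one process.
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]))
```

Only the map and reduce functions are "user code"; the grouping in `shuffle` is what Hadoop provides between the two phases.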

HBase : distributed, column-oriented NoSQL database (column-family category); well suited to fast random reads over very large tables.
Hive : distributed data warehouse built on top of Hadoop; table data is stored in HDFS.
HiveQL : SQL-like query language; each query is converted into a set of MapReduce jobs.
Pig : high-level scripting language (Pig Latin) for querying large datasets; commonly used for ETL and needs fewer lines of code than hand-written MapReduce.
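Spark : general-purpose distributed processing engine that supports in-memory processing, keeping intermediate data in memory instead of writing it to disk between steps as MapReduce does.
HDFS : Hadoop Distributed File System; stores very large files as blocks replicated across the machines of a cluster.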
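MapReduce : programming model that turns a serial program into a parallel one by splitting the work into a map phase and a reduce phase that run across many nodes.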
Hadoop : provides both distributed storage (HDFS) and distributed processing (MapReduce) of very large data sets. Mainly used where batch processing is needed; it is not well suited to real-time processing.
Flume - Distributed system for collecting and aggregating log data, and writing it to HDFS.
Sqoop - Provides two-way bulk data transfer (import and export) between Apache Hadoop and relational databases (RDBMS). Supports full (snapshot) imports and incremental imports.
Kafka - Distributed publish-subscribe messaging system. Hadoop can act as a consumer of Kafka, for example by loading messages into HDFS.
Storm - Distributed real-time event (stream) processing system, often referred to as "real-time Hadoop". A Storm cluster uses ZooKeeper for coordination.
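The publish-subscribe model behind Kafka can be illustrated with a tiny in-memory toy: producers append messages to a per-topic log, and each consumer group keeps its own read position (offset), so several consumers (say, a loader writing to HDFS and a dashboard) read the same messages independently. The `Broker` class and its methods are hypothetical; this is not the Kafka client API.

```python
from collections import defaultdict


class Broker:
    """Toy broker: one append-only log per topic, one offset per (group, topic)."""

    def __init__(self) -> None:
        self.logs: dict[str, list[str]] = defaultdict(list)
        self.offsets: dict[tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> None:
        # Producers only append; existing messages are never modified.
        self.logs[topic].append(message)

    def poll(self, group: str, topic: str, max_messages: int = 10) -> list[str]:
        # Each consumer group resumes from its own offset and then advances it.
        start = self.offsets[(group, topic)]
        batch = self.logs[topic][start:start + max_messages]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


if __name__ == "__main__":
    broker = Broker()
    broker.publish("clicks", "user1:/home")
    broker.publish("clicks", "user2:/cart")
    print(broker.poll("hdfs-loader", "clicks"))                 # ['user1:/home', 'user2:/cart']
    print(broker.poll("dashboard", "clicks", max_messages=1))   # ['user1:/home']
    print(broker.poll("hdfs-loader", "clicks"))                 # []
```

Because reading does not delete anything, adding a new subscriber only means adding a new offset, not copying the data.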

Transactional data stores
• Key-value - Oracle NoSQL, DynamoDB, Voldemort, Apache Accumulo
• Document-based - MongoDB, CouchDB
• Column-based - Apache Cassandra, Apache HBase
• Graph-based - Neo4j, InfoGrid
• Relational - Apache Kudu
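To make the categories above concrete, here is one hypothetical customer record shaped the way each kind of store typically models it, using plain Python structures. It illustrates only the data models and does not use any of the listed products.

```python
import json

# Key-value: the store sees an opaque value addressed by a single key.
key_value = {"customer:42": '{"name": "Asha", "city": "Pune"}'}

# Document: a nested, self-describing record whose fields can be queried.
document = {"_id": 42, "name": "Asha", "address": {"city": "Pune"}, "orders": [1001, 1002]}

# Column-family (HBase / Cassandra style): row key -> column family -> column -> value.
column_family = {
    "42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "orders": {"1001": "shipped", "1002": "pending"},
    }
}

# Graph: nodes plus typed, directed edges.
nodes = {42: {"label": "Customer", "name": "Asha"}, 1001: {"label": "Order"}}
edges = [(42, "PLACED", 1001)]

# Relational: a fixed schema with one tuple per row.
schema = ("id", "name", "city")
rows = [(42, "Asha", "Pune")]

if __name__ == "__main__":
    # Key-value: the application must decode the value itself.
    print(json.loads(key_value["customer:42"])["city"])
    print(document["address"]["city"])
    print(column_family["42"]["profile"]["city"])
    print(dict(zip(schema, rows[0]))["city"])
    # Graph: follow edges instead of joining tables.
    print([dst for src, rel, dst in edges if src == 42 and rel == "PLACED"])
```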

Sqoop :

https://www.tutorialspoint.com/sqoop/sqoop_installation.htm

https://www.youtube.com/watch?v=r1NLCComQ9Q&list=PL9ooVrP1hQOFrYxqxb0NJCdCABPZNo0pD&index=15

https://www.youtube.com/watch?v=RCtwhK_Jpyc

https://www.youtube.com/watch?v=HMphb8acV4Q

https://www.youtube.com/watch?v=-e6912666lc

https://www.youtube.com/watch?v=hY9nnU4PTFw
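Sqoop's incremental append mode (`sqoop import ... --incremental append --check-column <col> --last-value <n>`) imports only rows whose check column is greater than the last value recorded by the previous run. The sketch below imitates that logic, assuming Python's built-in `sqlite3` as a stand-in for the RDBMS and a plain list as a stand-in for the HDFS target; the `orders` table and the `incremental_import` helper are hypothetical.

```python
import sqlite3


def incremental_import(conn: sqlite3.Connection, table: str, check_column: str,
                       last_value: int, target: list) -> int:
    """Append rows with check_column > last_value to target; return the new last value."""
    # Table/column names cannot be bound as SQL parameters, so they are trusted inputs here.
    query = f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}"
    cursor = conn.execute(query, (last_value,))
    rows = cursor.fetchall()
    col_idx = [d[0] for d in cursor.description].index(check_column)
    target.extend(rows)
    # The next run starts from the highest value seen (unchanged if nothing was new).
    return max((row[col_idx] for row in rows), default=last_value)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    hdfs_files: list = []
    last = incremental_import(conn, "orders", "id", 0, hdfs_files)     # imports ids 1 and 2
    conn.execute("INSERT INTO orders VALUES (3, 7.25)")
    last = incremental_import(conn, "orders", "id", last, hdfs_files)  # imports only id 3
    print(last, hdfs_files)  # 3 [(1, 9.5), (2, 20.0), (3, 7.25)]
```

Append mode only suits tables where new rows get ever-increasing keys; Sqoop's `lastmodified` mode exists for rows that are updated in place.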

Flume :
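A Flume agent is built from a source (receives events), a channel (buffers them), and a sink (writes them out, for example to HDFS). The sketch below mimics that pipeline in plain Python, assuming a bounded `queue.Queue` as the channel, a thread as the sink, and a list as the destination; the function names and the end-of-stream sentinel are hypothetical, and real Flume agents are configured rather than coded like this.

```python
import queue
import threading

STOP = None  # end-of-stream sentinel (an assumption of this sketch, not a Flume concept)


def source(lines, channel: queue.Queue) -> None:
    """Source: turn each raw log line into an event and put it on the channel."""
    for line in lines:
        channel.put({"body": line.strip()})  # blocks while the channel is full
    channel.put(STOP)


def sink(channel: queue.Queue, destination: list, batch_size: int = 2) -> None:
    """Sink: take events off the channel and write them out one batch at a time."""
    batch = []
    while True:
        event = channel.get()
        if event is STOP:
            break
        batch.append(event)
        if len(batch) == batch_size:
            destination.extend(batch)
            batch = []
    destination.extend(batch)  # flush the final partial batch


def run_agent(lines, capacity: int = 3) -> list:
    channel: queue.Queue = queue.Queue(maxsize=capacity)  # bounded buffer between source and sink
    destination: list = []
    consumer = threading.Thread(target=sink, args=(channel, destination))
    consumer.start()
    source(lines, channel)
    consumer.join()
    return destination


if __name__ == "__main__":
    print(run_agent(["GET /a\n", "GET /b\n", "POST /c\n"]))
```

The bounded channel is what decouples a bursty source from a slower sink: when it fills up, the source waits instead of losing events.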

Hive :
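HiveQL queries are compiled into MapReduce jobs. For a query such as `SELECT dept, SUM(salary) FROM emp GROUP BY dept` (the `emp` table is hypothetical), the GROUP BY column becomes the map output key and the aggregate is computed in the reducer. A simplified Python sketch of that translation, not Hive's actual query planner:

```python
from collections import defaultdict

# Hypothetical rows of an `emp` table: (name, dept, salary).
emp = [("ann", "eng", 100), ("bob", "ops", 80), ("cid", "eng", 120)]


def mapper(row):
    # GROUP BY dept -> dept is the key; SUM(salary) -> salary is the value.
    _name, dept, salary = row
    return dept, salary


def reducer(dept, salaries):
    # The aggregate runs once per key in the reduce phase.
    return dept, sum(salaries)


def group_by_sum(rows):
    shuffled = defaultdict(list)  # stands in for the framework's shuffle/sort
    for key, value in map(mapper, rows):
        shuffled[key].append(value)
    return [reducer(key, values) for key, values in sorted(shuffled.items())]


if __name__ == "__main__":
    print(group_by_sum(emp))  # [('eng', 220), ('ops', 80)]
```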

Hadoop :

https://www.youtube.com/watch?v=LgSiVWjTIUg&list=PL9ooVrP1hQOFrYxqxb0NJCdCABPZNo0pD&index=24

https://www.youtube.com/watch?v=Tt2GRh3eFMs&list=PL9ooVrP1hQOFrYxqxb0NJCdCABPZNo0pD&index=28
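Hadoop's storage layer, HDFS, splits each file into fixed-size blocks (128 MB by default in Hadoop 2 and later, set by `dfs.blocksize`) and keeps several replicas of each block (3 by default, set by `dfs.replication`). The arithmetic can be sketched as follows; the round-robin placement is a deliberate simplification of HDFS's rack-aware policy, and the `plan_blocks` helper and node names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize in Hadoop 2+
REPLICATION = 3                 # default dfs.replication


def plan_blocks(file_size: int, nodes: list[str],
                block_size: int = BLOCK_SIZE, replication: int = REPLICATION):
    """Return (block_index, block_length, replica_nodes) for each block of a file."""
    if replication > len(nodes):
        raise ValueError("not enough DataNodes for the requested replication")
    n_blocks = (file_size + block_size - 1) // block_size  # integer ceiling division
    plan = []
    for i in range(n_blocks):
        # The last block holds only the remainder; it is not padded to full size.
        length = min(block_size, file_size - i * block_size)
        # Simplified round-robin placement, not HDFS's rack-aware policy.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        plan.append((i, length, replicas))
    return plan


if __name__ == "__main__":
    MB = 1024 * 1024
    for index, length, replicas in plan_blocks(300 * MB, ["dn1", "dn2", "dn3", "dn4"]):
        print(index, length // MB, replicas)
    # 0 128 ['dn1', 'dn2', 'dn3']
    # 1 128 ['dn2', 'dn3', 'dn4']
    # 2 44 ['dn3', 'dn4', 'dn1']
```

With three replicas, a 300 MB file consumes about 900 MB of raw cluster storage, which is the price paid for surviving the loss of individual DataNodes.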