Building a Data Pipeline with Kafka, Spark Streaming and Cassandra

https://www.baeldung.com/kafka-spark-data-pipeline

 

스파크를 다루는 기술 (a book on working with Spark)

https://thebook.io/006908/part02/ch06/01/02-01/

 

Data Engineering Cookbook

https://github.com/andkret/Cookbook#introduction

 

Data Engineering Roadmap

https://github.com/hasbrain/data-engineer-roadmap

 

How To Connect Spark to Your Own Datasource

https://www.slideshare.net/mongodb/how-to-connect-spark-to-your-own-datasource

 

Spark memory management explained

https://medium.com/@leeyh0216/spark-internal-part-2-spark%EC%9D%98-%EB%A9%94%EB%AA%A8%EB%A6%AC-%EA%B4%80%EB%A6%AC-1-c18e39af942e

 

How to measure Spark performance

https://db-blog.web.cern.ch/blog/luca-canali/2017-03-measuring-apache-spark-workload-metrics-performance-troubleshooting

 

Assessing the state of a Linux system within 60 seconds

https://b.luavis.kr/server/linux-performance-analysis

 

Everything you need to know about MongoDB sharding

https://www.mongodb.com/presentations/webinar-everything-you-need-know-about-sharding?jmp=docs

 

Hadoop-Mongo integration

http://www.ikanow.com/how-well-does-mongodb-integrate-with-hadoop/

 

Measuring Hadoop performance

https://blog.cloudera.com/what-is-hadoop-metrics2/

 

Machine learning basics explained

https://nittaku.tistory.com/category/%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D%20%26%20%EB%94%A5%EB%9F%AC%EB%8B%9D/%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D%20%EA%B8%B0%EC%B4%88

https://m.blog.naver.com/PostView.nhn?blogId=qbxlvnf11&logNo=221449297033&proxyReferer=https%3A%2F%2Fwww.google.com%2F

https://github.com/PacktPublishing/Mastering-Machine-Learning-with-scikit-learn-Second-Edition

 

Scikit Learn User Guide

https://scikit-learn.org/stable/user_guide.html

 

Machine learning tutorial provided by Google

https://developers.google.com/machine-learning/crash-course/training-and-test-sets/playground-exercise?hl=ko

 

Linear algebra

https://darkpgmr.tistory.com/103

 

A site for sharing Python notebooks

https://datascienceschool.net/view-notebook/17608f897087478bbeac096438c716f6/

 

Linear regression explained simply

https://brunch.co.kr/@itschloe1/9

 

Getting to know Docker and how it works

http://blog.drakejin.me/Docker-araboza-1/

https://tech.ssut.me/what-even-is-a-container/

 

Search results on how to measure Hadoop I/O time

https://www.google.com/search?newwindow=1&ei=3hDVXMjzBIvWmAXqh6yIDQ&q=hadoop+how+to+get+Elapsed+data+IO+time&oq=hadoop+how+to+get+Elapsed+data+IO+time&gs_l=psy-ab.3...43863.44039..44439...0.0..0.100.199.1j1......0....1..gws-wiz.qSUkb8BL-e8

https://stackoverflow.com/questions/42164449/calculate-time-taken-by-reducers-hadoop

https://github.com/linkedin/white-elephant

https://www.quora.com/MapReduce-Whats-the-best-way-to-measure-MR-job-runtime

 

YCSB workloads

https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload

 

Hadoop with MongoDB storage (question and answers)

https://stackoverflow.com/questions/52337696/hadoop-with-mongodb-storage

 

How does Apache Spark know about HDFS data nodes? (question and answers)

https://stackoverflow.com/questions/28481693/how-does-apache-spark-know-about-hdfs-data-nodes

 

 
