Kenan Erarslan

Hadoop Learning Journey / 3 / Reinventing The Wheel

Posted on February 2, 2019; updated January 27, 2021

After a long stretch of research and trials, we began to form a picture of our big data architecture, and all team members started to think in an aligned way. We needed a database for reporting, and we needed to sync it with the source at a regular interval. Our understanding was this: Hadoop is not a database, and Hive is not a replacement for a relational database. Finally, Apache Spark was not the answer we had been dreaming of (at least, it was not what we expected). The question was simple, but the road to the solution was not.

Wheel reinvented: Lambda Architecture

Lambda architecture is an architectural approach to big data processing problems. The following illustration gives a general view of it. You can find more information at http://lambda-architecture.net.

We decided not to store our data within the Apache Hadoop ecosystem. The final decision was to copy our data to another database and keep it in sync. MongoDB suited our expectations perfectly, as it was easy to implement and develop against. In lambda architecture terms, this is called the speed layer. The main intention is for the application to generate reports from a database other than the production one, so that reporting puts no unnecessary load on production. These reports do not need intensive processing. We will sync the data between the data source and MongoDB using Apache Kafka, with the help of Confluent.
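The sync idea above can be sketched as a small toy model. Plain Python objects stand in for the real pieces here: a list of change events plays the role of a Kafka topic, and a dict plays the role of a MongoDB collection. The event shape (`op`, `key`, `doc`) and the function name `apply_change_events` are assumptions made for illustration; they are not the format that Confluent's connectors actually emit.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical change event: {"op": "upsert" | "delete", "key": ..., "doc": {...}}
ChangeEvent = Dict[str, Any]


def apply_change_events(
    events: Iterable[ChangeEvent],
    reporting_store: Optional[Dict[Any, Dict[str, Any]]] = None,
) -> Dict[Any, Dict[str, Any]]:
    """Replay change events, in order, onto a reporting copy of the data."""
    store = {} if reporting_store is None else reporting_store
    for event in events:
        if event["op"] == "upsert":
            # Insert or overwrite by key, so replaying an event is harmless.
            store[event["key"]] = dict(event["doc"])
        elif event["op"] == "delete":
            # Deleting a key that is already gone is ignored for the same reason.
            store.pop(event["key"], None)
        else:
            raise ValueError(f"unknown op: {event['op']!r}")
    return store


# Stand-in for messages read from a topic (assumed sample data).
events = [
    {"op": "upsert", "key": 1, "doc": {"customer": "a", "total": 10}},
    {"op": "upsert", "key": 2, "doc": {"customer": "b", "total": 5}},
    {"op": "upsert", "key": 1, "doc": {"customer": "a", "total": 12}},
    {"op": "delete", "key": 2},
]
reporting = apply_change_events(events)
```

Writing by key rather than appending means the same events can be delivered twice without corrupting the reporting copy, which is a useful property when the transport only promises at-least-once delivery.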

When we need to implement complex, processing-intensive reports, Apache Hadoop/Spark comes onto the stage. It is the layer that handles the processing. In lambda architecture terms, this is the batch layer. There are several ways to fetch the data from the source. Here are some examples:

  • Export the full data set from the source and load it into HDFS (very common).
  • Connect to the data source directly from Apache Spark and process the data there.
  • Fetch the data from the speed layer using Apache Spark, since it has good support for MongoDB.
  • Develop a message queue or streaming layer that pushes the data to the batch layer at the same time as it is collected.

Finally, the data generated by the batch layer is pushed back into MongoDB, which acts as the serving layer. Users will be able to query the processed data from new collections.
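As a rough sketch of that last step, the toy example below recomputes a report from a full set of records and publishes it under a new collection name. Everything in it is assumed for illustration: the `orders` sample data, the `customer_totals` collection name, and the helper names `build_report` and `publish`. A plain dict stands in for MongoDB, and a plain Python loop stands in for the Spark job.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def build_report(orders: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Batch step: total order amount per customer, recomputed from scratch."""
    totals: Dict[str, float] = defaultdict(float)
    for order in orders:
        totals[order["customer"]] += order["amount"]
    # One result document per customer, in a stable order.
    return [{"_id": customer, "total": total} for customer, total in sorted(totals.items())]


def publish(
    serving: Dict[str, List[Dict[str, Any]]],
    name: str,
    rows: List[Dict[str, Any]],
) -> None:
    """Serving step: replace the named result collection with the new rows."""
    serving[name] = rows


# Stand-in for the full export handed to the batch layer (assumed sample data).
orders = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 7},
    {"customer": "a", "amount": 5},
]
serving_db: Dict[str, List[Dict[str, Any]]] = {}
publish(serving_db, "customer_totals", build_report(orders))
```

Replacing the whole result collection on each run keeps the batch job simple: a rerun cannot leave stale rows behind, at the cost of rewriting results that did not change.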

The result

After long hours of discussion and research, we reached a conclusion and settled on this structure. Then we invited a colleague from another team, who said that a presentation already existed describing an approach to this very situation. He offered to share the document, and eventually he did. When we saw the structure, we all started laughing and said that we are a good team.

Ali Barış, Akın, Haydar, Soner, Umut

Thank you to each and every one of you for everything.

And to Sancar, for being the man in the shadows as editor.
