Apache Spark is a unified analytics engine for large-scale data processing, with built-in modules for SQL, streaming, machine learning, and graph processing. Spark can run on Apache Hadoop, on Kubernetes, on its own, or in the cloud, and against a wide variety of data sources. It provides rich APIs in Java, Scala, Python, and R, which makes it accessible to a broad range of developers and data scientists. Its Python API, PySpark, also integrates well with popular data-manipulation libraries such as Pandas. On Google Cloud, Apache Spark is taken to the next level with serverless options, breakthrough performance improvements such as Lightning Engine (in preview), and deep integration with a unified data and AI platform.
A common question is when to use Apache Spark and when to use Apache Hadoop. Both are among the most prominent distributed systems on the market today, and both are top-level Apache projects. The two systems have similarities and are often used together, but Hadoop is used mainly for disk-heavy operations built on the MapReduce paradigm, whereas Spark is a more flexible in-memory processing architecture that is, in many cases, more expensive to run. Understanding what each one does will help you decide when to use which.
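To make the MapReduce paradigm concrete, here is a minimal single-process sketch of the classic word-count job in plain Python. It only illustrates the map, shuffle, and reduce phases; it does not use Hadoop, and the function names are hypothetical. In a real Hadoop job, intermediate output between phases is written to disk, which is the disk-heavy behavior described above; this sketch keeps everything in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hadoop MapReduce would persist intermediate results to disk between
    # these phases; this toy version passes them along in memory.
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Spark and Hadoop", "Spark runs in memory"]
    print(word_count(docs))
```

For the two sample lines, `spark` is counted twice and every other word once.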
Learn how to run Apache Spark workloads on Google Cloud in a simpler, more integrated, and more cost-effective way. You can use Google Cloud Serverless for Apache Spark for zero-ops development, or use Dataproc to create managed Spark clusters.
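The Spark ecosystem has five key components:
Spark Core: the general distributed execution engine that the rest of the platform is built on.
Spark SQL: Spark's module for working with structured data, through SQL queries or the DataFrame API.
Spark Streaming and Structured Streaming: modules for processing live data streams.
MLlib: Spark's scalable machine learning library.
GraphX: Spark's API for graphs and graph-parallel computation.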
Across all of these components, Google Cloud provides an optimized environment. For example, Lightning Engine improves Spark and DataFrame performance, Google Cloud Serverless for Apache Spark simplifies deployment and management, and Gemini boosts developer productivity in notebook environments such as BigQuery Studio and Vertex AI Workbench.
Speed
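Spark's in-memory processing and DAG (directed acyclic graph) scheduler let it run workloads faster than Hadoop MapReduce, especially iterative tasks that make repeated passes over the same data. Google Cloud increases this speed further with optimized infrastructure and Lightning Engine.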
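Why in-memory processing helps iterative work can be shown with a toy model. The sketch below is not Spark's API: `ToyDataset` and `run_iterations` are hypothetical names, and the lineage is a simple linear chain rather than a general DAG. Transformations are only recorded until an action (`collect`) runs, and `cache()` keeps the computed rows in memory so that repeated passes skip the expensive source read. In PySpark, the comparable idea is calling `cache()` or `persist()` on a DataFrame that an iterative job reuses.

```python
from typing import Callable, List, Optional


class ToyDataset:
    """Toy model of lazily evaluated, cacheable transformations.

    Not Spark's API: each dataset only records its parent and a function,
    and nothing is computed until collect() (the "action") is called.
    """

    def __init__(
        self,
        source: Optional[Callable[[], List[int]]] = None,
        parent: Optional["ToyDataset"] = None,
        fn: Optional[Callable[[List[int]], List[int]]] = None,
    ) -> None:
        self._source = source  # only set on the root dataset
        self._parent = parent  # lineage: the dataset this one derives from
        self._fn = fn  # transformation applied to the parent's rows
        self._should_cache = False
        self._cached: Optional[List[int]] = None

    def map(self, f: Callable[[int], int]) -> "ToyDataset":
        # Record the transformation; do not run it yet.
        return ToyDataset(parent=self, fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred: Callable[[int], bool]) -> "ToyDataset":
        return ToyDataset(parent=self, fn=lambda rows: [r for r in rows if pred(r)])

    def cache(self) -> "ToyDataset":
        # Keep this dataset's rows in memory after the first computation.
        self._should_cache = True
        return self

    def collect(self) -> List[int]:
        if self._cached is not None:
            return self._cached
        if self._parent is None:
            rows = self._source()
        else:
            # Walk back along the lineage and recompute the parent's rows.
            rows = self._fn(self._parent.collect())
        if self._should_cache:
            self._cached = rows
        return rows


def run_iterations(iterations: int, use_cache: bool) -> int:
    """Run a toy iterative job and return how many times the source was read."""
    reads = 0

    def read_source() -> List[int]:
        nonlocal reads
        reads += 1  # stands in for an expensive disk or network read
        return list(range(10))

    base = ToyDataset(source=read_source).map(lambda x: x * 2)
    if use_cache:
        base.cache()
    for i in range(iterations):
        # Each pass reuses the same base data with a different threshold.
        base.filter(lambda x, t=i: x > t).collect()
    return reads


if __name__ == "__main__":
    print(run_iterations(5, use_cache=False))
    print(run_iterations(5, use_cache=True))
```

With five iterations, the uncached run reads the source five times, while the cached run reads it only once.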
Ease of use
Spark's high-level operators simplify building parallel applications, and you can use Spark interactively from Scala, Python, R, and SQL for rapid development. Google Cloud improves ease of use further with serverless options and integrated notebooks that take advantage of Gemini.
Scalability
Spark scales horizontally, processing very large datasets by distributing work across the nodes of a cluster. Google Cloud simplifies scaling with serverless autoscaling and flexible Dataproc clusters.
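Horizontal scaling rests on splitting data into partitions, processing each partition independently, and merging the partial results. The toy sketch below models that pattern on a single machine: threads stand in for cluster nodes, and `partition`, `count_partition`, and `distributed_count` are hypothetical helpers, not Spark functions. Records are assigned to partitions by a stable hash (`zlib.crc32`), so equal keys always land in the same partition. In CPython, the threads illustrate the structure of the computation rather than delivering a real speedup.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(records: List[str], num_partitions: int) -> List[List[str]]:
    """Assign each record to a partition by a stable hash of its key."""
    parts: List[List[str]] = [[] for _ in range(num_partitions)]
    for rec in records:
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        idx = zlib.crc32(rec.encode("utf-8")) % num_partitions
        parts[idx].append(rec)
    return parts


def count_partition(part: List[str]) -> Counter:
    """Work done independently on one simulated node: a partial count."""
    return Counter(part)


def distributed_count(records: List[str], num_workers: int) -> Counter:
    """Partition the data, count each partition in parallel, then merge."""
    parts = partition(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(count_partition, parts))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # merge step: add each partial count
    return total


if __name__ == "__main__":
    data = ["a", "b", "a", "c", "b", "a"]
    print(distributed_count(data, num_workers=3))
```

Adding workers changes how the data is split but not the merged result, which is the property that lets the real system spread work over more nodes.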
Generality
Spark serves as the foundation for a stack of libraries, including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
Open-source framework innovation
Spark draws on its open-source community for rapid innovation and problem solving, which shortens development time and time to market. Google Cloud embraces this open approach, offering standard Apache Spark while extending its capabilities.
Apache Spark is a fast, general-purpose cluster computing engine that can be deployed on a Hadoop cluster or in standalone mode. With Spark, programmers can quickly write applications in Java, Scala, Python, R, and SQL, which makes it accessible to developers, data scientists, and business users with statistics experience. With Spark SQL, you can connect to a wide range of data sources and present the data as tables to SQL clients. Interactive machine learning algorithms are also easy to implement in Spark.
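The idea of presenting data as tables that SQL clients can query can be illustrated with an analogy. The sketch below swaps in Python's built-in `sqlite3` module instead of Spark, and the `sales` table and `total_by_region` function are hypothetical. In PySpark, the corresponding steps would be registering a DataFrame with `createOrReplaceTempView` and querying it with `spark.sql`.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical record shape: (region, product, amount).
SalesRow = Tuple[str, str, float]


def total_by_region(rows: List[SalesRow]) -> List[Tuple[str, float]]:
    """Load records into an in-memory table, then aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Expose the records as a table that any SQL query can use.
        conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY region"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("east", "a", 10.0), ("west", "b", 5.0), ("east", "c", 2.5)]
    print(total_by_region(sample))
```

The point of the analogy is the separation of concerns: once data is exposed as a table, the query itself is plain SQL and does not depend on where the records came from.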
SQL-only engines such as Apache Impala, Apache Hive, and Apache Drill let users query data stored across multiple databases only with SQL or SQL-like languages, so their scope is narrower than Spark's. On Google Cloud, however, you don't have to make a strict choice. BigQuery offers powerful SQL capabilities, while Google Cloud Serverless for Apache Spark and Dataproc, the managed service for Spark and Hadoop, let you take advantage of Spark's versatility. In many cases, both can work on the same data through BigLake Metastore and open formats.
Many companies use Spark to simplify the difficult, compute-intensive task of processing and analyzing large volumes of real-time and archived data, both structured and unstructured. Spark also lets them seamlessly integrate related complex capabilities such as machine learning and graph algorithms. Common applications include the following:
Data engineers use Spark to code and build data processing jobs, with the option to program in an expanded set of languages. On Google Cloud, data engineers can use Google Cloud Serverless for Apache Spark for no-ops ETL/ELT pipelines, or use Dataproc when they want control over managed clusters. All of these options integrate with services such as BigQuery and Dataplex Universal Catalog for governance.
Data scientists can use Spark with GPUs for a richer analytics and ML experience. Because they can process large volumes of data faster in languages they already know, Spark helps them innovate more quickly. Google Cloud offers robust GPU support for Spark and seamless integration with Vertex AI, so data scientists can build and deploy models faster. They can work in a range of notebook environments, such as BigQuery Studio and Vertex AI Workbench, or connect their preferred IDE, such as Jupyter or VS Code. Combined with Gemini, this flexible development experience speeds up the workflow from initial prototyping to production deployment.