• Wednesday, 12 May 2021

    12 May 2021

  • The proposed talk will go through several questions. the first obvious one is why would I bother learn CLI when I can do whatever I need with a GUI tool ? We'll try to answer why knowing CLI is a MUST for some people (like Postgres DBAs, for example) whereas it's only a bonus for iothers (like data scientists, for example). Then we'll go through the basics about 101 (how to connect to, interactive moce versus not interactive more, how to set psql environement to work comfortably and so on...) The last part will be about tips and tricks that will make anyone's journey with psql more effective and enjoyable. I'm looking for the ""TIL"" effect in people's eyes.

  • I would like you to join me on our journey from a complex, multi instance Oracle topology to a single logical database in PostgreSQL. Each technology and architectural decision point will be discussed describing how we arrived at our destination. There are five keys areas that will be covered: - Target architecture - Migration of database objects (tables, indexes, views, synonyms, etc) - Migration of database code (packages, functions, procedures, triggers) - Application tier - Migration of Data - with minimal downtime during cutover The target architecture is a BDR cluster, where the physical data model and data stored is different between the logical standbys and the lead master/shadow master. Will discuss how this allowed for the simplification of the topology, and the benefits this delivered. Before you go there, yes I know PostgreSQL does no have synonyms, but an alternative approach was needed. There is a significant amount of business logic in the database tier all of which needed to be translated into database code. Will look at the tools and extensions available to reproduce the functionality in PostgreSQL. Look at common non-ISO standard SQL embedded in the application tier, along with jdbc challenges. Finally a look at some of the data movement tools available. Full disclosure, we are still on the journey but have learnt a lot on the way.

  • Internet of Things is a currently a burgeoning market, and is often associated with specialized data-stores. However PostgreSQL is just as capable at this use-case and can offer some compelling advantages. We’ll explore ways to store IoT data in PostgreSQL covering various ways to store and structure this kind of data. How range types and differing types of indexes can be of use. Also taking a quick look at some extensions designed for this use case. Then looking at powerful SQL features which can really help when analyzing IoT data streams, and how the power of a real SQL database can be a key advantage.

  • At the moment, Python is one of the most used programming languages in the world. It offers a nice set of characteristics to complement PostgreSQL. For example, if you need to write a relatively simple utility script, whose complexity extends beyond what you can easily do using Bash with psql, or you want to integrate your pre-existing database using an external system, or you want to write a fully production grade web service: chances are that Python might be a good tool to use. This is thanks to both its simple syntax and its wide range of libraries to choose from, to interact with external systems. In this talk, we will take a look at the current techniques we can use to make Python and PostgreSQL speak to each other. We will explore how we can run simple queries, avoid the dangers of SQL injections, make the most of the rich type systems that the two technologies offer, deal with asynchronous messages and so on! The talk will take into consideration psycopg2, the current industry standard module, which allows Python to interact with PostgreSQL. We will also take a look at the up and coming psycopg3 module, which is designed to make use of the functionalities available in the latest Python releases and communicate with the database in the most efficient way.

  • There have been plenty of “explaining EXPLAIN” type talks over the years, which provide a great introduction to it. They often also cover how to identify a few of the more common issues through it. EXPLAIN is a deep topic though, and to do a good introduction talk, you have to skip over a lot of the tricky bits. As such, this talk will not be a good introduction to EXPLAIN, but instead a deeper dive into some of the things most don’t cover. The idea is to start with some of the more complex and unintuitive calculations needed to work out the relationships between operations, rows, threads, loops, timings, buffers, CTEs and subplans. Most popular tools handle at least several of these well, but there are cases where they don’t that are worth being conscious of and alert to. For example, we’ll have a look at whether certain numbers are averaged per-loop or per-thread, or both. We’ll also cover a resulting rounding issue or two to be on the lookout for. Finally, some per-operation timing quirks are worth looking out for where CTEs and subqueries are concerned, for example CTEs that are referenced more than once. As time allows, we can also look at a few rarer issues that can be spotted via EXPLAIN, as well as a few more gotchas that we’ve picked up along the way. This includes things like spotting when the query is JIT, planning, or trigger time dominated, spotting the signs of table and index bloat, issues like lossy bitmap scans or index-only scans fetching from the heap, as well as some things to be aware of when using auto_explain.

  • Has your table become too large to handle? Have you thought about chopping it up into smaller pieces that are easier to query and maintain? What if it's in constant use? An introduction to the problems that can arise and how PostgreSQL's partitioning features can help, followed by a real-world scenario of partitioning an existing huge table on a live system. We will be looking at the problems caused by having very large tables in your database and how declarative table partitioning in Postgres can help. Also, how to perform dimensioning before but also after creating huge tables, partitioning key selection, the importance of upgrading to get the latest Postgres features and finally we will dive into a real-world scenario of having to partition an existing huge table in use on a production system.

  • In this talk I'll discuss how we can combine the power of PostgreSQL with TensorFlow to perform data analysis. By using the pl/python3 procedural language we can integrate machine learning libraries such as TensorFlow with PostgreSQL, opening the door for powerful data analytics combining SQL with AI. Typical use-cases might involve regression analysis to find relationships in an existing dataset and to predict results based on new inputs, or to analyse time series data and extrapolate future data taking into account general trends and seasonal variability whilst ignoring noise. Python is an ideal language for building custom systems to do this kind of work as it gives us access to a rich ecosystem of libraries such as Pandas and Numpy, in addition to TensorFlow itself.

  • "Why use PgBouncer? It’s a lightweight, easy to configure connection pooler and it does one job well. As you’d expect from a talk on connection pooling, we’ll give a brief summary of connection pooling and why it increases efficiency. We’ll look at when not to use connection pooling, and we’ll demonstrate how to configure PgBouncer and how it works. But. Did you know you can also do this? 1. Scaling PgBouncer PgBouncer is single threaded which means a single instance of PgBouncer isn’t going to do you much good on a multi-threaded and/or multi-CPU machine. We’ll show you how to add more PgBouncer instances so you can use more than one thread for easy scaling. 2. Read-write / read only routing Using different pgBouncer databases you can route read-write traffic to the primary database and route read-only traffic to a number of standby databases. 3. Load balancing When we use multiple PgBouncer instances, load balancing comes for free. Load balancing can be directed to different standbys, and weighted according to ratios of load. 4. Silent failover You can perform silent failover during promotion of a new primary (assuming you have a VIP/DNS etc that always points to the primary). 5. And even: DoS prevention and protection from “badly behaved” applications! By using distinct port numbers you can provide database connections which deal with sudden bursts of incoming traffic in very different ways, which can help prevent the database from becoming swamped during high activity periods. You should leave the presentation wondering if there is anything PgBouncer can’t do."