Machine Learning – more learning….

As an addendum to last week's post about machine learning, here is an article by the BBC: https://www.bbc.co.uk/news/technology-48825761. It tells the story of an Amazon employee who built a cat-flap. Fed up with receiving "presents", he made this clever device to recognise whether his cat has come home with prey in its jaws. If it has, the flap refuses to open and the cat must stay outside.

I thought it neatly encapsulated some problems of machine learning I am finding, and also pointed to some areas where this technology could be applied to generate more value.

The training problem and need for clean data

It took 23,000 photos to train the algorithm. Each had to be hand-sorted to determine whether it showed a cat, a cat with prey, and so on.

This is like the oil and gas industry: a lot of clean training data is needed, and it may not be available without a lot of manual input.

The Bayesian stats can work against you

The frequency of event-occurrence in this case is about once every 10 days. The maker ran a trial for 5 weeks, so should have seen 3–4 instances (though in this case seven are reported). There was one false positive and one false negative, giving an error rate which may be around 20% (though there are not enough samples to have much confidence). Suppose the algorithm continues to learn and the cat uses the flap 5 times a day on average. In a year it will then have 1,825 additional samples, of which 35 will be positives (7 of them missed as false negatives) and 1,790 negatives (358 of them flagged as false positives). Manual correction will be required about 365 times (i.e. once per day on average), and at that rate it will take about 12 years to duplicate the original training set. I don't know, but I suspect there are diminishing returns on adding new data, so how much smarter it will be in 12 years I've no idea.

Disclaimer – a good statistician will know my maths above is not quite right, but the principle stands.
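For anyone who wants to poke at the arithmetic, here is the same back-of-envelope calculation as a few lines of Python (using the post's own assumptions: 35 prey events a year, a 20% error rate, 5 flap uses a day):

```python
# Back-of-envelope check of the cat-flap alarm arithmetic (assumptions from the post).
uses_per_day = 5        # assumed average flap uses per day
days = 365
error_rate = 0.20       # rough error rate implied by the 5-week trial

samples = uses_per_day * days      # 1,825 events in a year
positives = 35                     # prey roughly once every 10 days (post's rounding)
negatives = samples - positives    # 1,790 harmless entries

false_negatives = round(positives * error_rate)   # real prey the flap lets through: 7
false_positives = round(negatives * error_rate)   # harmless entries refused: 358

alarms = (positives - false_negatives) + false_positives  # times the flap refuses entry
corrections = false_negatives + false_positives           # manual fixes needed per year

print(alarms, corrections)  # 386 alarms a year, 365 corrections (about one a day)
```

The striking part is the ratio: of the 386 refusals, only 28 are genuine, which is the base-rate problem in miniature.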

Do you trust the alarm? Do you take notice?

So in this example, quite like oil and gas operations, the issue was getting hold of clean training data. The event being detected was comparatively rare, meaning that a lot of false positives are likely. In the example above an "alarm" would have sounded nearly 400 times, and on only 28 occasions would it have been real. With events this sparse, the algorithm will not learn very fast, and I think the alarms will simply be ignored.

So where can the applications be better?

Distributed learning

Parallel learning helps build better predictive models. If we had the same cat-flap for every cat, and every owner corrected the false signals, the right algorithm could learn quickly and disseminate the results to all, speeding up the learning process. Self-driving cars are a good example of where this is possible, and Google search is a great example of the power of parallel experience.

Products and services impossible for a human

There are situations where there is so much data that manual processing is impossible. I don't mean that you can collect so much data on an ongoing manual operation that you cannot analyse all the extra information. I mean that there is intrinsically so much information that it would be impossible to analyse by hand, and it never has been analysed that way, so an ML approach is the only one possible. For instance, looking at real-time changes in the traffic patterns of data networks.

Simple situations where it’s expensive and boring for a human

Automated first-line help systems, call screening, password resetting and the like. These are all tasks that humans can do, but they are simple tasks too boring for smart people, where automated help can often provide a better experience – and where "sorry, I didn't get that, did you say yes?" is only mildly irritating rather than the cause of major corporate loss.

Conclusion

There are places where machine learning will be revolutionary, but I suspect that much of ML will either be embedded to make normal tasks a bit easier – auto-spell checking, voice recognition and the like – or will tackle classes of problem, such as IT security or online shopping behaviour, where there is inherently a lot of fast-moving data and manual monitoring is simply not feasible or fast enough to work.

Machine Learning — Data First

With all the attention on Machine Learning (ML) that I encountered at London Tech Week, I thought I better find out a bit more about it. I wanted to verify my view that it won’t have a dramatic impact on the Oil and Gas industry and see if this was actually true.

My findings, so far, are that ML might:

  • Speed up some analysis
  • Change the spreadsheet paradigm (for better or for worse)
  • Enable people with the right level of expertise to create predictive analysis (whether this will be at all valuable is a different matter)
  • In a myriad of small ways, change minor tasks (removing annoyances, reducing low-value data & reporting work, changing interfaces)

None of these applications is dramatic in itself, but over time they may add incremental benefits that provide a moving improvement-front – a bit like Six Sigma or Kanban.

What you will need in order to drive broad value from ML

You'll only be able to take advantage of ML if four things are true:

  1. You have clean, historical data available
  2. You can access and combine quality-controlled, time-dependent data in near real-time (ideally from multiple sources)
  3. You have widespread knowledge of how to apply the new analysis tools – like those based on "R". (Think how many people can use Excel today for collecting, analysing, querying and reporting on data, with varying degrees of proficiency – then ask how many people you know who can use "R".)
  4. You are prepared to reorganise the way work is performed to take advantage of the new possibilities created by data analysis: management decisions and employee actions based on demonstrated facts and assessed probabilities.

Low-cost hardware is the trigger

The interest in machine learning springs from the dramatic drop in the cost of the hardware and software required to perform the necessary number crunching. Because of this, not only has the complexity of the addressable problems increased, but the tolerance for inefficient code has too – increasing the usability of tools and techniques and leading to their application by practitioners outside the pure decision sciences.

If you attend any of the IoT conferences – or speak to the large vendors of real-time industrial data – you'll hear a lot about how edge computing and machine learning will change things for industry. "Edge" means placing computing power in the field, at low power and low cost.

Putting this computing power alongside the sensor enables pre-processing of information so that only the results are sent back. This helps to reduce data bandwidth and improve response times.
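A toy sketch of that idea (the sensor values and the alarm threshold here are entirely invented): an edge device summarises a window of raw readings locally and transmits only the small result payload.

```python
# Toy edge pre-processing: keep raw samples local, send back only a summary.
def summarise_window(readings, alarm_threshold=80.0):
    """Reduce a window of raw sensor readings to a small result payload."""
    mean = sum(readings) / len(readings)
    peak = max(readings)
    return {
        "mean": round(mean, 2),
        "peak": peak,
        "alarm": peak > alarm_threshold,  # send the flag, not the raw stream
    }

# 1,000 raw samples become one tiny message.
window = [20.0 + 0.01 * i for i in range(1000)]
print(summarise_window(window))
```

The bandwidth saving is the whole point: a thousand readings collapse to three numbers, and the back-end only ever hears about the summary.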

For a view on how cheap this type of technology now is – and how undramatic the applications of ML really are – I invite you to have a look at this video (from the hobbyist market) showing what can be done for less than $100. Listen out for the references to "TensorFlow"; one day that will be important. There are also some passing references to cloud-based resources that may be of interest.

What I learned at London Tech Week

Big Data, Artificial intelligence, Machine Learning and Computer Vision

It turns out that Computer Vision requires Machine Learning, and Machine Learning requires Artificial Intelligence. Artificial Intelligence is of most use when there are large amounts of data to process, so in a way AI relies on big data (though not always). The cognoscenti use the abbreviations CV, ML, AI and, err… "big data" to refer to these technologies.

Some simple definitions

These are what are called HORIZONTAL TECHNOLOGIES, because they are general and can be applied to a range of problems across industries. They are already having an impact in some areas but they are not a panacea.

Big Data relates to the collection and storage of large quantities of data and being able to access and manipulate this quickly – using high speed networks, special search algorithms, parallel processing etc. It gives rise to a whole world of shared resources, cloud computing and specialised storage schemes. When real-time information is included (such as sensor data) then this is the world of IoT, time-series data and edge processing.

AI is a way to analyse the information contained in the big data very quickly, creating inferences between data signals and looking for patterns, sometimes to predict outcomes and reduce uncertainty. AI is very narrow in its applicability – even Bill Gates says you wouldn't trust it to order your inbox for you – so its ability to make judgements is limited. A lot of what we talk about as AI is a form of linear regression: mass computational power enabling the quick processing and crunching of large amounts of data. AI can give the illusion of being smart, when in fact it can be easily fooled.

Machine Learning is the ability of an algorithm/analysis to change over time by examining a changing stream of input information, comparing the computed outcome with the desired outcome, and tuning itself – neural networks being the best-known example. Machine learning is an application of AI.
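That compare-and-tune loop can be caricatured in a few lines – here a one-parameter model learns a slope by repeatedly comparing its computed output with the desired one (a minimal illustration, not any particular library's API):

```python
# Minimal "machine learning": compare computed outcome with desired outcome, then tune.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # inputs with desired outputs (y = 3x)

w = 0.0                 # the single parameter the algorithm tunes
learning_rate = 0.05
for _ in range(200):
    for x, desired in data:
        computed = w * x
        error = computed - desired        # how far off the model is
        w -= learning_rate * error * x    # nudge w to reduce the error

print(round(w, 3))  # converges close to 3.0
```

Everything grander – neural networks included – is this same loop with many more parameters and cleverer nudging.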

Computer vision is an application of both AI and ML used to process images (still and moving); one of its most controversial applications is facial recognition and the automatic tracking of people.

Some of the take-aways from London Tech-week

Big Data

Everyone who spoke (and I mean everyone) said that their biggest issue in applying any form of advanced analysis came down to the quality of the data: the meaning of the information collected, and the way it is labelled, is wildly inconsistent. There were some semantics companies working on different ways of expressing ideas in language, which may hold a key to reconciling the differences in how data items are labelled. Until that takes off, 75%–90% of your AI budget is going to be spent cleaning up data and sorting out the meaning of feeds. The trouble is that you will spend this 90% of your budget with no tangible change in outcome, as you can't get started until it's sorted out.

AI

I saw what I thought was a brilliant example of this from a Norwegian company called Spacemaker (https://spacemaker.ai/ ). This company works alongside architects to help them choose between the complex trade-offs required when selecting the layout of buildings: trade-offs between natural light, housing density, noise exposure, energy efficiency and so on. Given optimisation inputs, the computer very quickly generates possible layouts, working alongside the architect and freeing them from the mundane but complex calculations and predictions of weather patterns, seasons etc. The result is much better buildings, without taking away the artistic judgement of the architects.

In Oil and Gas I can see a similar "advisor" system working alongside production engineers, economic planners, maintenance engineers, planners and schedulers, providing scenarios based on optimisation parameters and enabling them to choose the best configuration to implement.
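A crude sketch of how such an advisor might rank scenarios – the scenarios, metrics and weights below are all invented for illustration, and a real system would generate the candidates itself:

```python
# Toy scenario advisor: score candidate configurations against weighted objectives.
scenarios = {
    "A": {"production": 0.9, "energy_cost": 0.7, "maintenance_risk": 0.6},
    "B": {"production": 0.7, "energy_cost": 0.3, "maintenance_risk": 0.2},
    "C": {"production": 0.8, "energy_cost": 0.5, "maintenance_risk": 0.3},
}
# Higher production is good; higher cost and risk are bad.
weights = {"production": 1.0, "energy_cost": -0.5, "maintenance_risk": -0.8}

def score(metrics):
    return sum(weights[k] * v for k, v in metrics.items())

ranked = sorted(scenarios, key=lambda name: score(scenarios[name]), reverse=True)
print(ranked)  # the engineer picks from the ranked list; the judgement stays human
```

As with Spacemaker, the point is not that the machine decides, but that it exhaustively explores the trade-off space so the human can choose from good candidates.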

Machine Learning

I saw an example from a company called Darktrace (https://www.darktrace.com/en/ ) – as well as having well-researched and well-implemented tech, a simply brilliant commercial success with enough financing to dedicate serious money to PR, marketing, sales and distribution. Put briefly, their system sits at the network hubs in your organisation and reads each packet of information (sometimes understanding the content, but often only its source and destination). It uses ML to work out what normal looks like for you (and evolves), and if it sees something abnormal start, it can raise questions. It can also take action, using IP spoofing to intercept traffic and block comms. One of their success stories is the NHS trusts that installed the system and contained the WannaCry attack (https://www.bbc.co.uk/news/health-43795001 ).

This might be applicable to Oil and Gas: monitoring all the signals in the real-time stack, learning what normal operation looks like, and then spotting abnormalities as they start to occur. It would be harder, because the relationships are more complex than network traffic, but still – got to be worth a shot.
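The "learn what normal looks like, flag the abnormal" idea can be sketched crudely – keep a running mean and variance per signal and flag readings that sit far outside them (a toy sketch, in no way the real product):

```python
# Toy "normal" model: running mean/variance per signal, flag big deviations.
import math

class NormalModel:
    def __init__(self, threshold=4.0):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.threshold = threshold  # how many std-devs counts as abnormal

    def update(self, x):
        """Learn from one reading; return True if it looks abnormal."""
        abnormal = False
        if self.n > 10:  # only judge once some history has been seen
            std = math.sqrt(self.m2 / self.n) or 1e-9
            abnormal = abs(x - self.mean) > self.threshold * std
        # Welford's online update keeps the model of "normal" evolving.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return abnormal

model = NormalModel()
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.2, 9.9, 10.0, 10.1, 55.0]
flags = [model.update(r) for r in readings]
print(flags[-1])  # the 55.0 spike stands out against learned "normal"
```

Note there is no labelled training data here at all: "normal" is simply whatever the stream has looked like so far, which is why this style of ML suits security and monitoring problems.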

Computer vision

I found this very interesting. I spoke to a company called Winnow, which has applied computer vision to the problem of cutting food waste in restaurants (https://venturebeat.com/2019/03/21/winnow-uses-computer-vision-to-help-commercial-kitchens-cut-food-waste/ ). Using a camera, it captures images of what the chef is throwing away and, through a series of algorithms, works out the amount and value of the waste. For instance, perhaps a chef orders the same amount of broccoli every day, but really only uses most of it on Wednesdays, Fridays and Saturdays. Or maybe the ordering should change with the weather – either way, if you can reduce the over-ordering then everyone wins.

People are already applying technology like this in industrial settings to check that people use their safety equipment (glasses, harnesses etc.). You could also start to put cameras covering manual controls to create records of changes and current settings – it would be a cheap way to retro-fit instrumentation.

Conclusion

That's all for now, but some of the other things I found out about included "The Ostrich Problem", the real-world applications of AR/VR, more on cyber security, and what 5G will really mean. Other things that are hot right now are commercial adoption, the future of work, small-scale adaptive robotics, AI ethics and decarbonisation. More on these later.

London Tech Week

Last week was London Tech Week – I guess a bit like London Fashion Week, only much larger.

There were events all over London – this has grown from a week of borrowed conference rooms and underground gatherings into a large series of happenings. Monday to Wednesday saw the CogX show near the Google campus north of King's Cross. At this event there were 10 stages giving parallel presentation sessions over the full three days, plus an expo, a start-up section and various corporate networking events.

Wednesday and Thursday (yes, overlapping) saw the TechXLR8 exhibition at the ExCeL Centre – a large expo event with 6 presentation stages running all day.

Alongside these two there were also the 5G Europe and Identity Management conferences, both of substantial size.

Have a look at the website here: https://londontechweek.com/events – there are 18 pages of events with 7 events per page. TechXLR8 above is only one of these.

This blog previously covered the launch of the UK's industrial strategy (at Jodrell Bank) and the lack of coverage of this in the mainstream media [Link]. Well, despite there still being no interest from the media, the UK industrial strategy was evident everywhere – with announcements from the various bodies, challenges, and funding opportunities. Have a look at this if you haven't already: https://www.gov.uk/government/publications/industrial-strategy-the-grand-challenges

And did you know there is an “Office for Artificial Intelligence” ? https://www.gov.uk/government/organisations/office-for-artificial-intelligence

I’ll write more about some of the events in due course but here are the highlights:

  • There were thousands of under-40, very intelligent, eager advocates of tech everywhere – very diverse in terms of sex, ethnicity, country of origin, you name it, and very much in contrast to this [Link]
  • AI, IoT, ML, CV, AR and VR were the flavours of the moment (and I learnt some really interesting new insights here – more later)
  • AI ethics is a huge deal, and lots of people are thinking about it.
  • Energy tracks focused on decarbonisation, the distributed grid and combining sensor technology with predictive algorithms to reduce consumption. Oil and Gas didn't feature once.
  • It was interesting to see that the traditional tech players (with notable exceptions) were looking dated, pushing out platitudes about the new tech and the business impact it should have (but with no concrete examples). Meanwhile there were (really) hundreds of well-funded small companies with real-world use-cases for niche solutions that had demonstrated value (though most had not had to pass a business-case hurdle to get going).

What struck me most was the vibrancy of the arenas, the buzz of conversation and the high-energy engagement between participants – problem solving and exchanging ideas. It was very refreshing to see. There was also a willingness by all sorts of industries to try new solutions and approaches – knowing that not everything will work but understanding the need to learn and push the envelope forward. The pace of change is amazing.

I was lucky enough to have the chance to try a VR simulation made by Lincolnshire Fire Brigade to train their officers in fire investigation. On with a VR headset and into a virtual world. It was very, very realistic.

Oh, and everyone was talking about "Digital Disruption".

Next year London Tech-Week should be one for your diary.

Profile of an Engineer: Part 1 – the 20 year veteran

I thought I would share some musings on the differences in the UK economic and social environment as different generations grew up, and the effect this might have on their approach to the future.

I hasten to add that all the characters here are completely made up. This is a deliberate caricature based on some of my own observations, but I hope it's thought-provoking at least.

Ok, so meet Peter. He's got 20 years' experience and is currently Head of Subsurface at Big Oil Co.

He was born in 1976, so he's 42 now. He graduated in Geology in 1999. He was taught computing on a VAX mainframe.

The university computers still ran DOS, not Windows. He had to go to the library to research material for his dissertation. When he started work, Geology and Geophysics were separate departments and coloured pencils were still being used to colour in seismic renderings.

This is what the UK economy was doing throughout his whole time in education.

The FTSE 100 between 1976 and 2000

The prevailing attitude of the time was “we have worked out the formula”. Mankind rocks! You do this, you get that. Put money in the stock market, it goes up. “The end of boom and bust!” (link).

Companies were run by “Command and Control”. You start at the bottom and you work your way up. You do your time. You wear a suit to work.

Peter started work in 1999 and the world turned upside down! This is the FTSE 100 during his career.

The FTSE 100 between 2000 and 2018

So pretty much the day Peter left full time education the formula stopped working! Don’t get me wrong here, I don’t mean the laws of physics stopped working. I just mean the accepted wisdom he was infused with during his education didn’t hold true anymore.

Peter has now lived through his third downturn in the Oil & Gas industry. The ups and downs of the cycle are driven by macro-economic trends completely out of his control. The industry itself is slow to change.

Peter recognises the signs of cynicism appearing in his attitude. He fights them because he knows cynicism kills enthusiasm! He is looking for ways to genuinely make things better. He's still got 20 years before he retires. He needs to prepare his company for the future, and that means recruiting great talent. How can he attract them to an industry that's still talking about how to do the same things it was doing 20 years ago, and thinks it's funny when the person opening a "Hackathon" jokes that they can't spell the word and don't know how to turn on their iPad thing (link)?

Spreadsheets are so 2008!

VisiCalc – the first spreadsheet https://en.wikipedia.org/wiki/VisiCalc

A few weeks ago, at a talk I was giving at a Finding Petroleum conference (link), I quipped that the Oil and Gas industry has been run on spreadsheets for over 30 years. Someone in the audience joked back during the questions afterwards that I wasn't quite right – it was actually PowerPoint! They had a point, but here's why I think spreadsheets were behind the 20 years of progress between 1980 and 2000, and why they are not the right tool for the next revolution*.

Here is a chart of the FTSE 100 index between 1978 and 2017.

There was a period of about 20 years of near-exponential growth between 1980 and 2000, and I think there is a strong case to be made that this was thanks to the arrival of the spreadsheet. The first spreadsheet, VisiCalc, was released on the Apple platform in 1979. The first version of Excel came out in 1985, but it wasn't until the release of Windows 3.1 in 1992 that things really took off.

So why are they responsible for this growth?

Because now we had a tool that allowed us to do much more complex analysis. Everyone could build their own model of the world in a spreadsheet and optimise it. Goal Seek let us "solve for X". Now we could model the past and use it to predict the future – hooray, and off we merrily went. We got really good at planning and offline analysis and developed a centralised Command and Control approach:

  • We better modelled and understood what was happening
  • Data was sent back for offline analysis & understanding
  • Then we sent the instructions back
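Goal Seek is really just root-finding over a model. A toy version of that spreadsheet workflow (the cash-flow numbers are invented purely for illustration) might look like:

```python
# Spreadsheet-style model plus a hand-rolled "Goal Seek" (bisection root-finding).
def profit(units_sold):
    """A toy model cell: revenue minus fixed and variable costs."""
    return 12.5 * units_sold - (40_000 + 4.5 * units_sold)

def goal_seek(f, target, lo, hi, tol=1e-6):
    """Find x in [lo, hi] such that f(x) ~= target, like Excel's Goal Seek.
    Assumes f is increasing over the interval."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

break_even = goal_seek(profit, target=0.0, lo=0.0, hi=100_000.0)
print(round(break_even))  # units needed to break even: 5000
```

The seduction – and the danger – is exactly this: the model answers crisply, whether or not the model itself is right.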

That was great, but I believe this way of working came to an end with the Dotcom crash in March 2000 – 19 years ago today as I write this. The spreadsheets had got too big and our faith in the models we built was misplaced. It's easy to make errors in formulas, but very difficult to audit a spreadsheet someone else has built. Complex spreadsheets make it look like we know what's going on, and the person with the most convincing argument (best spreadsheet) at the time wins. But that doesn't mean the answer on the spreadsheet is what will actually happen!

Every model comes with implicit assumptions and what is not in the model is just as important as what is.

The world has changed. There is now a distrust of centralised decision making and a rebellion against command and control.

Spreadsheets are a great tool and will always be around, but I think we need to change our thinking in order to advance again. We need to move away from the old command-and-control style.

We must recognise that we don’t actually know the future and we can’t define exactly what we want or need up front. We have to take small steps, get knowledge, fail fast and learn quickly.

Oh, and why did I pick 2008? Well, that was the financial crisis – caused by a lot of people doing things based on models (most probably spreadsheets) that they didn't actually understand.

——

* This is based on a presentation I wrote with Gareth Davies in 2017 and presented at the Digital Energy forum in Aberdeen on 14-March (link).

Looking for inspiration

I like to look across sources for analogy and stimulating ideas. A couple of things have recently caught my eye.

I find it amazing how hard it is for people (including me) to see the implications of new technologies and ways of working. In retrospect, once a change has happened, it’s obvious what the outcome would have to be. But when the change is happening it’s not so clear.

Going up

Ground floor
Perfumery, stationery and leather goods, wigs and haberdashery, kitchenware and food. Going up…

Can you remember the theme tune to Are You Being Served?

I'm old enough to remember the lift operators in Aberdeen's E&M and Watt & Grant department stores. They were replaced by automated lifts in about 1980. The stores themselves have both since succumbed – one to the shopping mall, the other a victim of digital retail.

Being a lift operator was a skilled profession: making sure you stopped the car level with the floor and opening the concertina ironwork doors with their brass handles. Apparently New York's last lift operator was only made redundant in 2009 [Link].

The Economist's 1843 magazine just ran a story making the connection between the elevator operators' strike and the adoption of self-driving cars. We could probably do the same with roles in the oil field.

The elevator strikes of 1945–47 crippled the city and led to calls to redesign it so that only low-rise development was permitted – to reduce the power of the unions.

Of course, the answer was – as we know – automated elevators. But a lot of change management was required before people started to use them. Innovations such as emergency stop buttons, telephones for help and recorded announcements all came about at this time.

I'll wager that we will look back at some of the manual ways of operating an oilfield we use today in the same way as we look back at the anachronism of the elevator operator.

Electricity – who’d want that?

Another story that I picked up on, and which illustrated a point, was this one [Link]. It's written by the BBC's Tim Harford. He asked, and answered, the question of why it took so long for electricity to displace steam in the factories of the North of England. It was decades after the invention before it was fully adopted.

He explained that it required a redesign of factories before the economics made enough sense for people to abandon centrally powered manufacturing and move to individually powered machines. We’ll see the same adoption economics in oil field operations and technologies such as 3D printing.

Digital Marketing – a lesson for oil and gas?

Today I found another article that resonated. This one is from Marketing Week [Link]

Mark Ritson makes the case that the separation between digital marketing teams and traditional marketing is ridiculous. What I think he's saying echoes my point that there should be no separation between "IT" and "The Business", because IT needs to be just how things are done around here. It's true in marketing, and it's true in Oil and Gas too.

“… On the one hand you need to avoid being precious about your digital creds. Signal early you are entirely comfortable losing the D prefix from your title and, for good measure, add something re-assuring like ‘I do not even know what digital means anymore’ or ‘isn’t everything digital now?’.

The merger process means that anyone who is a member of the extreme digerati will be the victim of the new regime. You know the type: obsessed with AI, convinced in the long-term value of VR, boastful that they don’t own a TV. They will be the first to go when the revolution comes.

Digital experience is a prerequisite

But make no mistake, it’s no good proclaiming that digital is wank and it’s time to get back to basics, pull all the money from Facebook and get it back into ‘proper’ media. The post-digital era cuts both ways.

While idiot digerati will be exposed, so too will those who aren't open to the potential of all the new research and media options that have appeared over the past decade. When Alastair Pegg, the leading marketer at Co-op Bank, noted that there was “no such thing as digital marketing” he followed up with the corollary that “all marketing is digital marketing”.

I think I can see the parallels between what he’s saying is happening in Marketing now, and what will overtake the world of Oil and Gas operations in the next 3-5 years. What do you think?